ditto.ditto

class ditto.ditto.AirflowDagTransformer(target_dag, transformer_defaults=None, transformer_resolvers=None, subdag_transformers=None, debug_mode=False)
Bases: object
Ditto’s core logic is executed by this class. It performs two operations on a DAG in sequence. First, it runs the provided subdag transformers to transform the source DAG itself. Then it transforms the operators in the source DAG and creates a target DAG out of the returned DAGFragment objects.

Parameters
- target_dag (DAG) – a pre-fabricated Airflow DAG object. Providing it lets you set essential parameters such as its schedule_interval and params, give it a unique dag_id, and so on, outside of ditto itself, instead of ditto copying the attributes of the DAG over from the source DAG. This gives more flexibility.
- transformer_defaults (Optional[TransformerDefaultsConf]) – a map of transformer type to its default configuration. This is helpful for passing things like a default operator to use when the transformer cannot transform the source operator for some reason, or any other configuration required by the transformer.
- transformer_resolvers (Optional[List[TransformerResolver]]) – resolvers to use to find the transformers for each kind of operator in the source DAG.
- subdag_transformers (Optional[List[Type[SubDagTransformer]]]) – subdag transformers to use for converting matching subdags in the source DAG to transformed subdags.
- debug_mode (bool) – when True, renders the intermediate results of transformation using networkx and matplotlib so that you can debug your transformations easily.

transform_operators(src_dag)
Transforms the operators in the source DAG and creates the target DAG out of the returned DAGFragment objects. Finds the transformers by running each operator through all the resolvers passed.
Does a breadth-first traversal on the source DAG, so that the results of transforming upstream operators (and operators earlier in the same level) are available to downstream transformers in level order. This is helpful for real-world transformation use cases, such as a Spark step operator transformer reading the result of a cluster-create operator transformer.
Caches the results of transformations to avoid repeated work, since the structure being traversed is a graph, not a tree.
Note
Stitches the final target DAG after having transformed all operators.
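The traversal described above can be illustrated with a minimal sketch on a plain dictionary graph. This is not ditto's implementation: the function name transform_in_level_order, the transform callback signature, and the example task names are all assumptions made for illustration. It shows only the two ideas named in the text, a breadth-first visit order and a cache that lets later transformations read earlier results.

```python
from collections import deque


def transform_in_level_order(downstream, roots, transform):
    """Visit nodes breadth-first from the roots, transforming each one once.

    downstream: dict mapping a node to the list of its downstream nodes
    transform:  callable(node, cache) -> result, where cache holds the
                results of every node transformed so far (hypothetical API)
    """
    cache = {}                       # node -> transformed result
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node in cache:            # reached again via another path: skip
            continue
        cache[node] = transform(node, cache)
        queue.extend(downstream.get(node, []))
    return cache


# A diamond-shaped graph: both steps lead to the same final task.
downstream = {
    "create_cluster": ["spark_step_a", "spark_step_b"],
    "spark_step_a": ["terminate_cluster"],
    "spark_step_b": ["terminate_cluster"],
    "terminate_cluster": [],
}

calls = []                           # records the order of transformations


def transform(node, cache):
    calls.append(node)
    cluster = cache.get("create_cluster")   # result of an upstream node
    if cluster is None:
        return f"{node}_v2"
    return f"{node}_v2(on {cluster})"


results = transform_in_level_order(downstream, ["create_cluster"], transform)
```

Here terminate_cluster is queued twice but transformed once, which is the role of the cache. In a plain breadth-first pass like this one, a node reachable through paths of different lengths can be visited before one of its upstream nodes; the sketch does not try to handle that case.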

transform_sub_dags(src_dag)
Transforms the subdags of the source DAG, as matched by the TaskMatcher DAG provided by the SubDagTransformer’s get_sub_dag_matcher() method.
Multiple subdag transformers can run through the source DAG, each of them can match and transform multiple subdags of the source DAG, and each such transformation can return multiple subdags as a result, so this can be quite flexible.
The final DAG is stitched together from all the results of the subdag transformations.
See the unit tests at test_core.py for complex examples.
Note
If your matched input subdag has different leaves pointing to different operators/nodes, the transformed subdag’s leaves will simply be multiplexed to all the leaves of the source DAG, since it is not possible to know which new leaf should be stitched to which node of the source DAG, or to resolve new relationships based on old ones.
Warning
Make sure that you don’t provide SubDagTransformers with overlapping subdag matchers; otherwise things can get messy.
See also
The core logic behind this method lies in a graph algorithm called subgraph isomorphism, and is explained in detail at find_sub_dag().
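The multiplexing behaviour in the note above can be sketched on a graph stored as a set of edges. This is an illustration under stated assumptions, not ditto's stitching code: the function replace_subdag and its parameters are hypothetical, the sketch reads "multiplexed" as connecting every new leaf to every node that was downstream of the matched region, and it treats the upstream side the same way (every outside parent feeds every new root).

```python
def replace_subdag(edges, matched, new_edges, new_roots, new_leaves):
    """Swap the matched nodes for a new subdag and re-stitch the boundary.

    edges:      set of (upstream, downstream) pairs for the whole graph
    matched:    set of nodes being replaced
    new_edges:  set of (upstream, downstream) pairs inside the replacement
    new_roots:  entry nodes of the replacement
    new_leaves: exit nodes of the replacement
    """
    # Nodes just outside the matched region, on either side of it.
    parents = {u for (u, d) in edges if d in matched and u not in matched}
    children = {d for (u, d) in edges if u in matched and d not in matched}

    # Keep every edge that does not touch the matched region.
    kept = {(u, d) for (u, d) in edges if u not in matched and d not in matched}

    # Multiplex: each outside parent feeds each new root (an assumption),
    # and each new leaf feeds each outside child.
    stitched = {(p, r) for p in parents for r in new_roots}
    stitched |= {(leaf, c) for leaf in new_leaves for c in children}
    return kept | set(new_edges) | stitched


# Matched subdag {a, b, c}: leaf b pointed only to x, leaf c only to y.
edges = {
    ("start", "a"), ("a", "b"), ("a", "c"),
    ("b", "x"), ("c", "y"), ("x", "end"),
}
result = replace_subdag(
    edges,
    matched={"a", "b", "c"},
    new_edges={("n1", "n2"), ("n1", "n3")},
    new_roots={"n1"},
    new_leaves={"n2", "n3"},
)
```

In the original graph, b fed only x and c fed only y. After replacement both new leaves, n2 and n3, feed both x and y, because the old leaf-to-node pairing carries no information about the new leaves.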

transform(src_dag)
This is the entry point to using ditto, and is the only method you should have to call after creating the transformer object. It calls the transform_sub_dags() and transform_operators() methods in sequence to realize the final target_dag and returns it.
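The call order described above can be mirrored by a small stand-in class. This is a toy sketch, not ditto's AirflowDagTransformer: the class TwoPhaseTransformer, its list-and-dict data model, and the toy behaviour of each phase are assumptions. It only shows a caller-supplied target being filled in by two phases run in sequence and then returned.

```python
class TwoPhaseTransformer:
    """Hypothetical stand-in that mirrors the two-phase call order."""

    def __init__(self, target):
        self.target = target         # container prepared by the caller
        self.calls = []              # records the order of the phases

    def transform_sub_dags(self, src):
        self.calls.append("transform_sub_dags")
        # Toy structural change: drop placeholder tasks.
        return [op for op in src if op != "noop"]

    def transform_operators(self, src):
        self.calls.append("transform_operators")
        # Toy per-operator change: upper-case each task name.
        self.target["tasks"] = [op.upper() for op in src]
        return self.target

    def transform(self, src):
        # Entry point: structural phase first, then per-operator phase.
        return self.transform_operators(self.transform_sub_dags(src))


# The caller builds the target up front and chooses its identifier.
target = {"dag_id": "transformed_dag", "tasks": []}
transformer = TwoPhaseTransformer(target)
result = transformer.transform(["extract", "noop", "load"])
```

The returned object is the same target the caller passed in, which is why attributes such as the identifier can be set outside the transformer.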