ditto.ditto

class ditto.ditto.AirflowDagTransformer(target_dag, transformer_defaults=None, transformer_resolvers=None, subdag_transformers=None, debug_mode=False)[source]

Bases: object

Ditto’s core logic is executed by this class. It executes two operations on a DAG in sequence. First it will run the provided subdag transformers to transform the source DAG itself Then it will transform the operators in the source dag and create a target DAG out of the returned DAGFragment’s.

Parameters
  • target_dag (DAG) – ditto allows you to provide a pre-fabricated airflow DAG object so that you can set essential parameters like it’s schedule_interval, params, give it a unique dag_id, etc. outside of ditto itself, instead of ditto copying the attributes of the DAG over from the source DAG. This gives more flexbility.

  • transformer_defaults (Optional[TransformerDefaultsConf]) – allows you to pass a map of transformer type to their default configuration. This is helpful to pass things like a default operator to use when the transformer cannot transform the source operator for some reason, or any other configuration required by the transformer

  • transformer_resolvers (Optional[List[TransformerResolver]]) – resolvers to use to find the transformers for each kind of operator in the source DAG.

  • subdag_transformers (Optional[List[Type[SubDagTransformer]]]) – subdag transformers to use for converting matching subdags in the source DAG to transformed subdags

  • debug_mode (bool) – when True it will render the intermediate results of transformation using networkx and maplotlib so that you can debug your transformations easily.

transform_operators(src_dag)[source]

Transforms the operators in the source DAG and creates the target DAG out of the returned DAGFragment’s. Finds the transformers by running each operator through all the resolvers passed.

Does a bread-first-traversal on the source DAG such that the result of the transformation of upstream (and previous ops in this level) are available to downstream transformers in level-order. This is helpful for real world use cases of transformation like having a spark step op transformer read the result of the transformation of a cluster create op transformer.

Caches the results of transformations to avoid repeat work, as this is a graph, not a tree being traversed.

Note

Stitches the final target DAG after having transformed all operators.

Parameters

src_dag (DAG) – the source airflow DAG to be operator-transformed

Returns

does not return anything. mutates self.target_dag directly

transform_sub_dags(src_dag)[source]

Transforms the subdags of the source DAG, as matched by the TaskMatcher DAG provided by the SubDagTransformer’s get_sub_dag_matcher() method.

Multiple subdag transformers can run through the source DAG, and each of them can match+transform multiple subdags of the source DAG, _and_ each such transformation can return multiple subdags as a result, so this can get quite flexible if you want.

The final DAG is carefully stitched with all the results of the subdag transformations.

See the unit tests at test_core.py for complex examples.

Note

If your matched input subdag had different leaves pointing to different operators/nodes, the transformed subdags leaves will just get multiplexed to all the leaves of the source DAG, since it is not possible to know which new leaf is to be stitched to which node of the source DAG, and resolve new relationships based on old ones.

Warning

Make sure that you don’t provide SubDagTransformer’s which with overlapping subdag matchers, otherwise things can understandably get messy.

See also

The core logic behind this method lies in a graph algorithm called subgraph isomorphism, and is explained in detail at find_sub_dag()

Parameters

src_dag (DAG) – the source airflow DAG to be subdag-transformed

Returns

does not return anything. mutates the passed src_dag directly, which is why you should pass a copy of the source DAG.

transform(src_dag)[source]

This is the entry point to using ditto, and is the only method you should have to call after creating the transformer object. This calls the transform_sub_dags() and transform_operators() methods in sequence to realize the final target_dag and return it

Parameters

src_dag (DAG) – the source airflow DAG to be ditto-transformed

Returns

the resultant transformed DAG