Effective Cross-DAG Dependency
2020-07-16, Bay Area Meetup [Sessions start: Thursday 16.07 9am (Thursday 16.07 9am PDT)]

Cross-DAG dependencies may reduce cohesion in data pipelines and, without an explicit solution in Airflow or in a third-party plugin, those pipelines tend to become complex to handle. That is why we at QuintoAndar created an intermediate DAG, called Mediator, to handle relationships across data pipelines so that they remain scalable and maintainable by any team.


At QuintoAndar we seek automation and modularization in our data pipelines, and we believe that breaking them into many modules with distinct responsibilities, a.k.a. DAGs, makes the movement of data from one point to another easier to maintain, reuse and understand. However, adding more interconnections between DAGs tends to erode those benefits and make the pipelines complex and, above all, Airflow has no explicit built-in solution for such dependencies. That is why we created a Mediator DAG.

The Mediator DAG in Airflow is responsible for looking for successfully finished DAG executions that may represent the previous step of another DAG. That is, if a DAG depends on another, the Mediator takes care of checking and triggering the necessary objects for the data flow to continue.
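
The checking step described above can be sketched in plain Python. This is only an illustration of the idea, not QuintoAndar's implementation: the function name `plan_triggers`, the `Trigger` type, the dependency map and the DAG ids are all hypothetical, and the sketch assumes that a downstream DAG should run once every one of its upstream DAGs has succeeded for the same execution date. In a real deployment the lookup of finished runs and the triggering itself would go through Airflow rather than in-memory sets.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# One DAG run is identified here by (dag_id, execution_date).
# Both values are plain strings to keep the sketch free of Airflow imports.
RunKey = Tuple[str, str]


@dataclass(frozen=True)
class Trigger:
    """A downstream run that the mediator decided to start (hypothetical type)."""
    dag_id: str
    execution_date: str


def plan_triggers(
    dependencies: Dict[str, List[str]],
    succeeded: Set[RunKey],
    already_triggered: Set[RunKey],
) -> List[Trigger]:
    """Return the downstream runs whose upstream runs have all succeeded.

    dependencies maps a downstream dag_id to the upstream dag_ids it waits for.
    succeeded holds the runs that finished successfully.
    already_triggered holds downstream runs started earlier, so that the
    mediator does not trigger the same run twice.
    """
    # Only dates with at least one successful run can unblock anything.
    dates = sorted({date for _, date in succeeded})
    triggers: List[Trigger] = []
    for downstream, upstreams in sorted(dependencies.items()):
        if not upstreams:
            continue  # nothing to wait for, so not the mediator's concern
        for date in dates:
            if (downstream, date) in already_triggered:
                continue
            if all((up, date) in succeeded for up in upstreams):
                triggers.append(Trigger(downstream, date))
    return triggers


# Hypothetical dependency map and run history, for illustration only.
dependencies = {
    "load_warehouse": ["extract_listings", "extract_contracts"],
    "build_reports": ["load_warehouse"],
}
succeeded = {
    ("extract_listings", "2020-07-16"),
    ("extract_contracts", "2020-07-16"),
    ("extract_listings", "2020-07-17"),
}
planned = plan_triggers(dependencies, succeeded, already_triggered=set())
```

With these inputs only `load_warehouse` for 2020-07-16 is planned: the 2020-07-17 run is still missing one upstream, and `build_reports` waits for `load_warehouse` itself to succeed. Keeping the dependency map in a single place is what lets the individual DAGs stay unaware of one another.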

In conclusion, it is sometimes not practical to combine multiple DAGs into one. Hence, our proposal is to define a Mediator DAG to handle dependencies and bring cohesion to a data pipeline without losing its purpose.

Rafael Ribaldo is a Data Engineering Manager at QuintoAndar, where he leads a high-performance team towards building a data platform that can scale, has high performance and, most of all, is out of this world.

Before starting a career in Data Engineering, Ribaldo spent a couple of years as a Software Engineer writing Java code for companies in the payment and beverage businesses. After discovering a passion for Big Data projects, Ribaldo now helps QuintoAndar continue to be a data-driven company by making data available at a whole new level.

Ribaldo is available to discuss data architectures and the main goals for becoming data-driven. You can reach him at rafael.ribaldo@quintoandar.com.br.

I am a Data Engineer at QuintoAndar.
I first graduated in System Analysis and Development and then studied Big Data and Data Management Intelligence at Polytechnic School of the University of Sao Paulo. I started working as a Data Engineer at QuintoAndar, where I got to know Apache Airflow and its vibrant community.
Our data team works daily to make our data platform better and to keep supporting QuintoAndar as a data-driven company.
I am passionate about learning and sharing ideas, so I will be glad to talk about data architecture or any Airflow-related topics.