Airflow CI/CD: GitHub to Cloud Composer (safely)
2020-07-09, Seattle Meetup [Session starts: Thursday 09.07 9am PDT]

Deploying bad DAGs to your Airflow environment can wreak havoc. This talk provides an opinionated take on a monorepo structure for GCP data pipelines leveraging BigQuery and Dataflow, and on a series of CI tests for validating your Airflow DAGs before deploying them to Cloud Composer.


Composer makes deploying Airflow infrastructure easy and reduces deploying DAGs to “just dropping files in a GCS bucket”. However, this makes it easy for organizations to shoot themselves in the foot by not following a strong CI/CD process. Pushing bad DAGs to Composer can manifest as a really sad Airflow webserver and many wasted DAG parsing cycles in the scheduler, disrupting other teams using the same environment. This talk will outline a series of recommended continuous integration tests to validate PRs that update or add Airflow DAGs before they are pushed to your GCP environment, along with a small “DAGs deployer” application that manages deploying DAGs following some best practices. The talk will walk through automating these tests with Cloud Build, but they could easily be ported to your favorite CI/CD tool.
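
To give a flavor of the kind of pre-deployment check described here, below is a minimal sketch and not the talk's actual test suite. It uses only the Python standard library, so it runs without Airflow installed: it confirms every file in a DAGs folder parses, and flags string-literal dag_ids that appear more than once. The function names (find_dag_ids, validate_dags) and the "dags" folder layout are assumptions made for illustration. A real CI step would more likely load the folder with Airflow's DagBag and fail on any import errors.

```python
import ast
from collections import Counter
from pathlib import Path


def find_dag_ids(tree):
    """Collect string-literal dag_ids from DAG(...) calls in a parsed module."""
    ids = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both DAG(...) and something.DAG(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "DAG":
            continue
        # dag_id may be the first positional argument or a keyword argument.
        candidates = list(node.args[:1])
        candidates += [kw.value for kw in node.keywords if kw.arg == "dag_id"]
        for value in candidates[:1]:
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                ids.append(value.value)
    return ids


def validate_dags(dags_dir):
    """Return a list of human-readable problems found under dags_dir."""
    problems = []
    seen = Counter()
    for path in sorted(Path(dags_dir).rglob("*.py")):
        try:
            # Static parse only: nothing in the DAG file is imported or run.
            tree = ast.parse(path.read_text(), filename=str(path))
        except SyntaxError as err:
            problems.append(f"{path.name}: syntax error on line {err.lineno}")
            continue
        seen.update(find_dag_ids(tree))
    for dag_id, count in sorted(seen.items()):
        if count > 1:
            problems.append(f"duplicate dag_id '{dag_id}' defined {count} times")
    return problems
```

A CI step could call validate_dags("dags") and fail the build whenever the returned list is non-empty. Because the check is static, it will not catch errors that only surface when Airflow actually imports the file (missing dependencies, dynamically generated dag_ids), which is why it is a complement to a DagBag-based test and not a replacement.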

See also: draft slides (1.8 MB)

Jake fell in love with the open source process thanks to the inclusiveness and helpfulness of the Airflow community. He's blessed to be a part of the Google Cloud Professional Services family, which enables him to make GCP easier to use by building OSS tooling to help its customers.