Testing Airflow workflows - ensuring your DAGs work before going into production
2020-07-15, Amsterdam Meetup [Session starts: Wednesday 15.07, 6 pm CEST (9 am PDT)]

How do you ensure your workflows work before deploying them to production? In this talk I'll go over various ways to verify that your code works as intended, both at the task and at the DAG level. I will cover:

  • How to test and debug tasks locally (a minimal sketch follows this list)
  • How to test with and without task instance context
  • How to test against external systems, e.g. how to test a PostgresOperator?
  • How to test the integration of multiple tasks to ensure they work nicely together
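
For the first point, here is a minimal sketch of what a local task test can look like with pytest, assuming Airflow 1.10.x import paths (in Airflow 2 the import moved to airflow.operators.bash); the task_id and command are made up for illustration:

```python
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10.x path


def test_bash_operator_writes_file(tmp_path):
    """Call the operator's execute() directly, without a scheduler or metastore."""
    output_file = tmp_path / "hello.txt"  # tmp_path is a built-in pytest fixture
    task = BashOperator(
        task_id="write_file",  # hypothetical task_id, for illustration only
        bash_command=f"echo hello > {output_file}",
    )
    # An empty context suffices for operators that don't rely on templating or XCom.
    task.execute(context={})
    assert output_file.read_text().strip() == "hello"
```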

Are you deploying code to your Airflow instance only to clear tasks or trigger a DAG, then waiting for it to succeed? Are you sick of going through an entire CI/CD cycle just to check whether a task works correctly? In this talk I will demonstrate various methods for testing your workflows locally, to ensure your code works as intended and to speed up development.

I will cover various use cases for testing, starting simple with testing individual tasks with pytest and ending with various ways of integration-testing complete DAGs. Covered topics include:
- Testing operators with and without task instance context
- Debugging Airflow code
- Various tools for testing (pytest & useful plugins, Docker, and more)
- Mocking calls to external systems (see the first sketch after this list)
- Testing against real external systems in tests, e.g. a Postgres database
- Where the Airflow CLI can help with validating tasks & DAGs
- How to validate the integration of all tasks in your DAG (see the second sketch after this list)
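
To give a flavour of the mocking point, here is a sketch that patches a hook's connection so a query method can be exercised without any real database; the table name is hypothetical and the import path again assumes Airflow 1.10.x:

```python
from unittest import mock

from airflow.hooks.postgres_hook import PostgresHook  # Airflow 1.10.x path


def test_get_first_without_real_database():
    # Patch get_conn() so the hook never opens a real Postgres connection.
    with mock.patch.object(PostgresHook, "get_conn") as mock_get_conn:
        cursor = mock_get_conn.return_value.cursor.return_value
        cursor.fetchone.return_value = (42,)

        hook = PostgresHook()
        result = hook.get_first("SELECT COUNT(*) FROM my_table")  # hypothetical table

        assert result == (42,)
        cursor.execute.assert_called_once_with("SELECT COUNT(*) FROM my_table")
```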
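And for the last point, a common pattern is a "DAG integrity" test: load every DAG file through a DagBag and fail the build on import errors (which, in Airflow 1.10, also covers cycles detected at parse time). A minimal sketch, assuming your DAG files live in a dags/ folder:

```python
import pytest

from airflow.models import DagBag


@pytest.fixture(scope="session")
def dag_bag():
    # Parse all DAG files once per test session; "dags/" is an assumed layout.
    return DagBag(dag_folder="dags/", include_examples=False)


def test_no_import_errors(dag_bag):
    # Any DAG file that fails to import ends up in this dict.
    assert dag_bag.import_errors == {}


def test_every_dag_has_tasks(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tasks, f"DAG {dag_id} contains no tasks"
```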

Data engineer at GoDataDriven, Airflow trainer and committer, and co-author of the Manning book Data Pipelines with Apache Airflow (currently in progress).

In his daily job he helps companies become more data-driven by building data solutions, aiming to combine cool data products with scalable, solid software. In recent years he has worked at various companies such as Booking, ING, and Unilever.