PyCon AU 2025

Patrick Sunter

Patrick Sunter works as a scientific software engineer at the Australian Bureau of Meteorology.
His career has been at the intersection of software development and scientific R&D for over two decades, in diverse fields including earth science, transportation analysis and now hydrology and meteorology.
Now living in regional Victoria outside Ballarat, his hobbies include playing real/royal tennis, the world's oldest racquet sport.

What pronouns do you use?:

He/him


Session

09-12
10:00
30min
Going with the flow? Apache Airflow for operational-quality scientific workflows
Patrick Sunter, Michael Pegios, Daehyok Shin

In this session we will share our experience of using Apache Airflow to build production scientific modelling workflows. This will draw on our work at the Australian Bureau of Meteorology on multiple projects that updated existing services to use Airflow – the eReefs water quantity and quality modelling and Seasonal Streamflow Forecasting services.

Why invest the effort to learn the Airflow framework and then apply it to manage your scientific workflows? In our years of experience, building workflows around scientific analysis applications that are both operational-quality and enjoyable and productive for scientific developers to work with has been a persistent pain point.

As scientific developers, if you roll your own workflow management system from the ground up, you retain control and can use all your favourite Python tools - but over time this often results in a combination of scripts, cron and/or Jenkins jobs that is hard to maintain. You'll also lack features an operational-quality system needs, such as good logging, error handling, and a pleasant monitoring web UI for non-developers (e.g. application support teams) to use. All of the above is exacerbated when effective task parallelisation is a goal. On the other hand, applying off-the-shelf general business IT workflow management apps to scientific modelling use-cases can result in cumbersome systems that are difficult to update and involve a lot of duplication.

Enter Apache Airflow - an open-source workflow manager written in Python, with workflow Directed Acyclic Graphs (DAGs) defined directly in Python code. We'll give examples drawn from our project work of updating existing systems to run in an Airflow framework, with the goal of enabling greater automation, scalability and quality control. These include:

  • Challenges faced getting started with Airflow for our small project teams – including tips for setting up development instances of Airflow’s scheduler and workflow backends.
  • Summaries of how we used the different Airflow “Operators” to invoke program code – including trade-offs between tight and loose coupling, and how this interacts with the use of Conda for managing complex scientific software stacks.
  • Our experience of using Airflow’s workflow parallelisation effectively for chunking up work.
  • Experience from different deployment options – both AWS cloud containers and locally-managed virtual machines.
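As a rough illustration of the DAG-in-Python and chunked-parallelisation ideas above, here is a minimal sketch of an Airflow DAG file. It assumes Airflow 2.x with the TaskFlow API and dynamic task mapping (available from Airflow 2.3); the DAG, task and chunk names are hypothetical, not taken from the projects described in the talk.

```python
# Illustrative sketch only: requires Apache Airflow 2.3+ (TaskFlow API
# and dynamic task mapping). Names here are hypothetical examples.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="example_model_run",  # hypothetical DAG name
    schedule="@daily",
    start_date=datetime(2025, 1, 1),
    catchup=False,
)
def example_model_run():
    @task
    def make_chunks() -> list[dict]:
        # Split the model domain into independently runnable chunks,
        # e.g. one per catchment or spatial tile.
        return [{"region": r} for r in ("north", "south", "east")]

    @task
    def run_model(chunk: dict) -> str:
        # Invoke the science code for one chunk. In practice this step
        # might shell out to a separate Conda environment rather than
        # run in-process, to keep the scientific stack loosely coupled
        # to the Airflow installation.
        return f"processed {chunk['region']}"

    @task
    def publish(results: list[str]) -> None:
        # Gather the mapped results and publish or notify.
        print(results)

    # .expand() creates one run_model task instance per chunk at
    # runtime, letting the Airflow scheduler parallelise them.
    publish(run_model.expand(chunk=make_chunks()))


example_model_run()
```

Because a DAG file like this is plain Python, it can live alongside the scientific code in version control and be tested like any other module - one of the properties that makes Airflow attractive over cron-and-scripts arrangements.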

We'll finish by reflecting on key lessons learnt, and ideas for further improvement in scientific software workflow management.

Scientific Python
Ballroom 2