From Zero to Airflow: bootstrapping an ML platform
2020-07-15, Melbourne Meetup [Sessions start: Thursday 16.07, 2 pm (Wednesday 15.07, 9 pm PDT)]

At Bluevine we use Airflow to drive our ML platform. In this talk, I'll present the challenges we faced and the gains we made in transitioning from a single server running Python scripts with cron to a full-blown Airflow setup. This includes supporting multiple Python versions, event-driven DAGs, performance issues and more!


At Bluevine, we were looking to upgrade our data processing infrastructure from a single server running Python scripts with cron to a more scalable solution that supports workflows (DAGs) and gives better observability of the application state. Airflow proved to be a valuable tool, though not without some sharp edges. Some of the points that I'll cover are:

  • Supporting multiple Python versions
  • Event-driven DAGs
  • Airflow performance issues and how we circumvented them
  • Building Airflow plugins to enhance observability
  • Monitoring Airflow using Grafana
  • CI for Airflow DAGs (super useful!)
  • Patching the Airflow scheduler

I live in Tel Aviv with my wife, baby daughter and dog. I used to teach Python classes, do freelance work and organize Python community gatherings (PywebIL and PyCon Israel). These days I'm working as a data engineering team lead at Bluevine.