PyCon Sweden 2021

Airflow 2.0 for ML pipelines – design, implementation and management
2021-10-21 , Workshops

Live Stream: https://youtu.be/qWvJSIgOcPU

Airflow 2.0 changed a great deal under the hood. This workshop gives an overview of the major updates from Airflow 1.x to 2.0, the main components of Airflow and how they work together, and a hands-on demo of implementing and managing an end-to-end machine learning pipeline. Without a pipeline in place, managing multiple machine learning stages in production can be difficult. The workshop shows how Airflow simplifies the process and management of Python-based ML projects.


Prerequisites

  1. Install Docker Desktop (with at least 3 GB of memory allocated)
  2. Start the Docker engine
  3. Clone the workshop repo with git clone https://github.com/pycon-ml/airflow_workshop.git
  4. Run docker-compose pull inside the repo folder airflow_workshop
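
The clone and pull steps above can be run from a terminal roughly as follows. This is a minimal sketch, assuming Docker Desktop is already installed and running and that git and docker-compose are available on the PATH. The docker info call is an added sanity check, not part of the workshop instructions.

```shell
# Sanity check (an addition, not a workshop step):
# exits with an error if the Docker engine is not running.
docker info > /dev/null

# Step 3: clone the workshop repository.
git clone https://github.com/pycon-ml/airflow_workshop.git

# Step 4: pull the images referenced by the repo's compose file.
cd airflow_workshop
docker-compose pull
```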

Agenda

  • 05 min: Introduction

  • 05 min: Major changes in Airflow 2.0

  • 05 min: Prerequisites setup overview

  • 10 min: Walkthrough of different backend components

  • 10 min: Different stages of a DAG file – steps and operators

  • 10 min: Dynamic DAG creation to improve parallelism

  • 15 min: How to trigger Airflow DAG runs

  • 15 min: Debug and clear Airflow task errors

  • 10 min: Overview of production-level Airflow-based architecture

  • 05 min: Wrap up questions

Alen Jacob is a Machine Learning Engineer at H&M and has a Master's degree in Computational Linguistics.

Scott Zhou is a competence lead for Machine Learning Engineers at H&M and a Machine Learning Engineer himself.

Lini Jose is a Machine Learning Engineer at H&M.

Nitin Bisht is a Software Engineer at H&M.