2019-09-02, 14:00–15:30, Track4 (Chillida)
In this workshop, you will learn how to migrate from ‘script soups’ (sets of scripts that must be run in a particular order) to robust, reproducible and easy-to-schedule data pipelines in Airflow.
Introduction (5 minutes)
Go over the agenda
List the relevant resources
Make sure everyone has followed the installation instructions
Intro to data pipelines
Go over the components of traditional data science pipelines
Presentation of the script soup antipattern
Creating a script soup
The attendees will perform an ETL task on some data using a set of independent scripts.
In this exercise, I will provide the code and explain what we are trying to achieve with this pseudo-pipeline. The attendees will have a chance to reproduce it themselves.
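For illustration, a minimal sketch of what such a script soup can look like is given below; the file names, the cleaning step and the SQLite target are hypothetical placeholders for the data and logic used in the workshop.

    # A minimal "script soup": three independent scripts that only work when
    # run in this exact order (file names and data are hypothetical):
    #   python extract.py && python transform.py && python load.py

    # --- extract.py --------------------------------------------------------
    import pandas as pd

    raw = pd.read_csv("raw_events.csv")              # hypothetical source file
    raw.to_csv("stage_extracted.csv", index=False)   # hand-off file for the next script

    # --- transform.py ------------------------------------------------------
    import pandas as pd

    df = pd.read_csv("stage_extracted.csv")
    df = df.dropna()                                 # toy cleaning step
    df.to_csv("stage_transformed.csv", index=False)

    # --- load.py -----------------------------------------------------------
    import pandas as pd
    import sqlite3

    df = pd.read_csv("stage_transformed.csv")
    with sqlite3.connect("warehouse.db") as conn:    # hypothetical target database
        df.to_sql("events", conn, if_exists="replace", index=False)

Nothing in these scripts records the required run order or recovers from a failure halfway through, which is exactly the problem the rest of the workshop addresses.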
Introduction to Airflow and DAGs
Introduce the concept of DAGs (directed acyclic graphs)
Present and introduce the components of Airflow
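To make the concept concrete, the sketch below defines a tiny DAG whose tasks form a small directed acyclic graph; the DAG and task names are hypothetical, and the code assumes the Airflow 1.10.x API that was current at the time of the workshop.

    # A toy DAG: four do-nothing tasks arranged as a directed acyclic graph
    # (hypothetical names; Airflow 1.10.x API assumed).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG(
        dag_id="toy_dag",
        start_date=datetime(2019, 9, 1),
        schedule_interval=None,          # only runs when triggered manually
    )

    start = DummyOperator(task_id="start", dag=dag)
    branch_a = DummyOperator(task_id="branch_a", dag=dag)
    branch_b = DummyOperator(task_id="branch_b", dag=dag)
    finish = DummyOperator(task_id="finish", dag=dag)

    # Edges of the graph: both branches run after start, finish runs last.
    start >> branch_a >> finish
    start >> branch_b >> finish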
Set up a local instance of Airflow
The attendees will create a local instance of Airflow and explore the sample DAGs provided.
They will be introduced to the scheduling capabilities of the tool and track the status of the pipelines using the web GUI.
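As a sketch of the scheduling part, the hypothetical DAG below (again assuming the Airflow 1.10.x API) runs once per day; saved in the local dags/ folder, it is picked up by the scheduler and its runs and task statuses can be tracked in the web GUI.

    # A minimal scheduled DAG (hypothetical names; Airflow 1.10.x API assumed).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id="hello_airflow",
        start_date=datetime(2019, 9, 1),
        schedule_interval="@daily",      # one run per day
        catchup=False,                   # do not backfill past days
    )

    say_hello = BashOperator(
        task_id="say_hello",
        bash_command='echo "Hello from Airflow"',
        dag=dag,
    )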
ETL task on Airflow
I will provide hints on how to transform the script soup into Airflow DAGs.
For this, I will use pseudocode and other pedagogical approaches inspired by the Software Carpentry lessons to guide the attendees through deploying their first DAG in Airflow.
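For orientation, the sketch below shows what the earlier script soup can become as a single Airflow DAG; the callables are placeholders for the workshop's actual extract/transform/load code, and the Airflow 1.10.x API is assumed.

    # The script soup rewritten as one DAG: the run order that was only implicit
    # in the scripts is now an explicit chain of tasks (Airflow 1.10.x API
    # assumed; the callables are hypothetical placeholders for the real steps).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def extract():
        """Placeholder for the logic that lived in extract.py."""


    def transform():
        """Placeholder for the logic that lived in transform.py."""


    def load():
        """Placeholder for the logic that lived in load.py."""


    dag = DAG(
        dag_id="etl_pipeline",
        start_date=datetime(2019, 9, 1),
        schedule_interval="@daily",
        catchup=False,
    )

    extract_task = PythonOperator(task_id="extract", python_callable=extract, dag=dag)
    transform_task = PythonOperator(task_id="transform", python_callable=transform, dag=dag)
    load_task = PythonOperator(task_id="load", python_callable=load, dag=dag)

    # Dependencies: extract, then transform, then load.
    extract_task >> transform_task >> load_task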
Wrap up and questions