Apache Airflow Summit Online Edition 2020 :: pretalx

To see our schedule with full functionality, like timezone conversion and personal scheduling, please enable JavaScript and go here.

Monday, July 6, 2020

Tuesday, July 7, 2020

Wednesday, July 8, 2020

Thursday, July 9, 2020

Friday, July 10, 2020

Saturday, July 11, 2020

Sunday, July 12, 2020

Monday, July 13, 2020

Tuesday, July 14, 2020

Wednesday, July 15, 2020

Thursday, July 16, 2020

Friday, July 17, 2020

09:00

Airflow then and Now (Keynote 1)

Bolke de Bruin, Maxime Beauchemin

Airflow then and Now

London Meetup, [Session starts: Monday 06.07 5pm (Monday 06.07 9am PDT)]

10:00

Airflow at Société Générale : An open source orchestration solution in a banking environment

Mohammed MARRAGH, Alaeddine Maaoui

This talk will cover a presentation of the tool as well as feedback from the implementation of Airflow in a banking production environment which is Société Générale. It will be the summary of a two-year experience, a storytelling of an adventure within Société Générale in order to offer an internal cloud solution based on Airflow (AirflowaaS).

London Meetup, [Session starts: Monday 06.07 5pm (Monday 06.07 9am PDT)]

11:00

Scheduler as a service - Apache Airflow at EA Digital Platform

Nitish Victor, Xiaoqin Zhu, Preethi Ganeshan

In this talk, we share the lessons learnt while building a scheduler-as-a-service leveraging Apache Airflow to achieve improved stability and security for one of the largest gaming companies. The platform integrates with different data sources and meets varied SLA’s across workflows owned by multiple game studios. In particular, we present a comprehensive self-serve airflow architecture with multi-tenancy, auto-dag generation, SSO-integration with improved ease of deployment.

London Meetup, [Session starts: Monday 06.07 5pm (Monday 06.07 9am PDT)]

09:00

How AirBnB/Twitter/Lyft use Airflow - Keynote 4

Kevin Yang, Dan Davydov, Tao Feng

To be detailed

NYC Meetup [Session starts: Tuesday 07.07 12pm (Tuesday 07.07 9am PDT)]

10:00

Data DAGs with lineage for fun and for profit

Let’s be honest about it. Many of us don’t consider data lineage to be cool. But what if lineage would allow you to write less boilerplate and less code, while at the same time make your data scientists, your auditors, your management and well everyone more happy? What if you could write DAGs that mix between tasks based and data based?

NYC Meetup [Session starts: Tuesday 07.07 12pm (Tuesday 07.07 9am PDT)]

10:15

10:15

30min

Bangalore Meetup [Session starts Wednesday 8.07 9.30 am (Tuesday 07.07. 9pm PDT) ]

11:00

Airflow on Kubernetes: Containerizing your Workflows

I have been one of the engineers at Nieslen Digital leading our migration of ETLs to Airflow on Kubernetes. This talk will teach you the ins and outs of Airflow on Kubernetes, from deploying Airflow to best practices for DAG development in a containerized environment. Airflow on Kubernetes will ease your Airflow DAG development, minimize its infrastructure costs, avoid wasted resources, and providing tasks with the optimal infrastructure to run on all through Kubernetes features within Airflow.

NYC Meetup [Session starts: Tuesday 07.07 12pm (Tuesday 07.07 9am PDT)]

21:00

Data flow with Airflow @ PayPal

Aishwarya Sankaravadivel

With data becoming the new oil, Data Reliability Engineering(DRE) has been buzzing all around in all leading Fin Tech industries. And in this space, working on big data/data science necessarily deals with building pipelines starting from data exploration to enrichment at scheduled intervals. While simple scripts are easy to create it gets cumbersome to manage when we want to build a resilient data pipe with intelligent failure handling for unexpected long running/short lived dependent tasks. Apa

Bangalore Meetup [Session starts Wednesday 8.07 9.30 am (Tuesday 07.07. 9pm PDT) ]

22:00

Democratised Data Workflows at scale

Mihail Petkov, Emil Todorov

Financial Times is increasing its digital revenue by allowing business people to make data-driven decisions. Providing an Airflow based platform where data engineers, data scientists, BI experts and others can run language agnostic jobs was a huge swing. One of the most successful steps in the platform’s development was building execution environment, allowing stakeholders to self deploy jobs without cross team dependencies on top of the unlimited scale of Kubernetes.

Bangalore Meetup [Session starts Wednesday 8.07 9.30 am (Tuesday 07.07. 9pm PDT) ]

23:00

Migrating Airflow-based Spark jobs to Kubernetes - the native way

Roi Teveth, Itai Yaffe

At Nielsen Identity Engine, we use Spark to process 10’s of TBs of data. Our ETLs, orchestrated by Airflow, spin-up AWS EMR clusters with thousands of nodes per day. In this talk, we’ll guide you through migrating Spark workloads to Kubernetes with minimal changes to Airflow DAGs, using the open-sourced GCP Spark-on-K8s operator and the native integration we recently contributed to the Airflow project.

Bangalore Meetup [Session starts Wednesday 8.07 9.30 am (Tuesday 07.07. 9pm PDT) ]

09:00

Future of Airflow (KeyNote 2)

Jarek Potiuk, Kaxil Naik, Ash Berlin-Taylor, Daniel Imberman, Tomek Urbaszek, Kamil Bregula (@mik-laj)

What's new in Airflow 2.0

Bay Area Meetup [Session starts: Wednesday 08.07 9 am (Wednesday 08.07 9am PDT)]

10:00

Talk by Google - Platinum Sponsor of the Airflow Summit

Aizhamal Nurmamat kyzy

TODO

Bay Area Meetup [Session starts: Wednesday 08.07 9 am (Wednesday 08.07 9am PDT)]

11:00

Airflow as the next Gen of workflow system at Pinterest

Dinghang Yu, Yulei Li, Ace Haidrey

To share the experience of adopting Airflow as the next generation of workflow system at pinterest, distributing the worker loads k8s, load sharding/partitioning via multiple schedulers, transparent existing user flow migration and more .

Bay Area Meetup [Session starts: Wednesday 08.07 9 am (Wednesday 08.07 9am PDT)]

09:05

Diversity - Making Airflow more welcoming community (Keynote 3)

Aizhamal Nurmamat kyzy, Griselda Cuevas

To be detailed

Seattle Meetup [Session starts: Thursday 09.07 9am (Thursday 09.07 9am PDT)]

10:00

Improving Airflow’s user experience

Astronomer is focused on improving Airflow’s user experience through the entire lifecycle — from authoring + testing DAGs, to building containers and deploying the DAGs, to running and monitoring both the DAGs and the infrastructure that they are operating within — with an eye towards increased security and governance as well.

Seattle Meetup [Session starts: Thursday 09.07 9am (Thursday 09.07 9am PDT)]

11:00

Airflow CI/CD: Github to Cloud Composer (safely)

Deploying bad DAGs to your airflow environment can wreak havoc. This talk provides an opinionated take on a mono repo structure for GCP data pipelines leveraging BigQuery, Dataflow and a series of CI tests for validating your Airflow DAGs before deploying them to Cloud Composer.

Seattle Meetup [Session starts: Thursday 09.07 9am (Thursday 09.07 9am PDT)]

09:00

Demo: Reducing the lines, a visual DAG editor

One of the significant challenges in scaling Airflow at an organization is the number of qualified developers fluent in Python. To speed the development of complex pipelines we developed a DAG authoring and editing tool for Airflow. Installed as a plugin, this tool allows users to author DAGs compose existing operators and hooks with virtually no Python experience.

A live demo of the tool and accompanying code.

Warsaw Meetup [Session starts: Friday 10.07 6pm (Friday 10.07 9am PDT)]

10:00

AIP-31: Airflow functional DAG definition

Gerard Casas Saez

Airflow does not currently have an explicit way to declare messages passed between tasks in a DAG. XCom are available but are hidden in execution functions inside the operator. AIP-31 proposes a way to make this message passing explicit in the DAG file and make it easier to reason about your DAG behaviour.

Warsaw Meetup [Session starts: Friday 10.07 6pm (Friday 10.07 9am PDT)]

11:00

Using Airflow to speed up development of data intensive tools

In this talk we will review how Airflow helped create a tool to detect data anomalies. Leveraging Airlfow for process management, database interoperability, and authentication created an easy path forward to achieve scale, decrease the development time and pass security audits. While Airflow is generally looked at as a solution to manage data pipelines, integrating tools with Airlfow can also speed up development of those tools.

Warsaw Meetup [Session starts: Friday 10.07 6pm (Friday 10.07 9am PDT)]

No sessions on Saturday, July 11, 2020.

No sessions on Sunday, July 12, 2020.

09:00

Autonomous Driving with Airflow

Amr Noureldin, Michał Dura

This talk describes how Airflow is utilized in an Autonomous driving project, originating from Munich - Germany. We describe the Airflow setup, what challenges we encountered and how we maneuvered to achieve a distributed and highly scalable Airflow setup.

London Meetup, [Sessions start: Monday 13.07 5pm (Monday 13.07 9am PDT)]

10:00

From cron to Airflow on Kubernetes: A Startup Story

Learn how Devoted Health went from cron jobs to to Airflow deployment Kubernetes using a combination of open source and internal tooling.

London Meetup, [Sessions start: Monday 13.07 5pm (Monday 13.07 9am PDT)]

11:00

Airflow In Airbnb

Kevin Yang, Conor Camp, Cong Zhu, Ping Zhang, Yingbo Wang

Go over the yesterday, today and tomorrow for Airflow in Airbnb. Share our learnings and vision in Airflow core and around Airflow in its eco system.

London Meetup, [Sessions start: Monday 13.07 5pm (Monday 13.07 9am PDT)]

21:00

Pipelines on Pipelines: Agile CI/CD Workflows for Airflow DAGs

How do you create fast and painless delivery of new DAGs into production? When running Airflow at scale, it becomes a big challenge to manage the full lifecycle around your pipelines; making sure that DAGs are easy to develop, test, and ship into prod. In this talk, we will cover our suggested approach to building a proper CI/CD cycle that ensures the quality and fast delivery of production pipelines.

Tokyo Meetup, [Sessions start: Tuesday 14.07 1 pm (Monday 13.07 9 pm PDT)]

22:00

Production Docker Image for Apache Airflow

This talk will guide you trough internals of the official Production Docker Image of Airflow. It will show you the foreseen use cases for it and how to use it in conjunction with the Official Helm Chart to make your own deployments.

Tokyo Meetup, [Sessions start: Tuesday 14.07 1 pm (Monday 13.07 9 pm PDT)]

23:00

Airflow as an Elastic ETL Tool

Hendrik Kleine, Vicente Ruben Del Pino Ruiz

In search of a better, modern, simplistic method of managing ETL's processes and merging them with various AI and ML tasks, we landed on Airflow. We envisioned a new user friendly interface that can leverage dynamic DAG's and reusable components to build an ETL tool that requires virtually no training.

Tokyo Meetup, [Sessions start: Tuesday 14.07 1 pm (Monday 13.07 9 pm PDT)]

09:00

How do we reason about the reliability of our data pipeline in Wrike?

Aleksandr Eliseev

We’re using airflow for almost two years now and scaled it from two users to 8 teams.

We would like to share our story, how we reason about the reliability of our data pipelines.

We will tell:
How are we establishing a reliable review process on AirFlow?
How we’re using multiple-airflow configuration to migrate from our DC to cloud and to reuse the production data in acceptance wherever possible.
How do we use data versioning to make sure that data is up-to-date throughout the pipeline?

NYC Meetup,[Sessions start: Tuesday 14.07 12pm (Tuesday 14.07 9am PDT)]

10:00

Airflow Observability using Databand

Databand is a data engineering-focused observability and monitoring solution. We built the solution for modern data teams that need to guarantee the reliability and health of their data products.

NYC Meetup,[Sessions start: Tuesday 14.07 12pm (Tuesday 14.07 9am PDT)]

11:00

From S3 to BigQuery - How A First-Time Airflow User Successfully Implemented a Data Pipeline

BigQuery is GCP's serverless, highly scalable and cost-effective cloud data warehouse that can analyze petabytes of data at super fast speeds. Amazon S3 is one of the oldest and most popular cloud storage offerings. Folks with data in S3 often want to use BigQuery to gain insights into their data. Using Apache Airflow, they can build pipelines to seamlessly orchestrate that connection. In this talk, Emily and Leah will walk through how they created an easily configurable pipeline to extract data

NYC Meetup,[Sessions start: Tuesday 14.07 12pm (Tuesday 14.07 9am PDT)]

09:00

Building Reuseable and Trustworthy ELT pipelines (A templated approach)

To improve automation of data pipelines, I propose a universal approach to ELT pipeline that optimizes for data integrity, extensibility, and speed to delivery. The workflow is built using open source tools and standards like Apache Airflow, Singer, Great Expectations, and DBT.

Amsterdam Meetup, [Sessions start: Wednesday 15.07 6pm (Wednesday 15.07 9 am PDT)]

10:00

Testing Airflow workflows - ensuring your DAGs work before going into production

How do you ensure your workflows work before deploying to production? In this talk I'll go over various ways to assure your code works as intended - both on a task and a DAG level. In this talk I will cover:

How to test and debug tasks locally
How to test with and without task instance context
How to test against external systems, e.g. how to test a PostgresOperator?
How to test the integration of multiple tasks to ensure they work nicely together

Amsterdam Meetup, [Sessions start: Wednesday 15.07 6pm (Wednesday 15.07 9 am PDT)]

11:00

Adding An Executor to Airflow: A Contributor Overflow Exception

Engaging with a new community is a common experience in OSS development.
There are usually expectations held by the project about the contributor's exposure
to the community, and by the contributor about interactions with the community.
When these expectations are misaligned, the process is strained. In this talk,
I'll discuss a real life experience that required communication,
persistence, and patience to ultimately lead to a positive outcome.

Amsterdam Meetup, [Sessions start: Wednesday 15.07 6pm (Wednesday 15.07 9 am PDT)]

21:00

Migration to Airflow Backport Providers

In this talk I will showcase how to use the newly released Airflow Backport Providers.

Some of the topics we will cover are:

How to install them in Airflow 1.10.x
How to install them in Composer
How to migrate one or more DAG from using legacy to new providers.
Known bugs and fixes.

Melbourne Meetup, [Sessions start: Thursday 16.07 2pm (Wednesday 15.07, 9 pm PDT)]

22:00

From Zero to Airflow: bootstrapping a ML platform

At Bluevine we use Airflow to drive our ML platform. In this talk, I'll present the challenges and gains we had at transitioning from a single server running python scripts with cron to a full blown Airflow setup. This includes: supporting multiple Python versions, event driven DAGs, performance issues and more!

Melbourne Meetup, [Sessions start: Thursday 16.07 2pm (Wednesday 15.07, 9 pm PDT)]

23:00

Airflow the perfect match in our Analytics Pipeline

Working with Airflow is no breeze. For three years we at LOVOO, a market-leading dating app, have been using the Google Cloud managed version of Airflow, a product we’ve been familiar with since its Alpha release. We took a calculated risk and integrated the Alpha into our product, and, luckily, it was a match. Since then, we have been leveraging this software to build out not only our data pipeline, but also boost the way we do analytics and BI.

Melbourne Meetup, [Sessions start: Thursday 16.07 2pm (Wednesday 15.07, 9 pm PDT)]

09:00

Data Engineering Hierarchy of Needs

Data Infrastructures look differently between small, mid, and large sized companies. Yet, most content out there is for large and sophisticated systems. And almost none of it is on migrating a legacy, on-prem, databases over to the cloud.

We'll begin with the fundamentals of building a modern Data Infrastructure from the ground up through a hierarchy of needs. The hierarchy has a (subjective) 7 levels, ranging from Automation to Data Streaming.

Bay Area Meetup [Sessions start: Thursday 16.07 9am (Thursday 16..07 9am PDT)]

10:00

Talk by Polidea - Gold Sponsor of the Airflow Summit

TODO

Bay Area Meetup [Sessions start: Thursday 16.07 9am (Thursday 16..07 9am PDT)]

11:00

Effective Cross-DAG Dependency

Rafael Ribaldo, Lucas Mendes Mota da Fonseca

Cross-DAG dependency may reduce cohesion in data pipelines and, without having an explicit solution in Airflow or in a third-party plugin, those pipelines tend to become complex to handle. That is the reason we, at QuintoAndar, have created an intermediate DAG to handle relationships across data pipelines called Mediator, in order for them to be scalable and maintainable by any team.

Bay Area Meetup [Sessions start: Thursday 16.07 9am (Thursday 16..07 9am PDT)]

09:00

Airflow - A beast character in the gaming world

Naresh Yegireddi, Patricio Garza

Being a pioneer for the past 25 years, SONY PlayStation has played a vital role in the Interactive Gaming Industry. Over 100+ million monthly active users, 100+ million PS-4 console sales along with thousands of game development partners across the globe, big-data problem is quite inevitable. This presentation talks about how we scaled Airflow horizontally which has helped us building a stable, scalable and optimal data processing infrastructure powered by Apache Spark, AWS ECS, EC2 and Docker.

Warsaw Meetup, [Sessions start: Friday 17.07 6 pm (Friday 17.07 9am PDT)]

10:00

Machine Learning with Apache Airflow

Daniel Imberman

This talk will discuss how to build an Airflow based data platform that can take advantage of popular ML tools (Jupyter, Tensorflow, Spark) while creating an easy-to-manage/monitor

Warsaw Meetup, [Sessions start: Friday 17.07 6 pm (Friday 17.07 9am PDT)]

11:00

Achieving Airflow Observability

Identify issues in a fraction of the time and streamline root cause analysis for your DAGs. Airflow is the leading orchestration platform for data engineers. But when running Airflow at production scale, many teams have bigger needs for monitoring jobs, creating the right level of alerting, tracking problems in data, and finding the root cause of errors. In this talk we will cover our suggested approach to gaining Airflow observability so that you have the visibility you need to be productive.

Warsaw Meetup, [Sessions start: Friday 17.07 6 pm (Friday 17.07 9am PDT)]