Migrating Airflow-based Spark jobs to Kubernetes - the native way
07-07, 23:00–23:45 (US/Pacific), Bangalore Meetup [Session starts Wednesday 8.07 9.30 am (Tuesday 07.07 9pm PDT)]

At Nielsen Identity Engine, we use Spark to process tens of TBs of data. Our ETLs, orchestrated by Airflow, spin up AWS EMR clusters with thousands of nodes per day. In this talk, we’ll guide you through migrating Spark workloads to Kubernetes with minimal changes to Airflow DAGs, using the open-sourced GCP Spark-on-K8s operator and the native integration we recently contributed to the Airflow project.


When we first started running Spark on AWS, EMR didn’t even support it out of the box, and we used Data Pipeline to manage our pipelines.
Today, we’re spinning up dozens of clusters on a daily basis, all orchestrated by Airflow.
Recently, we embarked on a journey to evaluate the option of using Kubernetes as our Spark infrastructure (mainly to reduce operational costs and improve stability).
To achieve those goals, we combined the open-sourced GCP Spark-on-K8s operator with a native integration we developed and contributed back to Airflow (AIRFLOW-6542).
As a result, we were able to migrate our existing Airflow DAGs from AWS EMR to K8s with minimal changes.
In this talk, we’ll guide you through migrating your Airflow-based Spark workloads to K8s, including:
* Challenges with existing Spark infrastructure and the motivation to migrate to K8s
* Building and contributing a new Airflow integration from scratch
* Best practices for using Airflow as the orchestrator
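
To give a feel for what the migration involves: with the Spark-on-K8s operator, each Spark job is described as a SparkApplication custom resource that the operator turns into driver and executor pods. Below is a minimal sketch of such a manifest, built as a plain Python dict. The helper name `build_spark_application`, the job name, namespace, image, file path, Spark version and resource sizes are all illustrative placeholders, not values from our production setup.

```python
import json


def build_spark_application(name, namespace, image, main_file, executors=2):
    """Return a SparkApplication manifest (as a dict) for the Spark-on-K8s operator.

    Hypothetical helper for illustration only; all defaults are placeholders.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "pythonVersion": "3",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "2.4.5",  # placeholder version
            "restartPolicy": {"type": "Never"},
            # Resource sizes below are assumptions, tune them per workload.
            "driver": {"cores": 1, "memory": "2g", "serviceAccount": "spark"},
            "executor": {"cores": 2, "instances": executors, "memory": "4g"},
        },
    }


# Example: describe one ETL job with four executors (all values hypothetical).
manifest = build_spark_application(
    name="example-etl",
    namespace="spark-jobs",
    image="example.registry/spark-py:2.4.5",
    main_file="local:///opt/spark/app/etl_job.py",
    executors=4,
)
print(json.dumps(manifest, indent=2))
```

In an Airflow DAG, a task would submit a manifest like this to the cluster and a follow-up task would wait for the application to finish, which is the role the contributed integration plays in place of the EMR-specific steps.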

Roi Teveth is a big data engineer at Nielsen Identity Engine, where he specializes in research and development of solutions for big data infrastructure using cutting-edge technologies such as Spark, Kubernetes and Airflow. Roi has an extensive system engineering background and is a CNCF Certified Kubernetes Administrator.

Itai Yaffe is a big data tech lead at Nielsen Identity Engine, where he deals with big data challenges using tools like Spark, Druid, Kafka, and others. He is also part of the core team of the Israeli chapter of Women in Big Data. Itai is keen on sharing his knowledge and has presented his real-life experience in various forums.