2020-07-13, London Meetup [Sessions start: Monday 13.07 5pm (Monday 13.07 9am PDT)]
This talk describes how Airflow is used in an autonomous driving project based in Munich, Germany. We describe the Airflow setup, the challenges we encountered, and how we worked around them to achieve a distributed and highly scalable Airflow deployment.
One of the biggest automotive manufacturers chose Airflow as its orchestration tool in the pursuit of producing its first Level-3 autonomous driving vehicle in Germany.
In this talk, we will describe the journey of deploying Airflow on top of OpenShift using a PostgreSQL database and RabbitMQ. We will describe how we achieve high availability for the different Airflow components, and we will tackle issues related to database performance and failover recovery for those components in our setup. In addition, we will present the bottlenecks we encountered with (1) the Airflow scheduler (especially with complex DAGs) and (2) SparkSubmitOperator, and describe how we mitigated each of them. We will also describe how we leverage OpenShift to dynamically scale our Airflow deployment based on the running workloads.
The talk will conclude with a brief overview of future requirements and of features we believe would benefit the community.
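As a rough illustration of the deployment described above (Airflow backed by PostgreSQL and RabbitMQ), the following minimal sketch builds the environment variables that Airflow reads for such a setup. It assumes the CeleryExecutor, which the abstract does not name but which RabbitMQ as a broker suggests. The helper name `airflow_celery_env`, the host names, the credentials, and the ports are hypothetical placeholders, not details from the talk. The `AIRFLOW__CORE__SQL_ALCHEMY_CONN` key matches Airflow releases from around the time of this talk; newer releases keep the metadata connection under a separate database section.

```python
from urllib.parse import quote


def airflow_celery_env(pg_host, mq_host, user, password, db="airflow"):
    """Build Airflow env-var settings for an assumed CeleryExecutor setup.

    All hosts, credentials, and ports are illustrative placeholders.
    """
    # Percent-encode the password so characters like '@' do not break the URL.
    pw = quote(password, safe="")
    pg = f"{user}:{pw}@{pg_host}:5432/{db}"
    return {
        # Assumption: tasks are distributed to workers through Celery.
        "AIRFLOW__CORE__EXECUTOR": "CeleryExecutor",
        # Metadata database on PostgreSQL.
        "AIRFLOW__CORE__SQL_ALCHEMY_CONN": f"postgresql+psycopg2://{pg}",
        # RabbitMQ as the Celery message broker (default vhost "/").
        "AIRFLOW__CELERY__BROKER_URL": f"amqp://{user}:{pw}@{mq_host}:5672//",
        # Celery task results stored in the same PostgreSQL database.
        "AIRFLOW__CELERY__RESULT_BACKEND": f"db+postgresql://{pg}",
    }


if __name__ == "__main__":
    for key, value in airflow_celery_env(
        "postgres.example", "rabbitmq.example", "airflow", "change-me"
    ).items():
        print(f"{key}={value}")
```

On OpenShift, values like these would typically be injected into the scheduler, webserver, and worker pods from a secret rather than hard-coded.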
Experienced Solutions Architect with a demonstrated history of working in a variety of domains such as High-Performance Computing, Big Data Analytics, Data Engineering, Software Engineering, and Platform Integration. 12+ years of multi-national experience with a strong academic background.
Big Data Engineer @ DXC Technology. 5+ years of experience as a data engineer with a strong technical background.