2025-04-24 – 218 Workshops
This talk explores the synergy between Apache Beam and Apache Airflow, demonstrating how to create a robust, end-to-end data engineering workflow. We'll dive into the challenges of orchestrating complex data processing tasks and show how combining Airflow's scheduling capabilities with Beam's data processing framework can create more efficient and manageable data pipelines. The session will cover integration with Google Cloud Platform services, including Cloud Functions and BigQuery, as well as with Gemini AI models.
Problem Addressed:
In today's data-driven world, organizations face the daunting challenge of orchestrating complex, end-to-end data engineering workflows that seamlessly integrate batch and streaming processing, scheduling, cloud services, and AI models. This talk tackles the often-overlooked synergy between Apache Beam and Apache Airflow, two powerful tools in the data engineering ecosystem that are rarely used in tandem. We'll explore how combining these technologies with Google Cloud Platform services and cutting-edge AI models can revolutionize data pipeline architecture.
Relevance to the Audience:
As data volumes explode and processing requirements become increasingly complex, data engineers and scientists are under pressure to build scalable, maintainable pipelines that can handle diverse data sources and downstream applications. This topic is crucial for professionals looking to:
- Modernize their data infrastructure
- Streamline machine learning pipelines
- Overcome limitations of using Beam and Airflow separately
- Integrate AI models into data workflows seamlessly
- Leverage cloud services for enhanced scalability and performance
Solutions and Key Takeaways:
Attendees will gain practical insights and hands-on knowledge to:
1. Harness the Power of Integration: Learn to seamlessly combine Apache Beam's robust data processing capabilities with Apache Airflow's sophisticated scheduling and orchestration features (see the DAG sketch after this list).
2. Master Cloud-Native Data Engineering: Discover how to leverage Google Cloud Platform services like Cloud Functions and BigQuery to build serverless, scalable data pipelines (see the BigQuery sketch below).
3. Incorporate AI into Data Workflows: Explore techniques for integrating Gemini models into your data processing pipelines, opening new possibilities for intelligent data transformation and analysis (see the Gemini sketch below).
4. Design Resilient Architectures: Gain expertise in creating modular, scalable, and fault-tolerant data architectures that can handle the demands of modern data-driven applications (see the dead-letter sketch below).
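As a taste of what the integration looks like in practice, here is a minimal sketch of an Airflow DAG that launches a Beam pipeline with the BeamRunPythonPipelineOperator from the apache-airflow-providers-apache-beam package. The DAG id, schedule, and bucket paths are placeholder assumptions, not the talk's actual demo; switching the runner to "DataflowRunner" would run the same pipeline on Google Cloud.

```python
# A minimal sketch, assuming Airflow 2.x with the
# apache-airflow-providers-apache-beam package installed; all ids, paths,
# and options below are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator

default_args = {
    # Basic orchestration-level fault tolerance: retry failed runs with a delay.
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="beam_on_airflow_demo",  # hypothetical DAG id
    start_date=datetime(2025, 4, 1),
    schedule="@daily",              # Airflow handles the scheduling...
    catchup=False,
    default_args=default_args,
) as dag:
    # ...while Beam handles the data processing. Setting runner="DataflowRunner"
    # would execute the same pipeline serverlessly on Google Cloud.
    run_beam_pipeline = BeamRunPythonPipelineOperator(
        task_id="run_beam_pipeline",
        py_file="gs://my-bucket/pipelines/transform.py",  # hypothetical pipeline file
        runner="DirectRunner",
        pipeline_options={"temp_location": "gs://my-bucket/tmp"},
    )
```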
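On the cloud-native side, the Beam pipeline that Airflow launches can write its results directly to BigQuery. The sketch below assumes apache-beam[gcp] is installed; the bucket, project, table, and schema are hypothetical. In a setup like the one the talk describes, a Cloud Function could serve as the event-driven entry point that triggers the DAG.

```python
# A sketch of the Beam side, assuming apache-beam[gcp] is installed; the
# GCS path, BigQuery table, and schema are hypothetical placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Runner, project, and region are supplied as command-line flags,
    # e.g. --runner=DataflowRunner --project=my-project --region=europe-west2.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadEvents" >> beam.io.ReadFromText("gs://my-bucket/events/*.json")
            | "WrapRaw" >> beam.Map(lambda line: {"raw": line})
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:analytics.events",  # hypothetical table
                schema="raw:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```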
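For the AI takeaway, one common pattern (a hedged sketch, not necessarily the talk's approach) is to call a Gemini model from inside a Beam DoFn, creating the client once per worker in setup(). This assumes the google-genai client library and an API key available in the worker environment; the model name and prompt are illustrative.

```python
# A hedged sketch of enriching records with Gemini inside a Beam pipeline.
# Assumes the google-genai package and a GOOGLE_API_KEY in the environment;
# a production pipeline would also batch requests and handle quota errors.
import apache_beam as beam


class EnrichWithGemini(beam.DoFn):
    def setup(self):
        # Create the client once per worker rather than once per element.
        from google import genai
        self._client = genai.Client()  # picks up the API key from the environment

    def process(self, element):
        response = self._client.models.generate_content(
            model="gemini-2.0-flash",  # hypothetical model choice
            contents=f"Summarise this record in one sentence: {element}",
        )
        yield {"record": element, "summary": response.text}
```

For higher-throughput model calls, Beam's RunInference transform is an alternative worth evaluating.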
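Finally, on resilient design: alongside Airflow-level retries, a standard Beam building block is the dead-letter pattern, where records that fail to parse are routed to a tagged side output for later replay instead of failing the whole pipeline. A minimal, self-contained sketch:

```python
# Dead-letter pattern sketch: unparseable elements go to a tagged side output.
import json

import apache_beam as beam
from apache_beam import pvalue


class ParseOrDeadLetter(beam.DoFn):
    """Parse JSON lines; route anything unparseable to a dead-letter output."""

    DEAD = "dead_letter"

    def process(self, element):
        try:
            yield json.loads(element)
        except Exception:
            # Bad records are tagged rather than failing the whole bundle.
            yield pvalue.TaggedOutput(self.DEAD, element)


with beam.Pipeline() as p:
    results = (
        p
        | beam.Create(['{"id": 1}', "not-json"])
        | beam.ParDo(ParseOrDeadLetter()).with_outputs(
            ParseOrDeadLetter.DEAD, main="parsed"
        )
    )
    parsed, dead = results.parsed, results.dead_letter
    # The dead-letter stream could be written to GCS or BigQuery for replay.
    dead | "LogDead" >> beam.Map(print)
```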
Sadeeq is a Data Analytics Specialist at Google Cloud in the UK. His role involves understanding customers' data engineering and analytics challenges and goals, while helping them through their digital transformation journeys as they adopt solutions primarily on Google Cloud Platform, as well as on-premises or on other clouds.
While in Nigeria, Sadeeq worked as a Software Engineer at a few startups before gradually transitioning into Data Engineering at FMDQ Group. He then moved to Portugal for an MSc in Data Science and Advanced Analytics at NOVA University of Lisbon.
He previously worked at KPMG and Microsoft, and his decade of industry experience includes consulting with and for other notable Fortune 500 companies on data-centric projects.