Production-level data pipelines that make everyone happy using Kedro PyConDE & PyData Berlin 2019

Learn how easy it is to apply software engineering principles to your data science and data engineering code. Expect an overview of Kedro, a library that implements best practices for data pipelines with an eye towards productionizing ML models.

Objective

This talk tells the story of how changing business objectives are driving interest in production-level code, which software engineering principles data engineers and data scientists should apply to make their code easier to deploy to production, and how an open source Python library called Kedro simplifies that workflow, illustrated with our Spaceflights example.

The content will be presented at a high level. We want the audience of data engineers and data scientists to leave the session understanding why these techniques matter and knowing how to start applying them today.

Outline

I. Production-level code makes everyone happy, except me (5 min)

Business objectives are changing: companies and stakeholders want code that creates continuous value
Challenges you will face while trying to create production-level code on your own

II. What is a production-level data pipeline? (5 min)

Definitions for production-level code and data pipelines
Coverage of the software engineering principles that should be applied to create data pipelines

III. What tools can I use to apply these principles? (5 min)

Present the existing tool landscape
Show how everything fits in Kedro, a workflow development framework that makes it easy to produce data pipelines that are robust, scalable, deployable and repeatable

IV. Can you show me an example of how Kedro works? (15 min)

Walk through Kedro's functionality using the Spaceflights ML problem
Visualise the Spaceflights data pipeline with Kedro-Viz
Deploy Kedro pipelines with Kedro-Docker and Kedro-Airflow
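
To give a flavour of the idea behind the demo before the session, here is a minimal, library-free sketch of a data pipeline built from pure functions that are wired together by dataset name. It is not Kedro's API: the `Node` type, `run_pipeline`, the step functions and the shuttle data are all hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """A pure function plus the names of the datasets it reads and writes."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in dependency order, resolving inputs by dataset name."""
    data = dict(catalog)  # copy so the caller's catalog is left untouched
    pending = list(nodes)
    while pending:
        # A node is ready once every dataset it names as input is available.
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            missing = sorted({i for n in pending for i in n.inputs if i not in data})
            raise ValueError(f"Cannot resolve inputs: {missing}")
        for n in ready:
            data[n.output] = n.func(*(data[name] for name in n.inputs))
            pending.remove(n)
    return data


# Hypothetical steps on made-up shuttle data (not the real Spaceflights code).
def clean_prices(shuttles):
    """Turn price strings such as '$1,200' into floats."""
    return [
        {**row, "price": float(row["price"].replace("$", "").replace(",", ""))}
        for row in shuttles
    ]


def average_price(clean_shuttles):
    """Compute the mean price over the cleaned rows."""
    return sum(row["price"] for row in clean_shuttles) / len(clean_shuttles)


# Nodes are listed out of order on purpose: the runner sorts out dependencies.
nodes = [
    Node(average_price, ["clean_shuttles"], "mean_price"),
    Node(clean_prices, ["shuttles"], "clean_shuttles"),
]
catalog = {"shuttles": [{"id": 1, "price": "$1,200"}, {"id": 2, "price": "$800"}]}

results = run_pipeline(nodes, catalog)
print(results["mean_price"])
```

Because each step only declares the names of its inputs and output, the steps can be tested in isolation and the execution order can be derived from the data dependencies rather than written by hand.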

V. Q&A (5 min)

Domains: Data Science, DevOps, Machine Learning, Data Engineering
Domain Expertise: some
Public link to supporting material: https://github.com/quantumblacklabs/kedro/
Python Skill Level: basic
Abstract as a tweet:

Yetunde Dada

Yetunde Dada is a Product Manager at QuantumBlack. She works with an incredible team to use software to solve problems for data engineers and data scientists. Prior to QuantumBlack, Yetunde was a Data Product Manager at Barclays, where she worked on a variety of analytics products and projects. She has an MBA from the Saïd Business School, University of Oxford, and a BEng in Mechanical Engineering from the University of Pretoria.