Learn how easy it is to apply software engineering principles to your data science and data engineering code. Expect an overview of Kedro, a library that implements best practices for data pipelines with an eye towards productionizing ML models.
Objective
This talk explains how changing business objectives are driving interest in production-level code, which software engineering principles data engineers and data scientists should apply to make their code easier to deploy to production, and how an open source Python library called Kedro can simplify their workflow, illustrated with our Spaceflights example.
Content will be presented at a high level. We want the audience of data engineers and data scientists to leave the session understanding why these techniques matter and knowing how to start applying them today.
Outline
I. Production-level code makes everyone happy, except me (5 min)
- Business objectives are changing: companies and stakeholders want code that creates continuous value
- Challenges you will face while trying to create production-level code on your own
II. What is a production-level data pipeline? (5 min)
- Definitions for production-level code and data pipelines
- Coverage of the software engineering principles that should be applied to create data pipelines
III. What tools can I use to apply these principles? (5 min)
- Present the existing tool landscape
- Show how these tools fit together in Kedro, a workflow development framework that makes it easy to produce data pipelines that are robust, scalable, deployable and repeatable
IV. Can you show me an example of how Kedro works? (15 min)
- Walk through Kedro's functionality using the Spaceflights ML problem
- Visualise the Spaceflights data pipeline with Kedro-Viz
- Deploy Kedro pipelines with Kedro-Docker and Kedro-Airflow
V. Q&A (5 min)
Data Science, DevOps, Machine Learning, Data Engineering
Domain Expertise: some
Python Skill Level: basic
Abstract as a tweet: Learn how easy it is to apply software engineering principles to your data science and data engineering code. Expect an overview of Kedro, a library that implements best practices for data pipelines with an eye towards productionizing ML models.
Public link to supporting material: