2023-04-17 –, A1
The nightmare before data science production: you have found a working prototype for your problem in a Jupyter notebook, and now it's time to build a production-grade solution from that notebook. Unfortunately, your notebook looks anything but production-grade. You embark on a time-consuming journey of refactoring the notebook. You come across relevant and irrelevant code snippets scattered across different cells, but you persevere. Midway through your journey, you realize that your refactoring is not immune to the reproducibility issues caused by deleted cells and out-of-order cell executions. And we haven't even talked about creating a pipeline from that notebook yet! A desperate situation indeed. The good news is, there's finally a cure!
The open-source Python package LineaPy aims to automate data science workflow generation and to expedite the process of going from data science development to production. It transforms messy notebooks into data pipelines for orchestration frameworks such as Apache Airflow, DVC, Argo, Kubeflow, and many more. And if you can't find your favorite orchestration framework, you are welcome to work with the creators of LineaPy to contribute a plugin for it!
In this talk, you will learn the basic concepts of LineaPy and how it supports your everyday tasks as a data practitioner. To that end, we will transform a notebook together, step by step, into a DVC pipeline. Finally, we will discuss what place LineaPy will take in the MLOps universe. Will you only have to check in your notebook in the future?
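To make the idea of extracting only the relevant code from scattered cells more concrete, here is a minimal sketch in plain Python. It is not LineaPy's implementation or API. The helper names `defined_names`, `used_names`, and `slice_for` and the example cells are hypothetical, and the sketch assumes Python 3.9+ (for `ast.unparse`), cells given in execution order, and code that only assigns variables (no in-place mutation, functions, or classes).

```python
import ast


def defined_names(stmt):
    """Names a top-level statement binds (assignments and imports only)."""
    if isinstance(stmt, (ast.Import, ast.ImportFrom)):
        return {(alias.asname or alias.name).split(".")[0] for alias in stmt.names}
    return {
        node.id
        for node in ast.walk(stmt)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }


def used_names(stmt):
    """Names a top-level statement reads."""
    return {
        node.id
        for node in ast.walk(stmt)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }


def slice_for(cells, target):
    """Return only the statements needed to compute `target`, in original order."""
    statements = [stmt for cell in cells for stmt in ast.parse(cell).body]
    needed = {target}
    kept = []
    # Walk backwards: keep a statement if it defines something still needed,
    # then require whatever that statement itself reads.
    for stmt in reversed(statements):
        defined = defined_names(stmt)
        if defined & needed:
            kept.append(stmt)
            needed -= defined
            needed |= used_names(stmt)
    return "\n".join(ast.unparse(stmt) for stmt in reversed(kept))


# Hypothetical notebook cells: some feed `result`, others are leftovers.
cells = [
    "raw = [3, 1, 2]",
    "debug = len(raw)\nprint(debug)",
    "clean = sorted(raw)",
    "scratch = [x * 0 for x in raw]",
    "result = sum(clean)",
]

pipeline_code = slice_for(cells, "result")
print(pipeline_code)
```

The debugging and scratch cells drop out because nothing that `result` depends on reads them. A real tool has to handle far more than this toy does, for example mutation, side effects, and library calls.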
Intermediate
Expected audience expertise: Python: Intermediate
Abstract as a tweet: The nightmare before data science production: you have found a working prototype for your problem in a Jupyter notebook, and now it's time to build a production-grade solution from that notebook. The good news is, there's finally a cure: the open-source Python package LineaPy!
Thomas has a great fondness for science. Strictly speaking, for numerics. After his doctorate, he went to the school of embedded programming. During this time he got to know and love DevOps. His enthusiasm for number crunching ultimately led him to the topic of artificial intelligence. He is currently in charge of publicly funded open source research programs. When he’s not trying to convince his colleagues to use DVC, he’s busy with MLOps, CML and his low-budget bark beetle detection drone – once you’ve done embedded you just can’t get away from it.