PyConDE & PyData Berlin 2024

From idea to production in a day: Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly
04-22, 13:45–14:15 (Europe/Berlin), B09

Getting a machine learning solution in front of users usually takes time. The data science tech stack is full of time traps, and infrastructure issues can slow down deployment. The Azure Machine Learning platform, automated machine learning, and Streamlit are well suited to circumventing common development and deployment issues, if you know how to use them. Based on lessons learned in corporate hackathons, we will use the stack to rapidly prototype a computer vision application that users can interact with. You will walk away with Python code snippets and inspiration to build and user test your own machine learning ideas quickly.


Experimentation, that is, getting machine learning ideas in front of users, is essential to innovation. Yet in our corporate hackathons, our data science team has often struggled to build and deploy user-facing machine learning ideas in just a single day.

Over the past two-plus years, we have developed a routine around using Azure Machine Learning, automated machine learning, and Streamlit to build and user test machine learning ideas quickly. The aim of this talk is to pass on practical, technical knowledge to fellow data scientists about how to use this stack to build and user test quickly.

During the talk, we will walk through building a computer vision system that identifies trash in images, served through an app and trained on the open-source TACO dataset (http://tacodataset.org/). Working in a Jupyter notebook, we will load the data into Azure Machine Learning and trigger an automated machine learning run on it. Along the way, we will take a quick look at the training and testing metrics available in Azure ML for evaluating the model. We will then download the trained model as a file in the open-source ONNX format (https://onnx.ai/). Using the open-source Python web application framework Streamlit (https://github.com/streamlit/streamlit), we will build an application that lets users upload images and that embeds the model to identify trash in those images. Using a to-be-published infrastructure-as-code pipeline on Azure DevOps, we will deploy the application to the public internet on the Azure platform, where users can test it.
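
As a rough illustration of the Streamlit step, the following is a minimal sketch and not the code presented in the talk. It assumes the downloaded ONNX file is saved as model.onnx, that the model accepts a float32 image batch of shape (1, 3, H, W) scaled to [0, 1], and that it returns per-image arrays of boxes, labels, and scores in that order. The file name, the label map, the preprocessing, and the helper filter_detections are all hypothetical; check the exported model's actual inputs and outputs before relying on any of them.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical file name and label map; neither comes from the talk.
MODEL_PATH = "model.onnx"
LABELS: Dict[int, str] = {0: "background", 1: "trash"}


def filter_detections(
    boxes: Sequence[Sequence[float]],
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,
) -> List[Dict[str, Any]]:
    """Keep detections scoring at least `threshold`, highest score first."""
    kept = [
        {
            "box": [float(v) for v in box],
            "label": LABELS.get(int(label), str(int(label))),
            "score": float(score),
        }
        for box, label, score in zip(boxes, labels, scores)
        if float(score) >= threshold
    ]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


def main() -> None:
    # Third-party imports live here so the helper above stays dependency-free.
    import numpy as np
    import onnxruntime as ort
    import streamlit as st
    from PIL import Image

    st.title("Trash detector (prototype)")
    threshold = st.slider("Minimum confidence", 0.0, 1.0, 0.5)
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return

    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded image")

    # Assumed preprocessing: float32 in [0, 1], channels first, batch of one.
    pixels = np.asarray(image, dtype=np.float32) / 255.0
    batch = pixels.transpose(2, 0, 1)[np.newaxis, ...]

    session = ort.InferenceSession(MODEL_PATH, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    # Assumed output order: boxes, labels, scores (no batch dimension).
    boxes, labels, scores = session.run(None, {input_name: batch})[:3]

    detections = filter_detections(boxes, labels, scores, threshold)
    st.write(f"{len(detections)} detection(s) above the threshold")
    st.write(detections)
```

To try the sketch, one would add a call to main() at the end of a file such as app.py and start it with `streamlit run app.py`. Keeping the filtering logic in a plain function makes it testable without Streamlit or a model file.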

The stack and code presented in this talk will enable fellow data scientists to accelerate their data science development, leading to quicker experimentation and, therefore, to faster innovation of products with machine learning at their core.


Expected audience expertise: Domain

Intermediate

Expected audience expertise: Python

Intermediate

Abstract as a tweet (X) or toot (Mastodon)

How can you leverage Azure ML, automated machine learning, and Streamlit to build and test machine learning apps quickly? Find out about our favorite hackathon stack and walk away with some code to build and user test your own machine learning ideas fast.

Public link to supporting material, e.g. videos, GitHub, etc.

https://github.com/flrs/build_and_test_ml_quickly

Florian is a Sr. Data Scientist at Henkel, where he develops machine learning solutions for R&D and production use cases across the company's adhesive and consumer goods portfolios. He is also known as an online instructor for the open-source data engineering framework Apache Spark. Florian volunteers his time as the current Vice President of the Affiliated Project Selection Committee at NumFOCUS, helping scientific open-source projects grow.