PyConDE & PyData Berlin 2024

Beyond Deployment: Exploring Machine Learning Inference Architectures and Patterns
2024-04-24, B07-B08

This talk is about setting up robust and scalable machine learning systems for high-throughput, real-time predictions serving large numbers of users. It is aimed at ML engineers and data practitioners who want to learn more about MLOps, with a focus on cloud-based platforms. The talk centers on the different ways to serve predictions: real-time, asynchronous, and batch processing. It discusses the advantages and disadvantages of each pattern and highlights the importance of choosing the right one for a specific use case, including generative large language models.
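The three serving patterns above can be sketched in a few lines of Python. This is a minimal, illustrative sketch only; `predict` is a placeholder scoring function, not StepStone's actual API, and the thread pool stands in for what would typically be a message queue in production:

```python
"""Sketch of three common ML inference patterns (illustrative names)."""
from concurrent.futures import Future, ThreadPoolExecutor

def predict(features: dict) -> float:
    # Stand-in for a real model: returns a dummy score.
    return 0.5 * features.get("experience_years", 0)

# 1) Real-time (synchronous): the caller blocks until the prediction returns.
def predict_realtime(features: dict) -> float:
    return predict(features)

# 2) Asynchronous: the request is handed off (here to a thread pool; in
#    production usually a queue or task broker) and the result is collected later.
executor = ThreadPoolExecutor(max_workers=4)

def predict_async(features: dict) -> Future:
    return executor.submit(predict, features)

# 3) Batch: many records scored together on a schedule, trading latency
#    for throughput and cost efficiency.
def predict_batch(records: list[dict]) -> list[float]:
    return [predict(r) for r in records]

if __name__ == "__main__":
    print(predict_realtime({"experience_years": 4}))    # 2.0
    print(predict_async({"experience_years": 6}).result())  # 3.0
    print(predict_batch([{"experience_years": 2}, {}]))  # [1.0, 0.0]
```

The trade-off is visible even at this scale: the real-time path minimizes latency per request, the asynchronous path decouples the caller from the model, and the batch path maximizes throughput.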

We will use examples from StepStone's production systems to illustrate how to build systems that scale to thousands of simultaneous requests while delivering low-latency, robust predictions.

I will cover the technical details, how to manage operations efficiently, and real-life examples in a way that is easy to understand and informative. You will learn about different ML serving setups and how to make them work in practice, helping you make your ML inference faster, more cost-efficient, and more reliable.


This talk explains the major challenges of ML deployment and management, emphasizing inference patterns for robust, scalable applications. Using StepStone's infrastructure as an example, we'll discuss efficiently handling large workloads and complex models, including recent large language models, to ensure fast, cost-effective, and reliable results.

The session begins with an introduction highlighting the significance of ML inference and outlining the objective: providing insights into effective MLOps strategies. We'll then survey various ML inference patterns, covering their advantages and disadvantages and the importance of selecting the right pattern for specific use cases.

Moving on, we'll delve into StepStone's ML inference strategy, showcasing real-world applications and how scalability, performance, and cost are managed while maintaining agility for frequent model updates and monitoring in production systems.

In summary, this talk provides a practical roadmap of ML inference patterns with a focus on real-world implementation at StepStone.


Expected audience expertise: Domain:

Intermediate

Expected audience expertise: Python:

Novice

Abstract as a tweet (X) or toot (Mastodon):

"Beyond Deployment: Exploring Machine Learning Inference Architectures and Patterns" - uncover the ML inference strategies that power StepStone's success and learn to scale your models with confidence!

Tim is a Staff Machine Learning Engineer at StepStone, where he works on deploying various machine learning projects.