Optimizing inference for state-of-the-art Python models
08-31, 14:35–14:50 (Europe/Zurich), HS 118

This talk will take state-of-the-art Python models and show how, through advanced inference techniques, we can drastically increase the performance of the models at runtime. You’ll learn about the open-source MLServer project and see live how easily it serves Python-based machine learning models.
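As a taste of what this looks like in practice, here is a minimal sketch of a custom runtime using MLServer's MLModel API. The DoublingModel class and its toy logic are purely illustrative stand-ins for a real model, and decode_args is MLServer's documented helper for mapping V2 inference requests to plain NumPy arrays:

    # models.py -- a toy custom runtime; class name and logic are illustrative
    import numpy as np

    from mlserver import MLModel
    from mlserver.codecs import decode_args


    class DoublingModel(MLModel):
        async def load(self) -> bool:
            # A real runtime would load weights or a pipeline here.
            self.ready = True
            return self.ready

        @decode_args
        async def predict(self, payload: np.ndarray) -> np.ndarray:
            # decode_args converts V2 protocol requests to/from NumPy
            # arrays, matching request inputs to parameter names
            # ("payload" here).
            return payload * 2

Paired with a model-settings.json that points at the class, running "mlserver start ." exposes the model over both REST and gRPC using the V2 inference protocol:

    model-settings.json:
    {
        "name": "doubling-model",
        "implementation": "models.DoublingModel"
    }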


Machine learning models are often created with an emphasis on how they run during training, with little regard for how they’ll perform in production. In this talk, you’ll learn what those production issues are and how to address them, using some state-of-the-art models as examples. We’ll introduce the open-source MLServer project and look at how features such as multi-model serving and adaptive batching can optimize performance for your models (see the sketch below). Finally, you’ll learn how using an inference server locally can speed up the time to deployment when moving to production.
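To make those two features concrete, the sketch below shows how adaptive batching is switched on per model via model-settings.json, and how multi-model serving falls out of MLServer loading every model it finds under one root folder. The model names and values are illustrative; the max_batch_size and max_batch_time fields follow MLServer's documented settings schema at the time of writing:

    models/
    ├── sentiment/
    │   └── model-settings.json
    └── summarizer/
        └── model-settings.json

    models/sentiment/model-settings.json (illustrative values):
    {
        "name": "sentiment",
        "implementation": "runtime.SentimentModel",
        "max_batch_size": 32,
        "max_batch_time": 0.1
    }

Running "mlserver start models/" then serves both models from a single process, and requests to the sentiment model are transparently grouped into batches of up to 32 items, or whatever arrives within 0.1 seconds, before reaching predict().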


Public link to supporting material

https://github.com/SeldonIO/MLServer

Abstract as a tweet

This talk will take state-of-the-art Python models and show how, through advanced inference techniques, we can drastically increase the performance of the models at runtime.

Project Homepage / Git

https://github.com/SeldonIO/MLServer

Domains

Machine Learning

Expected audience expertise: Domain

Some

Expected audience expertise: Python

Some

Ed comes from a cloud computing background and is a strong believer in making deployments as easy as possible for developers. With an education in computational modelling and an enthusiasm for machine learning, Ed has blended his work in ML and cloud-native computing to cement himself firmly in the emerging field of MLOps. An organiser of Tech Ethics London and MLOps London, Ed is heavily involved in developer communities and, thankfully, loves both beer and pizza.