Staying Alert: How to Implement Continuous Testing for Machine Learning Models
2023-04-17, A1

Proper monitoring of machine learning models in production is essential to avoid performance issues. Setting up monitoring can be easy for a single model, but it often becomes challenging at scale, or when alert fatigue sets in from tracking too many metrics and dashboards.

In this talk, I will introduce the concept of test-based ML monitoring. I will explore how to prioritize metrics based on risks and model use cases, integrate checks into the prediction pipeline, and standardize them across similar models and the model lifecycle. I will also take an in-depth look at batch model monitoring architecture and the use of open-source tools for setup and analysis.


Have you ever deployed a machine learning model in production, only to realize that it wasn't performing as well as you expected, or to detect a performance drop caused by corrupted data too late? Proper monitoring can help you avoid this. Typically, it involves checking the quality of the input data, monitoring the model's responses, and detecting any changes that might lead to a drop in model quality.

However, setting up monitoring is often easier said than done. First, while it is easy to write a few assertions for data quality checks or track accuracy for a single model you created, it is much more challenging to do so consistently and at scale as the number of models and pipelines and the volume of data grow. Second, building monitoring dashboards to track many metrics often leads to alert fatigue and does little to help with root cause analysis.
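For illustration, here is the kind of "few assertions" a single-model pipeline might start with. This is a minimal sketch; the column names, thresholds, and file path are hypothetical:

```python
import pandas as pd

def check_batch(df: pd.DataFrame) -> None:
    """Hand-written data quality assertions for one model's input batch."""
    # All required columns are present (column names are made up for illustration).
    required = {"user_id", "age", "amount"}
    missing_cols = required - set(df.columns)
    assert not missing_cols, f"missing columns: {missing_cols}"

    # Missing values stay below a chosen threshold.
    assert df["amount"].isna().mean() < 0.05, "too many missing 'amount' values"

    # Values fall within a plausible range.
    assert df["age"].between(0, 120).all(), "'age' outside expected range"


# Hypothetical daily batch; in practice this comes from the prediction pipeline.
batch = pd.read_parquet("daily_batch.parquet")
check_batch(batch)
```

A script like this works for one model, but maintaining dozens of hand-rolled variants is exactly where consistency breaks down.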

In this talk, I will introduce the idea of test-based ML monitoring and how it can help you keep your models in check in production. I will cover the following:
- The difference between testing and monitoring, and when one is better than the other
- How to prioritize metrics and tests for each model based on risks and model use cases
- How to integrate checks into the model prediction pipeline and standardize them across similar models and the model lifecycle
- An in-depth look at batch model monitoring architecture, including setup and analysis of results using open-source tools (a minimal sketch of this pattern follows after this list)
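To make the pipeline-integration idea concrete, below is a minimal sketch of a batch data check gating a prediction job, using the open-source Evidently library mentioned in the speaker bio. The preset names, the structure of the result dictionary, and the file paths are assumptions based on Evidently versions available around the time of the talk and may differ in current releases; treat this as an illustration of the pattern, not the talk's exact setup:

```python
import pandas as pd
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset, DataDriftTestPreset

# Hypothetical batch inputs: a reference window and the current scoring batch.
reference = pd.read_parquet("reference_window.parquet")
current = pd.read_parquet("current_batch.parquet")

# A standardized test suite that can be reused across similar models.
suite = TestSuite(tests=[
    DataStabilityTestPreset(),  # schema, ranges, missing values vs. the reference
    DataDriftTestPreset(),      # per-column distribution drift
])
suite.run(reference_data=reference, current_data=current)

# Persist a human-readable report for root cause analysis.
suite.save_html("data_checks.html")

# Gate the prediction step on the test results.
# NOTE: the result-dict keys below are an assumption and may vary by version.
result = suite.as_dict()
if not result["summary"]["all_passed"]:
    raise RuntimeError("Pre-prediction data checks failed; see data_checks.html")

# ...only if the checks pass: load the model and score the current batch...
```

The same suite definition can be shared across models with a similar input schema, and an analogous suite can run after scoring to test prediction stability, or model quality once labels arrive.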


Expected audience expertise (Domain): Intermediate

Expected audience expertise (Python): Novice

Abstract as a tweet

ML monitoring can be easy for a single model but hard at scale. In this talk, I will introduce the idea of test-based monitoring and show how to standardize data and model checks across models and the model lifecycle.

Emeli Dral is a Co-founder and CTO at Evidently AI, a startup developing open-source tools to evaluate, test, and monitor the performance of machine learning models.

Earlier, she co-founded an industrial AI startup and served as the Chief Data Scientist at Yandex Data Factory, where she led over 50 applied ML projects across industries, from banking to manufacturing. Emeli is a data science lecturer at GSOM SpBU and Harbour.Space University. She is a co-author of the Machine Learning and Data Analysis curriculum on Coursera, which has over 100,000 students.