2025-04-23, Titanium3
Getting your model into production isn’t a trivial task, but it’s only half the battle. Ensuring that your model continues to deliver great performance over time is even more critical. In this talk, I will walk through a selection of things that can kill your model’s performance, zoom in on multivariate data drift, and present two methods for detecting this type of drift in your production data.
Objectives: The goal of this talk is to explain what multivariate data drift is and to present methods for detecting it. You will learn about the challenges associated with multivariate drift and explore practical solutions for identifying it.
Models Aren’t Forever:
91% of models in production degrade over time. I will explain the importance of continuous model monitoring and discuss common approaches and pitfalls.
All the Ways Your Perfectly Fine ML Model Can Fail:
Machine learning models can fail for numerous reasons. I will describe the root causes of ML model failure, including data quality issues, data drift, and the final boss of every model: concept drift.
I will explore simple methods to detect univariate data drift and delve into the more complex challenge of concept drift, which can significantly impact your model’s performance in production.
Drift Happens - Multivariate Data Drift:
Multivariate data drift occurs when the relationships between multiple variables in your data change over time. This type of drift can be hard to detect with standard per-feature checks, because each variable's own distribution may look unchanged even though the way the variables move together has shifted. I will explain what multivariate data drift is and why traditional techniques may fall short in identifying it.
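To make the idea concrete, here is a minimal sketch (not taken from the talk) using synthetic data. It assumes two standard-normal features whose correlation flips from 0.8 to -0.8 between a reference period and a production period. The helper names `sample` and `ks_statistic` are hypothetical, and the Kolmogorov-Smirnov statistic stands in for whichever univariate check you already run.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

def sample(correlation):
    """Draw n points from a 2-D Gaussian with unit variances and the given correlation."""
    cov = [[1.0, correlation], [correlation, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

reference = sample(correlation=0.8)    # assumed "training-time" data
production = sample(correlation=-0.8)  # assumed drifted data: same marginals, flipped relationship

# Univariate view: compare each feature on its own.
per_feature_ks = [ks_statistic(reference[:, j], production[:, j]) for j in range(2)]

# Multivariate view: compare how the two features move together.
ref_corr = float(np.corrcoef(reference, rowvar=False)[0, 1])
prod_corr = float(np.corrcoef(production, rowvar=False)[0, 1])

print("per-feature KS statistics:", per_feature_ks)
print("correlation, reference vs production:", ref_corr, prod_corr)
```

In this setup, the per-feature statistics stay small because each marginal distribution is unchanged, while the correlation changes sign. That gap is what per-feature monitoring misses.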
Two Clever Ways to Detect Multivariate Data Drift:
Domain Classifier: This method involves training a classifier to distinguish between data from different time periods. If the classifier can accurately separate the data, it indicates that the distribution has changed, signaling potential drift.
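PCA Reconstruction Error: Principal Component Analysis (PCA) can be used to compress your data into fewer dimensions and then reconstruct it. The compression is fitted on reference data, so it encodes the relationships between features in that period. By tracking the reconstruction error over time, you can detect changes in the underlying data distribution that may indicate drift.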
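Both methods can be sketched in a few lines with scikit-learn. This is an illustrative example on synthetic data, not the speaker's implementation. The two-feature Gaussian data, the random forest as the domain classifier, the single principal component, and the helper names `sample`, `domain_classifier_auc`, and `reconstruction_error` are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

def sample(correlation, n=2000):
    """Synthetic 2-D Gaussian data with unit variances and the given correlation."""
    cov = [[1.0, correlation], [correlation, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)

reference = sample(0.8)     # period the model was built on
same_period = sample(0.8)   # new data without drift
drifted = sample(-0.8)      # new data with a flipped feature relationship

def domain_classifier_auc(ref, new):
    """Label rows by period and return the cross-validated AUC of telling them apart."""
    X = np.vstack([ref, new])
    y = np.concatenate([np.zeros(len(ref)), np.ones(len(new))])
    clf = RandomForestClassifier(n_estimators=100, min_samples_leaf=20, random_state=0)
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    return float(roc_auc_score(y, proba))

def reconstruction_error(scaler, pca, X):
    """Mean Euclidean distance between scaled rows and their PCA reconstruction."""
    Z = scaler.transform(X)
    Z_hat = pca.inverse_transform(pca.transform(Z))
    return float(np.mean(np.linalg.norm(Z - Z_hat, axis=1)))

# Method 1: domain classifier. AUC near 0.5 means the periods look alike.
auc_same = domain_classifier_auc(reference, same_period)
auc_drift = domain_classifier_auc(reference, drifted)

# Method 2: PCA reconstruction error. Scaler and PCA are fitted on reference data only.
scaler = StandardScaler().fit(reference)
pca = PCA(n_components=1).fit(scaler.transform(reference))
err_ref = reconstruction_error(scaler, pca, reference)
err_same = reconstruction_error(scaler, pca, same_period)
err_drift = reconstruction_error(scaler, pca, drifted)

print("domain classifier AUC (no drift, drift):", auc_same, auc_drift)
print("reconstruction error (reference, no drift, drift):", err_ref, err_same, err_drift)
```

In practice, you would compute either metric per time chunk of production data and compare it against a threshold derived from the reference period. How to choose that threshold is a design decision this sketch leaves open.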
Intermediate
Expected audience expertise: Python: Intermediate
data & ml fan with a soft spot for OSS - driven by curiosity, eager self-learner, hackathons enjoyer, PyData and PyLadiesCon speaker and volunteer, exploring and creating content about post-deployment data science at NannyML, in my free time contributing to Narwhals and hosting open source sprints