Guillaume Lemaitre EuroSciPy 2026

Guillaume Lemaitre

Guillaume is an open-source software engineer working at :probabl. He is a core maintainer of the scikit-learn and imbalanced-learn libraries.

Sessions

07-21

09:30

30min

How to use skrub Data Ops in practice

Riccardo Cappuzzo, Guillaume Lemaitre

Skrub is a package that eases preparing dataframes so they can be used in machine-learning tasks. In practice, data can be spread over multiple tables, represent various types of information (tabular, textual, graphical), or be stored in external database systems rather than in dataframes.

Skrub Data Ops help construct versatile pipelines that handle this variety of scenarios while avoiding data leakage. They also make it possible to build rich hyper-parameter grids that can be explored to maximize the performance of the final machine-learning model.

In this talk, we give a brief introduction to the Data Ops framework before presenting three separate use cases that highlight its versatility: a traditional machine-learning pipeline that uses Optuna to perform hyper-parameter tuning, a pipeline that trains on data stored in a relational database rather than a dataframe, and an image classification task with PyTorch.

By the end of the talk, attendees will know the main features of skrub Data Ops and how to use them successfully in different practical scenarios.

Computational Tools and Scientific Python Infrastructure

Room 1.38 (Ground Floor, Turing)

07-22

11:00

90min

Deal with imbalanced classification using scikit-learn

Guillaume Lemaitre, Anne Beyer

Class imbalance is a common challenge in real-world machine learning. This course explores why standard approaches fail and how to build reliable classifiers using scikit-learn's calibration and threshold-tuning tools.

We cover practical solutions including resampling strategies, probabilistic calibration with CalibratedClassifierCV, and decision threshold optimization using TunedThresholdClassifierCV. You'll learn to evaluate models appropriately with calibration curves and confusion matrices.

The course also addresses prevalence shift, that is, the situation where the class proportions in your training data don't reflect those of the target population. We demonstrate weight-based training corrections and post-hoc probability adjustments applicable to any binary classifier.
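One common form of post-hoc adjustment can be sketched as follows, assuming the standard Bayes-rule correction for a change in class prior. The function name and its parameters are hypothetical, and the course may present a different formulation.

```python
def adjust_for_prevalence(p, train_prevalence, target_prevalence):
    """Re-weight a predicted positive-class probability for a new prevalence.

    Hypothetical helper: assumes `p` is a calibrated probability obtained
    under `train_prevalence`, and that only the class prior changes between
    the training data and the target population.
    """
    # Scale each class's posterior by the ratio of target to training prior.
    pos = p * target_prevalence / train_prevalence
    neg = (1.0 - p) * (1.0 - target_prevalence) / (1.0 - train_prevalence)
    # Renormalize so the two class probabilities sum to one.
    return pos / (pos + neg)


# A score of 0.5 from a model trained on balanced data, applied to a
# population with 10% positives.
adjusted = adjust_for_prevalence(0.5, train_prevalence=0.5, target_prevalence=0.1)
print(adjusted)
```

Because the correction acts only on predicted probabilities, it can be applied to the output of any binary classifier, provided those probabilities are reasonably well calibrated to begin with.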

Applied AI & LLM Technologies and Use Cases

Room 1.38 (Ground Floor, Turing)
