EuroSciPy 2026

Deal with imbalanced classification using scikit-learn
2026-07-22 , Room 1.38 (Ground Floor, Turing)

Class imbalance is a common challenge in real-world machine learning. This course explores why standard approaches fail and how to build reliable classifiers using scikit-learn's calibration and threshold-tuning tools.

We cover practical solutions including resampling strategies, probabilistic calibration with CalibratedClassifierCV, and decision threshold optimization using TunedThresholdClassifierCV. You'll learn to evaluate models appropriately with calibration curves and confusion matrices.

The course also addresses prevalence shift, that is, the situation where the class proportions in your training data do not reflect those of the target population. We demonstrate weight-based training corrections and post-hoc probability adjustments applicable to any binary classifier.
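One common form of these corrections can be sketched in a few lines. The example below is a minimal illustration, not necessarily the method taught in the course: it assumes only the class prevalence changes between training and target data, and that both prevalences are known. The function names are hypothetical.

```python
def adjust_probability(p, train_prevalence, target_prevalence):
    """Rescale a predicted positive-class probability for a new prevalence.

    Assumes p was produced by a well-calibrated classifier trained on data
    with `train_prevalence` positives, and that only the class proportions
    differ in the target population (hypothetical helper for illustration).
    """
    ratio_pos = target_prevalence / train_prevalence
    ratio_neg = (1.0 - target_prevalence) / (1.0 - train_prevalence)
    return p * ratio_pos / (p * ratio_pos + (1.0 - p) * ratio_neg)


def prevalence_class_weights(train_prevalence, target_prevalence):
    """Per-class weights that reweight training data to the target prevalence.

    The returned dict has the shape accepted by the `class_weight` parameter
    of many scikit-learn classifiers (hypothetical helper for illustration).
    """
    return {
        0: (1.0 - target_prevalence) / (1.0 - train_prevalence),
        1: target_prevalence / train_prevalence,
    }


# A model trained on balanced data outputs 0.5 for some sample; if positives
# make up only 10% of the target population, the adjusted probability is lower.
adjusted = adjust_probability(0.5, train_prevalence=0.5, target_prevalence=0.1)
weights = prevalence_class_weights(train_prevalence=0.5, target_prevalence=0.1)
print(adjusted, weights)
```

The post-hoc adjustment leaves the fitted model untouched, which is why it applies to any binary classifier that outputs probabilities. The weight-based variant changes training itself and requires an estimator that accepts class or sample weights.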



Expected audience expertise: Domain: some
Expected audience expertise: Python: some
Supporting material: Supporting material
Your relationship with the presented work/project: Original author or co-author

Guillaume is an open-source software engineer working at :probabl. He is a core maintainer of the scikit-learn and imbalanced-learn libraries.

I'm an open-source software developer with a background in computational linguistics and a contributor to scikit-learn.