Anne Beyer
I'm an open-source software developer with a background in computational linguistics and a contributor to scikit-learn.
she/her
Session
Class imbalance is a common challenge in real-world machine learning. This course explores why standard approaches fail and how to build reliable classifiers using scikit-learn's calibration and threshold-tuning tools.
We cover practical solutions including resampling strategies, probabilistic calibration with CalibratedClassifierCV, and decision threshold optimization using TunedThresholdClassifierCV. You'll learn to evaluate models appropriately with calibration curves and confusion matrices.
The course also addresses prevalence shift, that is, when your training data doesn't reflect the class balance of the target population. We demonstrate weight-based training corrections and post-hoc probability adjustments applicable to any binary classifier.
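To make the two corrections concrete, here is a minimal sketch of the standard Bayes-rule prior correction, which may differ in detail from what the course presents. It assumes that only the class prevalence changes between training and deployment, and that the target prevalence is known. The function names `adjust_probability` and `prevalence_class_weights` are hypothetical helpers written for this example.

```python
def adjust_probability(p, train_prevalence, target_prevalence):
    """Post-hoc correction of a predicted positive-class probability.

    Rescales the positive and negative evidence by the ratio of target
    to training prevalence, then renormalizes (Bayes' rule).
    """
    pos_ratio = target_prevalence / train_prevalence
    neg_ratio = (1.0 - target_prevalence) / (1.0 - train_prevalence)
    numerator = p * pos_ratio
    return numerator / (numerator + (1.0 - p) * neg_ratio)


def prevalence_class_weights(train_prevalence, target_prevalence):
    """Per-class weights that reweight training data to the target prevalence.

    The resulting dict could be mapped onto samples and passed as
    sample_weight when fitting a classifier that supports it.
    """
    return {
        0: (1.0 - target_prevalence) / (1.0 - train_prevalence),
        1: target_prevalence / train_prevalence,
    }


# A model trained on balanced data (50% positives) and deployed where
# only 10% of cases are positive: a score of 0.5 shrinks to about 0.1.
adjusted = adjust_probability(0.5, train_prevalence=0.5, target_prevalence=0.1)
weights = prevalence_class_weights(train_prevalence=0.5, target_prevalence=0.1)
print(adjusted, weights)
```

The post-hoc adjustment needs only the model's predicted probabilities, which is why it can be applied to any binary classifier that outputs them; it works best when those probabilities are reasonably calibrated on the training distribution.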