2024-08-26, Room 5
Data scientists are repeatedly told that it is critical to align their model training methodology with a specific business objective. While this is sound advice, it usually comes with few details on how to achieve it in practice.
This hands-on tutorial introduces helpful theoretical concepts and concrete software tools to bridge this gap. The method is illustrated on a practical worked use case: optimizing the operations of a fraud detection system for a payment processing platform.
More specifically, we will introduce the concept of calibrated probabilistic classifiers, how to evaluate them, and how to fix common causes of miscalibration. In the second part, we will explore how to turn probabilistic classifiers into optimal business decision makers.
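As a taste of the first part, the sketch below evaluates calibration with a reliability curve and recalibrates a classifier with isotonic regression, using the scikit-learn API. The synthetic imbalanced dataset is a stand-in assumption, not the tutorial's fraud data:

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in for a fraud-like classification task.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Naive Bayes is often over-confident: a classic cause of miscalibration.
clf = GaussianNB().fit(X_train, y_train)
prob_raw = clf.predict_proba(X_test)[:, 1]

# Post-hoc recalibration with isotonic regression, fitted via cross-validation.
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
prob_cal = calibrated.predict_proba(X_test)[:, 1]

# The calibration curve bins predicted probabilities and compares them to the
# observed fraction of positives; a well-calibrated model tracks the diagonal.
frac_pos, mean_pred = calibration_curve(y_test, prob_cal, n_bins=10)
```

Plotting `frac_pos` against `mean_pred` gives the calibration curve discussed in Part I of the outline below.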
The tutorial material is available at the following URL: https://github.com/probabl-ai/calibration-cost-sensitive-learning
Detailed outline of the tutorial:
- Introduction
  - Evaluating ML-based predictions with:
    - ranking metrics,
    - probabilistic metrics,
    - decision metrics.
  - Proper scoring losses and their decomposition into:
    - calibration loss,
    - grouping loss,
    - irreducible loss.
- Part I: Probabilistic classification
  - The calibration curve
  - Possible causes of miscalibration:
    - model misspecification,
    - overfitting and a bad level of regularization.
  - Possible ways to improve calibration:
    - non-linear feature engineering to avoid misspecification,
    - post-hoc calibration with isotonic regression,
    - tuning hyperparameters and early stopping with a proper scoring rule.
- Part II: Optimal decision making under uncertainty
  - Defining custom business cost functions
  - Individual-specific cost functions
  - Setting the Elkan-optimal threshold with FixedThresholdClassifier
  - Cost-sensitive learning for arbitrary cost functions with TunedThresholdClassifierCV
  - Predict-time decision threshold optimization
This tutorial will be delivered as a set of publicly available Jupyter notebooks under an open source license.
We will mostly use components from the latest version of the scikit-learn library, plus a few custom extensions.
Probabilistic classification and cost-sensitive learning with scikit-learn. Learn the power of hyperparameter tuning with proper scoring rules and of optimal decision threshold tuning on custom business rules.
Category [Machine and Deep Learning] – Supervised Learning
Expected audience expertise: Domain – expert
Expected audience expertise: Python – some
Public link to supporting material – https://scikit-learn.org/stable/auto_examples/model_selection/plot_cost_sensitive_learning.html
Project Homepage / Git –
I'm an open source software engineer at :probabl. I'm a core developer of scikit-learn and imbalanced-learn.
Olivier is a software engineer at Probabl and a core contributor to the scikit-learn open source Machine Learning library.