Guillaume Lemaitre
Sessions
This tutorial explores scikit-image, the NumPy-native image processing library in the scientific Python ecosystem, for visual data analysis and manipulation.
Designed for both beginners and advanced users, it builds image analysis skills and offers insights into the scikit-image documentation.
It covers basic concepts such as image histograms, contrast, filtering, segmentation, and descriptors through practical exercises.
The tutorial concludes with advanced performance optimization techniques.
Familiarity with NumPy arrays is essential, as they are the underlying data representation.
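As a rough illustration of the concepts listed above, the following sketch runs a histogram, a contrast stretch, a smoothing filter, and a threshold-based segmentation on a synthetic NumPy array. The image, its size, and the parameter values are assumptions made for this example and are not taken from the tutorial material.

```python
import numpy as np
from skimage import exposure, filters

rng = np.random.default_rng(0)

# Synthetic low-contrast image (an assumption for this sketch):
# a slightly brighter disk on a dark, noisy background.
yy, xx = np.mgrid[:128, :128]
disk = (yy - 64) ** 2 + (xx - 64) ** 2 < 30 ** 2
image = 0.3 + 0.2 * disk + 0.03 * rng.standard_normal((128, 128))
image = np.clip(image, 0.0, 1.0)

# Image histogram: pixel counts per intensity bin.
hist, bin_centers = exposure.histogram(image)

# Contrast: stretch the observed intensity range to [0, 1].
stretched = exposure.rescale_intensity(image, out_range=(0.0, 1.0))

# Filtering: Gaussian smoothing to reduce the noise.
smoothed = filters.gaussian(stretched, sigma=2)

# Segmentation: split foreground from background with Otsu's threshold.
mask = smoothed > filters.threshold_otsu(smoothed)

print(image.dtype, image.shape, mask.sum())
```

Every step takes and returns a plain NumPy array, which is why array indexing and dtypes matter throughout.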
When operating a classifier in a production setting (i.e. the predictive phase), practitioners are typically interested in two different outputs: a "hard" decision used to drive a business decision and/or a "soft" decision that provides a confidence score for each potential decision (usually related to class probabilities).
Scikit-learn does not provide any flexibility to go from "soft" to "hard" predictions: it uses a cut-off point at a confidence score of 0.5 (or 0 when using decision_function) to get class labels. However, optimizing a classifier so that its confidence scores are close to the true probabilities (i.e. a calibrated classifier) does not guarantee accurate "hard" predictions with this heuristic. Conversely, training a classifier for optimum "hard" prediction accuracy (with the cut-off fixed at 0.5) does not guarantee a calibrated classifier.
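The fixed cut-off described above can be checked directly. The following minimal sketch assumes a binary LogisticRegression fitted on synthetic data (both choices are illustrative and not from the talk) and compares predict with manual thresholding of predict_proba and decision_function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (illustrative data, not from the talk).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

proba_pos = clf.predict_proba(X)[:, 1]  # "soft" output: P(class 1)
scores = clf.decision_function(X)       # "soft" output: signed score
hard = clf.predict(X)                   # "hard" output: class labels

# The hard labels match a cut-off at 0.5 on the probability,
# or equivalently at 0 on the decision function.
same_as_proba_cut = np.array_equal(hard, (proba_pos > 0.5).astype(int))
same_as_score_cut = np.array_equal(hard, (scores > 0).astype(int))
print(same_as_proba_cut, same_as_score_cut)
```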
In this talk, we will present a new scikit-learn meta-estimator that offers the best of both worlds: a calibrated classifier providing optimum "hard" predictions. This meta-estimator will land in a future version of scikit-learn: https://github.com/scikit-learn/scikit-learn/pull/26120.
We will provide some insights into how to obtain accurate probabilities and predictions, and illustrate how to use this model in practice on different use cases: cost-sensitive problems and imbalanced classification problems.
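To make the cost-sensitive, imbalanced use case concrete, here is a hand-rolled sketch of choosing a decision threshold. It does not use the meta-estimator presented in the talk. Instead it sweeps candidate cut-offs on a held-out set and keeps the one with the lowest total cost. The cost values, the dataset, and the helper total_cost are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical costs: a missed positive is 10x worse than a false alarm.
COST_FN, COST_FP = 10.0, 1.0


def total_cost(y_true, y_pred):
    """Sum the cost of false negatives and false positives."""
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return COST_FN * fn + COST_FP * fp


# Imbalanced synthetic data: roughly 10% positives (illustrative only).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba_val = clf.predict_proba(X_val)[:, 1]

# Sweep cut-offs 0.01, 0.02, ..., 0.99 and record the cost of each.
thresholds = np.arange(1, 100) / 100
costs = np.array(
    [total_cost(y_val, (proba_val >= t).astype(int)) for t in thresholds]
)

best_threshold = thresholds[np.argmin(costs)]
cost_at_default = total_cost(y_val, (proba_val >= 0.5).astype(int))
print(best_threshold, costs.min(), cost_at_default)
```

Because the threshold is selected and evaluated on the same held-out set, the reported cost is optimistic. A more careful setup would select the threshold with cross-validation and report the cost on separate test data.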