Gaël Varoquaux is a research director working on data science and health at Inria (French Computer Science National research). His research focuses on using data and machine learning for scientific inference, with applications to health and social science, as well as developing tools that make it easier for non-specialists to use machine learning. He has been working going building easy-to-use open-source software in Python for above 15 years. He is a core developer of scikit-learn, joblib, Mayavi and nilearn, a nominated member of the PSF, and often teaches scientific computing with Python, eg as a creator of the scipy lecture notes.
@GaelVaroquauxInstitute / Company –
This tutorial will guide towards good evaluation of machine-learning models, choosing metrics and procedures that match the intended usage, with code examples using the latest scikit-learn's features. We will discuss how good metrics should characterize all aspects of error, e.g. on the positive and negative class; the probability of a detection, or the probability of a true event given a detection; as they may need to catter for class imbalance. Metrics may also evaluate confidence scores, e.g. calibration. Model-evaluation procedures should gauge not only the expected generalization performance, but also its variations.
This talk will cover how to build predictive models that handle well missing values, using scikit-learn. It will give on the one side the statistical considerations, both the classic statistical missing-values theory and the recent development in machine learning, and on the other side how to efficiently code solutions.