Controlling a confounding effect in predictive analysis.
09-04, 16:30–16:45 (UTC), Track 2 (Baroja)

Confounding effects are often present in observational data: the effect or association under study is observed jointly with other, unwanted effects.


For instance, when predicting the salary to offer from descriptions of professional experience, a model risks indirectly capturing a gender bias present in the distribution of salaries. Another example arises in biomedical applications: for an automated radiology diagnostic system to be useful, its predictions should rely on more than socio-demographic information.

In this talk I will discuss confounds in predictive models. I will review classic deconfounding techniques from the well-established statistical literature and show how they can be adapted to predictive modeling settings. Departing from deconfounding, I will introduce a non-parametric approach, which we named “confound-isolating cross-validation”, that adapts cross-validation experiments to measure the performance of a model independently of the confounding effect.
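
To make the classic deconfounding idea concrete, here is a minimal sketch of one common variant: regressing a confound out of the features with ordinary least squares, fitting on the training split only and applying the same fit to the test split. It is an illustration under stated assumptions, not the method from the talk or the API of the confound_prediction package. The helper names `fit_confound_model` and `remove_confound` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_confound_model(X_train, z_train):
    """Least-squares fit of every feature column on the confound plus an intercept.

    Hypothetical helper for illustration; returns coefficients of shape (2, n_features).
    """
    Z = np.column_stack([np.ones(len(z_train)), z_train])
    coef, *_ = np.linalg.lstsq(Z, X_train, rcond=None)
    return coef


def remove_confound(X, z, coef):
    """Subtract the part of X that the fitted linear model attributes to z."""
    Z = np.column_stack([np.ones(len(z)), z])
    return X - Z @ coef


# Synthetic data (an assumption for the example): three features that each
# depend linearly on a single confound z, plus independent noise.
rng = np.random.default_rng(0)
n_samples = 200
z = rng.normal(size=n_samples)
X = np.outer(z, [1.0, -2.0, 0.5]) + rng.normal(size=(n_samples, 3))

train = np.arange(150)
test = np.arange(150, n_samples)

# Fit on the training split only, then apply the same coefficients to both splits.
coef = fit_confound_model(X[train], z[train])
X_train_clean = remove_confound(X[train], z[train], coef)
X_test_clean = remove_confound(X[test], z[test], coef)
```

Fitting the confound model on the training split alone keeps test-set information out of the preprocessing step. This linear removal only addresses linear dependence on the confound, which is one motivation for considering non-parametric alternatives.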

The examples in this work relate to common issues in neuroimaging analysis, although the approach is not limited to neuroscience and can be useful in other domains.


Project Homepage / Git

https://github.com/darya-chyzhyk/confound_prediction

Abstract as a tweet

“Confound-isolating cross-validation”: adapting cross-validation experiments to measure the performance of a model independently of the confounding effect #nonparametric

Python Skill Level

basic

Domain Expertise

none

Domains

Machine Learning, Medicine/Health, Statistics

I’m Darya, a researcher in artificial intelligence and machine learning, in particular feature selection, clustering, pattern recognition, segmentation and statistical analysis. In recent years I have been working on computer-aided diagnostic systems for brain diseases that enable identification of the anatomical location of image biomarkers, lesion segmentation and phenotype prediction.