scikit-learn and fairness: tools and challenges
2022-09-01, HS 120

Fairness, accountability, and transparency in machine learning have become a major part of the ML discourse. As these issues have attracted public attention, and legislation regulating the use of machine learning in certain domains is being put in place, the industry has been catching up with the topic, and several groups have been developing toolboxes that allow practitioners to incorporate fairness constraints into their pipelines and make their models more transparent and accountable. Some examples are fairlearn, AIF360, LiFT, and fairness-indicators (TensorFlow), among others.

This talk explores some of the existing tools in this domain and discusses the work being done in scikit-learn to make it easier for practitioners to adopt them.


On the machine learning side, scikit-learn has been one of the most commonly used libraries, and it is extended by third-party packages such as imbalanced-learn and scikit-lego. However, when it comes to incorporating fairness constraints into a typical scikit-learn pipeline, there are challenges and limitations in the API which have made developing a scikit-learn compatible, fairness-focused package difficult and which hamper the adoption of these tools in industry.
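
To make the discussion concrete, the talk builds on a baseline pipeline roughly like the sketch below. The column names and the choice of estimator are illustrative placeholders, not tied to a specific dataset.

# A minimal baseline classification pipeline; column names are hypothetical
# placeholders for a tabular dataset with mixed numerical/categorical types.
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "hours_per_week"]        # hypothetical numerical features
categorical_cols = ["workclass", "education"]   # hypothetical categorical features

preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ]
)

clf = Pipeline(
    [
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)
# clf.fit(X_train, y_train) once a train/test split is available.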

In this talk, we start with a common classification pipeline, then assess the fairness/bias of the data and the model's outputs using the disparate impact ratio as an example metric, and finally mitigate the unfair outputs and search for hyperparameters that give the best accuracy while satisfying fairness constraints.
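
A rough sketch of the assess-and-mitigate steps follows, assuming `clf` is the baseline pipeline above and the sensitive attribute is available as separate pandas Series (the names `sensitive_train` / `sensitive_test` are hypothetical).

# Assess: fairlearn's demographic parity ratio corresponds to the disparate
# impact ratio (a value close to 1 means similar selection rates across
# groups); MetricFrame gives a per-group breakdown of any metric.
from fairlearn.metrics import MetricFrame, demographic_parity_ratio
from fairlearn.postprocessing import ThresholdOptimizer
from sklearn.metrics import accuracy_score

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

dpr = demographic_parity_ratio(
    y_test, y_pred, sensitive_features=sensitive_test
)
per_group_accuracy = MetricFrame(
    metrics=accuracy_score,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_test,
).by_group

# Mitigate: as one example, post-process the fitted pipeline so that its
# thresholded predictions (approximately) satisfy demographic parity.
mitigator = ThresholdOptimizer(
    estimator=clf, constraints="demographic_parity", prefit=True
)
mitigator.fit(X_train, y_train, sensitive_features=sensitive_train)
y_pred_fair = mitigator.predict(X_test, sensitive_features=sensitive_test)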

This workflow exposes the limitations of the API related to passing feature names and/or sample metadata through a pipeline down to the scorers. We discuss some workarounds, then talk about the work being done to address these issues and show what the final solution will look like. After this talk, you will be able to follow the related discussions happening in these open source communities and know where to find them.
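
As a hedged sketch of the direction this work is taking (SLEP006, metadata routing), a fairness metric that needs sensitive_features can be turned into a scorer that explicitly requests that metadata, which a hyperparameter search then routes from fit. The API shown here is the routing API that landed in scikit-learn releases after this talk and was still under discussion at the time, so details may differ; in practice one would combine it with an accuracy scorer rather than score on fairness alone.

# A sketch of the metadata-routing based solution (SLEP006); shipped after
# this talk, so the exact details may differ from what is shown here.
import sklearn
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from fairlearn.metrics import demographic_parity_ratio

sklearn.set_config(enable_metadata_routing=True)

# A scorer that requests `sensitive_features` as score-time metadata.
fairness_scorer = make_scorer(demographic_parity_ratio).set_score_request(
    sensitive_features=True
)

# `clf` is the baseline pipeline above; "model__C" refers to its final step.
search = GridSearchCV(
    clf,
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    scoring=fairness_scorer,
)
# With routing enabled, metadata passed to `fit` is forwarded to the scorer.
search.fit(X_train, y_train, sensitive_features=sensitive_train)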

The code and the presentation will be publicly available on GitHub.


Public link to supporting material:

https://github.com/adrinjalali/talks

Abstract as a tweet:

What we're doing in scikit-learn to help people use fairness-related tools in their day-to-day tasks.

Domains:

Machine Learning, Open Source Library

Expected audience expertise: Domain:

some

Expected audience expertise: Python:

some

I'm a computer scientist / bioinformatician turned core developer of scikit-learn and fairlearn, and I work as a Machine Learning Engineer at Hugging Face. I'm also an organizer of PyData Berlin.

These days I mostly focus on the aspects of machine learning and tooling that help create more ethical and fair decision-making systems. This focus has led me to work on fairlearn, and on the parts of scikit-learn that help tools such as fairlearn work more fluently with the package; at Hugging Face, my goal is to enable the communities around these libraries to share their models more easily and be more open about their work.