PyConDE & PyData Berlin 2024

Using ML to find out the "Why"? A Tutorial in Causal Machine Learning
2024-04-23 , A03-A04

Machine learning is mostly used for predicting outcome variables. But in many cases, we are interested in causal questions: Why do customers churn? What is the effect of a price change on sales? How can we optimize personalized marketing campaigns or medical treatments?

This tutorial introduces participants to the field of Causal Machine Learning (Causal ML). We will start with a basic motivation of causal analysis and share insights on how to recognize causal questions in data science. We will dive into the basics of Causal ML: Why can't we simply use off-the-shelf ML methods to answer causal questions? The tutorial will focus on the Double Machine Learning approach and demonstrate the use of Causal ML with the Python library DoubleML (Bach et al., 2022). The general introduction will be complemented by hands-on data examples, interactive discussion, and Q&A sessions. The tutorial is a great starting point for participants to discover Causality/Causal ML and start their own causal data science projects.

References

Bach, P., Chernozhukov, V., Kurz, M. S., and Spindler, M. (2022), DoubleML - An Object-Oriented Implementation of Double Machine Learning in Python, Journal of Machine Learning Research, 23(53): 1-6, https://www.jmlr.org/papers/v23/21-0862.html


The tutorial will be organized in four blocks.

1) Introduction and motivation

We will point out why causality matters in data science. Many problems that managers and data scientists face are causal. When organizations and companies want to optimize their marketing campaigns, financial planning, or pricing schemes, they usually run into causal considerations: How much do my sales decrease if we increase the price by X%? How can I send email newsletters to those who like them and avoid annoying other subscribers?

Causal Inference and Causal ML offer powerful tools that help to formalize and model questions that are usually discussed only on an intuitive basis: Are the people who opened my newsletters really comparable to those who haven't? Can I just compare the conversion rates of these groups when I want to evaluate the newsletter's effectiveness?
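
To illustrate why a plain comparison of the two groups can mislead, here is a minimal simulation sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `engaged` confounder, the probabilities, and the true effect of 0.02 are hypothetical and do not come from the tutorial materials.

```python
import random

random.seed(0)

def simulate_subscriber():
    # Hypothetical setup: "engaged" subscribers are more likely to open
    # the newsletter AND more likely to buy, newsletter or not.
    engaged = random.random() < 0.5
    opened = random.random() < (0.8 if engaged else 0.2)
    base_rate = 0.30 if engaged else 0.05
    true_effect = 0.02  # opening adds 2 percentage points (assumed)
    bought = random.random() < base_rate + (true_effect if opened else 0.0)
    return engaged, opened, bought

data = [simulate_subscriber() for _ in range(200_000)]

def conversion_rate(rows):
    return sum(bought for _, _, bought in rows) / len(rows)

# Naive comparison: openers vs. non-openers, ignoring engagement.
naive = (conversion_rate([r for r in data if r[1]])
         - conversion_rate([r for r in data if not r[1]]))

# Adjusted comparison: take the difference within each engagement
# group, then average the differences weighted by group size.
adjusted = 0.0
for level in (True, False):
    stratum = [r for r in data if r[0] == level]
    diff = (conversion_rate([r for r in stratum if r[1]])
            - conversion_rate([r for r in stratum if not r[1]]))
    adjusted += diff * len(stratum) / len(data)

print(f"naive difference:    {naive:.3f}")
print(f"adjusted difference: {adjusted:.3f}")
```

Under these assumed probabilities, the naive difference is about 0.17 in expectation, while the adjusted difference recovers the simulated effect of 0.02. The gap arises because openers are mostly engaged subscribers, who would buy more often anyway.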

2) Introduction to Causal Machine Learning with DoubleML

Causal Machine Learning offers tools to estimate causal relationships with state-of-the-art ML algorithms. We will give an introduction to the Double Machine Learning approach (Chernozhukov et al., 2018). This introduction will be accompanied by several data examples and code demonstrations using the Python package DoubleML, https://docs.doubleml.org/stable/index.html. DoubleML is an open-source package that offers various tools to estimate causal effects, for example heterogeneous treatment effects (as in personalized marketing or personalized medicine).
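
The core idea of Double Machine Learning in a partially linear model can be sketched without any third-party library. The code below is not the DoubleML API and not the tutorial's own code. It is a hand-rolled illustration under stated assumptions: a simulated data set with a known effect `THETA`, a toy binned-mean regressor (`fit_binned_mean`, a hypothetical stand-in for a real ML learner), and two-fold cross-fitting with the partialling-out estimator.

```python
import math
import random

random.seed(42)
THETA = 0.5  # true effect of d on y in this hypothetical simulation

def simulate(n):
    """Partially linear model: y = THETA*d + g(x) + noise, d = m(x) + noise."""
    rows = []
    for _ in range(n):
        x = random.uniform(-2.0, 2.0)                        # confounder
        d = math.sin(x) + random.gauss(0.0, 1.0)             # treatment
        y = THETA * d + x + x ** 2 + random.gauss(0.0, 1.0)  # outcome
        rows.append((x, d, y))
    return rows

def fit_binned_mean(xs, targets, n_bins=40, lo=-2.0, hi=2.0):
    """Toy stand-in for an ML learner: mean of the target within bins of x."""
    def bin_of(x):
        return min(n_bins - 1, max(0, int((x - lo) / (hi - lo) * n_bins)))
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, t in zip(xs, targets):
        sums[bin_of(x)] += t
        counts[bin_of(x)] += 1
    overall = sum(targets) / len(targets)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]

def ols_slope(ds, ys):
    """Slope from a simple regression of y on d (with intercept)."""
    d_bar, y_bar = sum(ds) / len(ds), sum(ys) / len(ys)
    cov = sum((d - d_bar) * (y - y_bar) for d, y in zip(ds, ys))
    var = sum((d - d_bar) ** 2 for d in ds)
    return cov / var

rows = simulate(20_000)

# Naive approach: regress y on d and ignore the confounder x.
naive = ols_slope([d for _, d, _ in rows], [y for _, _, y in rows])

# Cross-fitting: nuisance functions are learned on one fold and used to
# residualize the other fold, so no observation is predicted by a model
# that was trained on it.
folds = [rows[0::2], rows[1::2]]
num = den = 0.0
for k, test in enumerate(folds):
    train = folds[1 - k]
    xs = [x for x, _, _ in train]
    m_hat = fit_binned_mean(xs, [d for _, d, _ in train])  # estimates E[d | x]
    l_hat = fit_binned_mean(xs, [y for _, _, y in train])  # estimates E[y | x]
    for x, d, y in test:
        d_res, y_res = d - m_hat(x), y - l_hat(x)
        num += d_res * y_res
        den += d_res ** 2
dml = num / den  # regression of outcome residuals on treatment residuals

print(f"true effect:     {THETA}")
print(f"naive estimate:  {naive:.3f}")
print(f"DML-style est.:  {dml:.3f}")
```

In this simulated setting the naive slope is pulled well away from `THETA` because `x` drives both treatment and outcome, while the residual-on-residual estimate lands close to it. A package such as DoubleML wraps this logic with proper learners, repeated sample splitting, and inference. The sketch only conveys the mechanics.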

3) Hands-on Session: Data Example

The tutorial features a data project that participants can solve on their own. With the hands-on session, participants get started on their own Causality learning journey :) Participants are invited to apply DoubleML to their own data example and play around with the package features. The hands-on session will follow the structure of the DoubleML workflow, which guides analysts through the process of causal inference with DoubleML, https://docs.doubleml.org/stable/workflow/workflow.html.

4) Discussion and Q&A

The tutorial concludes with a discussion and Q&A session. We are looking forward to participants' comments and ideas. We appreciate feedback from the Python community on the DoubleML package :)


Expected audience expertise: Domain:

Novice

Expected audience expertise: Python:

Intermediate

Public link to supporting material, e.g. videos, Github, etc.:

Slides: https://trainings.doubleml.org/trainings_materials/2024_pycon/index.html ; DoubleML Documentation: https://docs.doubleml.org/stable/index.html

Abstract as a tweet (X) or toot (Mastodon):

Tutorial on Causal Machine Learning by the developers of the DoubleML package for Python. Learn how to address "Why?" questions with ML! https://docs.doubleml.org/stable/index.html #Causality #CausalML #DoubleML #CausalInference

I am a PhD candidate at the University of Hamburg, researching in the field of Causal Machine Learning. As part of my research activities, I am also a contributing developer to DoubleML, which is a toolbox for causal inference with ML.

My name is Jan and I work as a research associate at the University of Hamburg, where I am studying for my PhD in statistics and data science. I have a master's degree in industrial engineering and together with my experience from industry, I have a strong application-oriented background.
I have contributed to the DoubleML package for Python and my research focuses on Causal ML for unstructured data such as text and images.