Helium [3rd Floor]
Causal inference asks the hardest question in data science: "What would have happened if things had been different?" While traditional methods often rely on rigid rules, statistical tests, or "black box" adjustments, Probabilistic Programming Languages (PPLs) like PyMC and NumPyro offer a transparent, flexible, and powerful lens through which to view these problems.
In this talk, we move beyond the standard "correlation is not causation" disclaimer. We will build a unified workflow that starts with robust A/B testing, moves to bias adjustment in observational data using multilevel models, and culminates in advanced Deep Latent Variable Models such as the Causal Effect Variational Autoencoder (CEVAE).
Why should you use a Probabilistic Programming Language (PPL) for Causal Inference? Because causal problems are inherently about uncertainty and structure—two things PPLs handle natively.
In this session, we will demonstrate how to translate causal diagrams (DAGs) directly into code, using PyMC and NumPyro to estimate causal effects with rigorous uncertainty quantification. We will cover three distinct levels of complexity, drawing on real-world examples and recent research:
The "Simple" Case: Enhancing A/B Tests
Even in randomized experiments, PPLs provide massive value. We will show how to:
Use Prior Predictive Checks to prevent "silly" estimates (Twyman's Law) by incorporating domain knowledge into priors (e.g., preventing the model from predicting a 1000% lift). We also describe how to perform a power analysis in a Bayesian framework.
Implement Bayesian CUPED to reduce variance and increase statistical power without collecting more data. We can combine these variance-reduction methods with smarter priors as described above.
The Observational Challenge: Confounding & Structure
When we can't randomize, we must adjust. We will explore (through concrete examples):
Backdoor Adjustment: Show how PPLs implement the "do-operator" to estimate Average Treatment Effects (ATE) in the presence of observed confounders.
Multilevel Causal Models: Demonstrate how to use multilevel models to account for time-invariant unobserved confounders. We discuss the pros and cons compared with similar methods, such as fixed effects.
The Frontier: Deep Latent Variable Models
What if confounders are unobserved? We will introduce advanced methods combining Deep Learning with Probabilistic Programming:
- An introduction to the Causal Effect Variational Autoencoder (CEVAE).
By the end of this talk, you will understand how to view causal inference not as a collection of isolated statistical tricks, but as a coherent modeling process powered by probabilistic programming.
References
- A/B Testing & Priors: Prior Predictive Checks for Metric Lift & Power Analysis
- Variance Reduction: Bayesian CUPED
- Observational Data: Introduction to Causal Inference with PyMC
- Hierarchical Models: Multilevel Causal Inference
- CEVAE Paper: Louizos, C., Shalit, U., Mooij, J., Sontag, D., Zemel, R., & Welling, M. (2017). Causal Effect Inference with Deep Latent-Variable Models. Advances in Neural Information Processing Systems 30 (NIPS 2017).
- Code Reference: Adapting concepts from CausalML (Robert Osazuwa Ness), specifically Chapter 11: Bayesian Causal Graphical Inference.
Mathematician (Ph.D., Humboldt-Universität zu Berlin) and data scientist. I am interested in interdisciplinary applications of mathematical methods, particularly time series analysis, Bayesian methods, and causal inference. Active open-source developer (PyMC, PyMC-Marketing, and NumPyro, among others). For more info, please visit my personal website https://juanitorduz.github.io