2026-06-06, Hardwick Hub
Your company launches a loyalty program, but not everywhere at once. Ten stores get it in January, another ten in March, the rest later. Leadership asks: "Did it work? By how much?" You compare before and after... and get a number that's wrong. Phased rollouts break naive pre/post comparisons, and standard two-way fixed effects regression can quietly give misleading answers when the effect differs across rollout waves or changes over time.
This talk shows a practical Python workflow for getting it right. Using a realistic store-rollout example and CausalPy (an open-source library), I'll demonstrate how to produce event-study plots that show when and how much an intervention takes effect — with uncertainty estimates your stakeholders can actually act on. Whether you're measuring feature flags, marketing campaigns, or policy changes, you'll leave with a reproducible notebook and a step-by-step workflow you can apply tomorrow.
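As a rough illustration of why the before/after comparison goes wrong, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the 30 stores, the rollout months, the size of the market-wide trend, and the true effect of 2.0 are all hypothetical and are not taken from the talk materials.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_stores, n_months, true_effect = 30, 24, 2.0
# Hypothetical rollout: ten stores each start at months 12, 14 and 18.
start = np.repeat([12, 14, 18], 10)

store = np.repeat(np.arange(n_stores), n_months)
month = np.tile(np.arange(n_months), n_stores)
treated = month >= start[store]

# Sales grow by 0.2 per month everywhere, with or without the program.
sales = 100 + 0.2 * month + true_effect * treated + rng.normal(0, 1.0, store.size)
panel = pd.DataFrame(
    {"store": store, "month": month, "treated": treated, "sales": sales}
)

# Naive pre/post: each store's mean sales after launch minus its mean before.
after = panel[panel["treated"]].groupby("store")["sales"].mean()
before = panel[~panel["treated"]].groupby("store")["sales"].mean()
naive = (after - before).mean()

print(f"true effect: {true_effect:.2f}, naive pre/post estimate: {naive:.2f}")
```

In this setup the midpoints of each store's "after" and "before" windows are always 12 months apart, so the background trend alone adds about 0.2 × 12 = 2.4 to the naive estimate, on top of the true effect of 2.0.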
Who this is for
Data scientists, analysts, and applied ML/measurement practitioners who evaluate interventions using observational or quasi-experimental data (e.g., feature flags, phased launches, regional changes). Familiarity with pandas and basic regression is helpful; no prior Bayesian experience required.
What attendees will learn (takeaways)
- How staggered adoption differs from "textbook" two-period Difference-in-Differences (DiD), and why the difference matters in production measurement.
- How the imputation-based estimator (Borusyak, Jaravel & Spiess, 2024) works: fit on untreated observations, predict counterfactuals, aggregate by event time.
- How to turn model output into stakeholder-friendly language: probability of positive effect, expected uplift, decision thresholds — no Bayesian background needed.
- The parameter recovery pattern: validate your method on simulated data with known truth before trusting it on real data.
- Practical diagnostics and red flags: parallel trends, anticipation effects, spillovers, and when not to use this method.
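The stakeholder-friendly summaries mentioned above (probability of a positive effect, expected uplift, decision thresholds) can be computed directly from posterior draws. The sketch below assumes the draws are available as a plain NumPy array. Here they are simulated stand-ins rather than output from CausalPy, and the threshold of 1.0 is a hypothetical break-even value.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for posterior draws of the average uplift. In practice these would
# come from the fitted model; the mean and spread used here are hypothetical.
draws = rng.normal(loc=1.8, scale=0.9, size=4000)

threshold = 1.0  # hypothetical uplift needed for the program to pay off

expected_uplift = draws.mean()
# Equal-tailed 94% interval (a simple alternative to a highest-density interval).
lower, upper = np.quantile(draws, [0.03, 0.97])
prob_positive = (draws > 0).mean()
prob_above_threshold = (draws > threshold).mean()

print(f"Expected uplift: {expected_uplift:.2f} (94% interval {lower:.2f} to {upper:.2f})")
print(f"Probability the effect is positive: {prob_positive:.0%}")
print(f"Probability the effect exceeds {threshold}: {prob_above_threshold:.0%}")
```

Each number is a share of draws meeting a condition, so it can be read out without any Bayesian vocabulary: "in roughly X% of plausible scenarios the program clears the break-even uplift."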
Outline and time plan (30 min talk + 10 min Q&A)
- 0–4 min: The real-world problem — phased rollouts and why naive pre/post comparisons fail
- 4–10 min: DiD refresher, then what breaks under staggered adoption (timing heterogeneity, negative weights in two-way fixed effects (TWFE) regression)
- 10–17 min: The staggered DiD solution (event-time framing, imputation intuition, key assumptions)
- 17–25 min: Worked example in Python with CausalPy
  - A loyalty program rolled out to 60 stores in 3 waves over 30 weeks
  - Visualise adoption timing and check pre-trends
  - Fit the model and produce event-study plots
  - Parameter recovery: compare estimated effects to known ground truth
- 25–28 min: Diagnostics — pre-treatment placebo checks, counterfactual inspection, "when not to use this" decision checklist
- 28–30 min: Summary — three takeaways and the six-step workflow
- 30–40 min: Q&A
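The imputation and parameter-recovery steps in the outline can be sketched with plain NumPy and pandas. This is a minimal illustration under stated assumptions, not the CausalPy workflow shown in the talk. The wave start weeks, trend, noise level, and true effect are hypothetical, and the fixed effects are fitted by ordinary least squares rather than a Bayesian model.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_stores, n_weeks, true_effect = 60, 30, 2.0
# Hypothetical wave start weeks; only "60 stores, 3 waves, 30 weeks" is given.
adopt = np.repeat([8, 16, 24], n_stores // 3)

store = np.repeat(np.arange(n_stores), n_weeks)
week = np.tile(np.arange(n_weeks), n_stores)
treated = week >= adopt[store]
sales = (
    rng.normal(50, 2, n_stores)[store]  # store fixed effects
    + 0.1 * week                        # common time trend
    + true_effect * treated             # known ground-truth effect
    + rng.normal(0, 0.5, store.size)    # observation noise
)
df = pd.DataFrame({"store": store, "week": week, "adopt": adopt[store],
                   "treated": treated, "sales": sales})

# Once the last wave adopts, no untreated stores remain to pin down the week
# effects, so this sketch keeps only the weeks before that point.
df = df[df["week"] < df["adopt"].max()].reset_index(drop=True)

# Step 1: fit store and week fixed effects on untreated observations only.
X = pd.concat(
    [pd.get_dummies(df["store"], prefix="s"),
     pd.get_dummies(df["week"], prefix="w", drop_first=True)],
    axis=1,
).to_numpy(dtype=float)
y = df["sales"].to_numpy()
is_untreated = ~df["treated"].to_numpy()
beta, *_ = np.linalg.lstsq(X[is_untreated], y[is_untreated], rcond=None)

# Step 2: predict the untreated counterfactual for every observation.
df["y0_hat"] = X @ beta

# Step 3: effect = observed minus counterfactual, averaged by event time.
post = df[df["treated"]].assign(
    tau_hat=lambda d: d["sales"] - d["y0_hat"],
    event_time=lambda d: d["week"] - d["adopt"],
)
event_study = post.groupby("event_time")["tau_hat"].agg(["mean", "count"])

# Parameter recovery: estimates should sit close to the simulated truth.
print(event_study.round(2))
print(f"average estimate: {post['tau_hat'].mean():.2f} (truth: {true_effect})")
```

Because the truth is known, any systematic gap between the event-time means and 2.0 would point to a problem with the method or the code rather than with the data. The `count` column shows how many observations support each event time, which shrinks at longer horizons.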
Background knowledge needed
- Comfortable with tidy data, grouping/aggregating, and reading a regression coefficient.
- Basic causal inference vocabulary (treatment/control, confounding) is helpful but not required.
What I will provide
A public GitHub repository containing:
- a reproducible Quarto notebook (the slides themselves, with all code),
- a synthetic dataset simulating a realistic store loyalty program rollout,
- and environment setup instructions (conda environment file).
Ben Vincent is Director of InferenceWorks Ltd and a Principal Data Scientist at PyMC Labs, where he has been building Bayesian solutions for real-world business problems since 2021. He created CausalPy, an open-source Python library for causal inference in quasi-experimental settings. He holds a PhD in Neuroscience from the University of Sussex (UK) and previously held a university faculty position for 15 years.