PyData London 2026

Did Your Rollout Actually Work? Measuring Phased Launches with Staggered DiD in Python
2026-06-06, Hardwick Hub

Your company launches a loyalty program, but not everywhere at once. Ten stores get it in January, another ten in March, the rest later. Leadership asks: "Did it work? By how much?" You compare before and after... and get a number that's wrong. Phased rollouts break naive pre/post comparisons because anything else that changed over the same period gets counted as program effect, and standard two-way fixed-effects regression can quietly give misleading answers when effects differ across waves or grow over time.

This talk shows a practical Python workflow for getting it right. Using a realistic store-rollout example and CausalPy (an open-source library), I'll demonstrate how to produce event-study plots that show when and how much an intervention takes effect — with uncertainty estimates your stakeholders can actually act on. Whether you're measuring feature flags, marketing campaigns, or policy changes, you'll leave with a reproducible notebook and a step-by-step workflow you can apply tomorrow.
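To make the pre/post problem described above concrete, here is a minimal, dependency-free sketch. All numbers are invented for illustration (a shared trend of +2 per week, a true lift of 5, two store baselines); it is not taken from the talk's notebook. It compares a naive before/after difference with a simple difference-in-differences against a not-yet-treated store.

```python
# Hypothetical noise-free weekly sales: every store shares an upward trend
# of +2 per week, and the loyalty program adds a true lift of 5.
TRUE_LIFT = 5.0
WEEKLY_TREND = 2.0
WEEKS = range(8)
LAUNCH_WEEK = 4  # wave-1 launch; wave 2 launches after this window ends


def sales(base: float, week: int, treated: bool) -> float:
    """Sales = store baseline + shared trend + lift (only once treated)."""
    return base + WEEKLY_TREND * week + (TRUE_LIFT if treated else 0.0)


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def pre_post_change(series) -> float:
    """Mean from LAUNCH_WEEK onwards minus mean before LAUNCH_WEEK."""
    return mean(series[LAUNCH_WEEK:]) - mean(series[:LAUNCH_WEEK])


# Wave 1 is treated from LAUNCH_WEEK; wave 2 is still untreated in this window.
wave1 = [sales(100.0, w, treated=w >= LAUNCH_WEEK) for w in WEEKS]
wave2 = [sales(80.0, w, treated=False) for w in WEEKS]

naive = pre_post_change(wave1)                         # trend + lift mixed together
did = pre_post_change(wave1) - pre_post_change(wave2)  # trend differenced out

print(f"true lift:      {TRUE_LIFT}")
print(f"naive pre/post: {naive}")
print(f"diff-in-diff:   {did}")
```

In this toy setup the naive comparison reports 13 because it absorbs eight units of shared trend, while differencing against the not-yet-treated store returns the true lift of 5. That only works here because both stores follow the same trend by construction, which is the parallel-trends assumption.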


Who this is for

Data scientists, analysts, and applied ML/measurement practitioners who evaluate interventions using observational or quasi-experimental data (e.g., feature flags, phased launches, regional changes). Familiarity with pandas and basic regression is helpful; no prior Bayesian experience required.

What attendees will learn (takeaways)

  • How staggered adoption differs from "textbook" two-period Difference-in-Differences, and why the difference matters in production measurement.
  • How the imputation-based estimator (Borusyak, Jaravel & Spiess, 2024) works: fit on untreated observations, predict counterfactuals, aggregate by event time.
  • How to turn model output into stakeholder-friendly language: probability of positive effect, expected uplift, decision thresholds — no Bayesian background needed.
  • The parameter recovery pattern: validate your method on simulated data with known truth before trusting it on real data.
  • Practical diagnostics and red flags: parallel trends, anticipation effects, spillovers, and when not to use this method.
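The imputation steps and the parameter recovery pattern listed above can be sketched together in plain Python. This is an illustrative sketch under stated assumptions, not CausalPy's API and not the published estimator's full implementation: the panel is a hypothetical noise-free one (6 stores, 10 weeks, made-up launch weeks and effect sizes), and a simple backfitting loop stands in for an OLS fit with store and week fixed effects.

```python
from collections import defaultdict

# Hypothetical noise-free panel: 6 stores, 10 weeks, three cohorts.
N_WEEKS = 10
launch_week = {0: 4, 1: 4, 2: 7, 3: 7, 4: None, 5: None}  # None = never treated


def true_effect(event_time: int) -> float:
    """Known ground truth: the lift starts at 2.0 and grows by 0.5 per week."""
    return 2.0 + 0.5 * event_time


rows = []  # (store, week, event_time or None, outcome)
for store, launch in launch_week.items():
    for week in range(N_WEEKS):
        treated = launch is not None and week >= launch
        event_time = week - launch if treated else None
        y0 = 10.0 * store + 1.5 * week  # untreated outcome: store + week effect
        y = y0 + (true_effect(event_time) if treated else 0.0)
        rows.append((store, week, event_time, y))

# Step 1: fit store and week effects on untreated observations only.
# Backfitting (alternating means) stands in for OLS with two-way fixed effects.
untreated = [(s, w, y) for s, w, k, y in rows if k is None]
store_fx = defaultdict(float)
week_fx = defaultdict(float)
for _ in range(200):
    sums, counts = defaultdict(float), defaultdict(int)
    for s, w, y in untreated:
        sums[s] += y - week_fx[w]
        counts[s] += 1
    for s in sums:
        store_fx[s] = sums[s] / counts[s]
    sums, counts = defaultdict(float), defaultdict(int)
    for s, w, y in untreated:
        sums[w] += y - store_fx[s]
        counts[w] += 1
    for w in sums:
        week_fx[w] = sums[w] / counts[w]

# Step 2: predict the untreated counterfactual for each treated observation.
# Step 3: average (observed - counterfactual) by event time.
by_event_time = defaultdict(list)
for s, w, k, y in rows:
    if k is not None:
        by_event_time[k].append(y - (store_fx[s] + week_fx[w]))

estimated = {k: sum(v) / len(v) for k, v in sorted(by_event_time.items())}
for k, est in estimated.items():
    print(f"event time {k}: estimated {est:.3f}, true {true_effect(k):.3f}")
```

Because the simulated data contain no noise and satisfy parallel trends and no anticipation by construction, the estimates match the known truth up to floating-point error. With noisy or real data the comparison would need uncertainty intervals rather than an exact match.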

Outline and time plan (30 min talk + 10 min Q&A)

  • 0–4 min: The real-world problem — phased rollouts and why naive pre/post comparisons fail
  • 4–10 min: DiD refresher, then what breaks under staggered adoption: timing heterogeneity and negative weighting in two-way fixed-effects (TWFE) regression
  • 10–17 min: The staggered DiD solution (event-time framing, imputation intuition, key assumptions)
  • 17–25 min: Worked example in Python with CausalPy
      • A loyalty program rolled out to 60 stores in 3 waves over 30 weeks
      • Visualise adoption timing and check pre-trends
      • Fit the model and produce event-study plots
      • Parameter recovery: compare estimated effects to known ground truth
  • 25–28 min: Diagnostics — pre-treatment placebo checks, counterfactual inspection, "when not to use this" decision checklist
  • 28–30 min: Summary — three takeaways and the six-step workflow
  • 30–40 min: Q&A
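The TWFE negative-weighting issue named in the 4–10 min slot can be illustrated with a tiny worked example. The sketch below uses a hypothetical two-store, three-period balanced panel with invented effect sizes. It relies on the Frisch-Waugh-Lovell result that the TWFE coefficient equals the regression of the outcome on the double-demeaned treatment dummy, which in a balanced panel has a closed form.

```python
# Hypothetical 2-store, 3-period balanced panel with staggered adoption.
# treated[store][period] is the treatment dummy D.
treated = {
    "early": [0, 1, 1],  # adopts in period 1
    "late": [0, 0, 1],   # adopts in period 2
}
# True effects are all positive, but the early store's effect grows over time.
effect = {
    "early": [0.0, 1.0, 4.0],
    "late": [0.0, 0.0, 1.0],
}
store_fx = {"early": 10.0, "late": 20.0}
period_fx = [0.0, 3.0, 5.0]

stores = list(treated)
periods = range(3)
outcome = {
    s: [store_fx[s] + period_fx[t] + effect[s][t] for t in periods]
    for s in stores
}

# Double-demean D: subtract store and period means, add back the grand mean.
# In a balanced panel this equals residualising D on store and period dummies.
store_mean = {s: sum(treated[s]) / len(periods) for s in stores}
period_mean = [sum(treated[s][t] for s in stores) / len(stores) for t in periods]
grand_mean = sum(store_mean.values()) / len(stores)
d_resid = {
    s: [treated[s][t] - store_mean[s] - period_mean[t] + grand_mean for t in periods]
    for s in stores
}

# Frisch-Waugh-Lovell: TWFE coefficient = sum(d_resid * y) / sum(d_resid ** 2).
denom = sum(d_resid[s][t] ** 2 for s in stores for t in periods)
twfe = sum(d_resid[s][t] * outcome[s][t] for s in stores for t in periods) / denom

# Implied weight that TWFE places on each treated cell's true effect.
weights = {
    (s, t): d_resid[s][t] / denom for s in stores for t in periods if treated[s][t]
}
true_att = sum(effect[s][t] for (s, t) in weights) / len(weights)

print("weights on treated cells:", {k: round(v, 3) for k, v in weights.items()})
print(f"TWFE estimate: {twfe:.3f}, true average effect: {true_att:.3f}")
```

Here the early store's last period receives a weight of -0.5, because it effectively serves as a control for the late store. Every true effect is positive and averages 2.0, yet the TWFE coefficient comes out at -0.5.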

Background knowledge needed

  • Comfortable with tidy data, grouping/aggregating, and reading a regression coefficient.
  • Basic causal inference vocabulary (treatment/control, confounding) is helpful but not required.

What I will provide

A public GitHub repository containing:

  • a reproducible Quarto notebook (the slides themselves, with all code),
  • a synthetic dataset simulating a realistic store loyalty program rollout,
  • and environment setup instructions (conda environment file).
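For readers who want a feel for the kind of synthetic dataset described above before the repository is available, here is a hedged sketch of a generator. Only the 60 stores, 3 waves, and 30 weeks come from the talk outline. The wave launch weeks, the 20-stores-per-wave split, the effect size, the noise levels, and the column names are all assumptions made for illustration, and this is not the author's actual generator.

```python
import random

# Only 60 stores, 3 waves, and 30 weeks come from the talk outline;
# every other number below is an illustrative assumption.
N_STORES, N_WEEKS = 60, 30
WAVE_START_WEEKS = [10, 15, 20]  # hypothetical launch week for each wave
TRUE_LIFT = 5.0                  # hypothetical constant lift after launch

rng = random.Random(2026)  # fixed seed so the dataset is reproducible

records = []
for store in range(N_STORES):
    wave = store % len(WAVE_START_WEEKS)  # assumed: 20 stores per wave
    launch = WAVE_START_WEEKS[wave]
    baseline = rng.gauss(100.0, 10.0)     # store-level baseline sales
    for week in range(N_WEEKS):
        treated = week >= launch
        sales = (
            baseline
            + 0.8 * week                       # shared time trend
            + (TRUE_LIFT if treated else 0.0)  # known ground-truth effect
            + rng.gauss(0.0, 2.0)              # idiosyncratic noise
        )
        records.append(
            {
                "store": store,
                "week": week,
                "wave": wave + 1,
                "launch_week": launch,
                "treated": int(treated),
                "event_time": week - launch,  # negative before launch
                "sales": round(sales, 2),
            }
        )

print(len(records), "rows; first row:", records[0])
```

The result is a long-format list of one record per store-week, which can be passed straight to `pandas.DataFrame(records)` for plotting adoption timing or checking pre-trends. Keeping `TRUE_LIFT` explicit is what makes a later parameter recovery check possible.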

Ben Vincent is Director of InferenceWorks Ltd and a Principal Data Scientist at PyMC Labs, where he has been building Bayesian solutions for real-world business problems since 2021. He created CausalPy, an open-source Python library for causal inference in quasi-experimental settings. He holds a PhD in Neuroscience from the University of Sussex (UK) and previously held a university faculty position for 15 years.