Did Your Rollout Actually Work? Measuring Phased Launches with Staggered DiD in Python (PyData London 2026)

2026-06-06 14:45–15:30, Hardwick Hub

Your company launches a loyalty program, but not everywhere at once. Ten stores get it in January, another ten in March, the rest later. Leadership asks: "Did it work? By how much?" You compare before and after... and get a number that's wrong. Phased rollouts break naive pre/post comparisons, because anything else that changed over the same period gets counted as part of the effect, and standard two-way fixed effects (TWFE) regression can quietly give misleading answers when effects differ across waves or over time.

This talk shows a practical Python workflow for getting it right. Using a realistic store-rollout example and CausalPy (an open-source library), I'll demonstrate how to produce event-study plots that show when and how much an intervention takes effect — with uncertainty estimates your stakeholders can actually act on. Whether you're measuring feature flags, marketing campaigns, or policy changes, you'll leave with a reproducible notebook and a step-by-step workflow you can apply tomorrow.

Who this is for

Data scientists, analysts, and applied ML/measurement practitioners who evaluate interventions using observational or quasi-experimental data (e.g., feature flags, phased launches, regional changes). Familiarity with pandas and basic regression is helpful; no prior Bayesian experience required.

What attendees will learn (takeaways)

How staggered adoption differs from "textbook" two-period Difference-in-Differences, and why the difference matters in production measurement.
How the imputation-based estimator (Borusyak, Jaravel & Spiess, 2024) works: fit on untreated observations, predict counterfactuals, aggregate by event time.
How to turn model output into stakeholder-friendly language: probability of positive effect, expected uplift, decision thresholds — no Bayesian background needed.
The parameter recovery pattern: validate your method on simulated data with known truth before trusting it on real data.
Practical diagnostics and red flags: parallel trends, anticipation effects, spillovers, and when not to use this method.
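
The imputation steps named in the takeaways (fit on untreated observations, predict counterfactuals, aggregate by event time) and the parameter recovery pattern can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the CausalPy implementation and not the talk's notebook: the panel size, adoption periods, noise levels, and the constant true effect of 2.0 are invented for the example, and the untreated-outcome model is a plain least-squares fit with unit and period fixed effects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel: 30 units x 12 periods, three adoption cohorts
# plus six never-treated units (all values are assumptions).
n_units, n_periods, true_effect = 30, 12, 2.0
adopt = np.repeat([4, 7, 10, np.inf], [8, 8, 8, 6])  # first treated period per unit

unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
event_time = time - adopt[unit]   # negative before adoption, -inf if never treated
treated = event_time >= 0

# Simulated outcome: unit effect + common trend + known effect + noise.
unit_fe = rng.normal(0.0, 1.0, n_units)
time_fe = 0.3 * np.arange(n_periods)
y = (unit_fe[unit] + time_fe[time] + true_effect * treated
     + rng.normal(0.0, 0.5, unit.size))

# Design matrix: one dummy per unit, one per period (first period dropped).
X = np.hstack([np.eye(n_units)[unit], np.eye(n_periods)[time][:, 1:]])

# Step 1: fit the untreated-outcome model on untreated observations only.
coef, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)

# Step 2: predict the counterfactual for treated observations and
# take observed minus predicted as the per-observation effect.
tau = y[treated] - X[treated] @ coef

# Step 3: aggregate per-observation effects by event time.
et = event_time[treated].astype(int)
event_study = {int(k): float(tau[et == k].mean()) for k in np.unique(et)}
att = float(tau.mean())

# Parameter recovery: compare the estimate with the known simulated truth.
for k, v in event_study.items():
    print(f"event time {k}: {v:.2f}")
print(f"overall estimate: {att:.2f} (simulated truth: {true_effect})")
```

Because the data are simulated, `att` can be compared directly with `true_effect`. On real data no such check exists, which is the reason for running the recovery step first.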

Outline and time plan (30 min talk + 10 min Q&A)

0–4 min: The real-world problem — phased rollouts and why naive pre/post comparisons fail
4–10 min: DiD refresher, then what breaks under staggered adoption (timing heterogeneity, negative weighting in TWFE)
10–17 min: The staggered DiD solution (event-time framing, imputation intuition, key assumptions)
17–25 min: Worked example in Python with CausalPy:
    A loyalty program rolled out to 60 stores in 3 waves over 30 weeks
    Visualise adoption timing and check pre-trends
    Fit the model and produce event-study plots
    Parameter recovery: compare estimated effects to known ground truth
25–28 min: Diagnostics — pre-treatment placebo checks, counterfactual inspection, "when not to use this" decision checklist
28–30 min: Summary — three takeaways and the six-step workflow
30–40 min: Q&A
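
To make the opening outline item concrete, the sketch below simulates a rollout with the dimensions given in the outline (60 stores, 3 waves, 30 weeks) and applies a naive per-store pre/post comparison. Everything beyond those three numbers is an assumption made for illustration: 20 stores per wave, launch weeks 8, 14 and 20, a common upward trend of 0.2 per week, and a true effect of 3.0. It is not the synthetic dataset from the talk's repository.

```python
import numpy as np

rng = np.random.default_rng(42)

n_stores, n_weeks = 60, 30
true_effect, weekly_trend = 3.0, 0.2           # assumed values
launch_week = np.repeat([8, 14, 20], 20)       # assumed: 3 waves of 20 stores

weeks = np.arange(n_weeks)
post = weeks[None, :] >= launch_week[:, None]  # (stores, weeks) treatment flag

# Share of stores treated in each week, a simple view of adoption timing.
share_treated = post.mean(axis=0)

# Sales: store baseline + common trend + effect after launch + noise.
baseline = rng.normal(100.0, 10.0, size=(n_stores, 1))
sales = (baseline + weekly_trend * weeks[None, :] + true_effect * post
         + rng.normal(0.0, 1.0, size=(n_stores, n_weeks)))

# Naive pre/post: per-store mean after launch minus mean before launch.
post_mean = np.nanmean(np.where(post, sales, np.nan), axis=1)
pre_mean = np.nanmean(np.where(~post, sales, np.nan), axis=1)
naive_estimate = float((post_mean - pre_mean).mean())

print(f"naive pre/post estimate: {naive_estimate:.2f}")
print(f"simulated true effect:   {true_effect:.2f}")
```

In this setup the average week index after launch is 15 weeks later than the average before launch, whatever the launch week, so the trend alone adds about 0.2 × 15 = 3.0 to the naive figure. The comparison therefore reports roughly double the simulated effect.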

Background knowledge needed

Comfortable with tidy data, grouping/aggregating, and reading a regression coefficient.
Basic causal inference vocabulary (treatment/control, confounding) is helpful but not required.

What I will provide

A public GitHub repository containing:

a reproducible Quarto notebook (the slides themselves, with all code),
a synthetic dataset simulating a realistic store loyalty program rollout,
and environment setup instructions (conda environment file).

Benjamin Vincent

Ben Vincent is Director of InferenceWorks Ltd and a Principal Data Scientist at PyMC Labs, where he has been building Bayesian solutions for real-world business problems since 2021. He created CausalPy, an open-source Python library for causal inference in quasi-experimental settings. He holds a PhD in Neuroscience from the University of Sussex (UK) and previously held a university faculty position for 15 years.