2025-12-09 – Horace Mann
Why do male test takers consistently score about 30 points higher than female test takers on the mathematics section of the SAT? Does this reflect an actual difference in math ability, or is it an artifact of selection bias? That is, are young men with low math ability less likely to take the test than young women with the same ability?
This talk presents a Bayesian model that estimates how much of the observed difference can be explained by selection effects. We’ll walk through a complete Bayesian workflow, including prior elicitation with PreliZ, model building in PyMC, and validation with ArviZ, showing how Bayesian methods disentangle latent traits from observed outcomes and separate the signal from the noise.
No prior knowledge of Bayesian statistics is required; attendees should be familiar with Python and common probability distributions.
Overview
This talk uses the SAT math gap as a case study to demonstrate modern Bayesian modeling in practice. For decades, male test takers have outperformed female test takers on the SAT math section by about 30 points. This outcome could reflect an actual difference in ability, or it could be explained by selection bias: if boys with weaker math skills are less likely to take the SAT than girls with comparable skills, the male test-taking population is skewed toward higher ability, and the observed gap overstates any real difference.
I present a generative Bayesian model that explicitly incorporates this selection mechanism and estimates the fraction of the observed gap attributable to bias. The talk emphasizes workflow over theory: how to build, validate, and interpret Bayesian models using PyMC, ArviZ, and PreliZ.
Audience
The target audience includes data scientists, applied researchers, and engineers who:
* Use Python for data analysis,
* Have basic familiarity with probability distributions,
* Are curious about Bayesian modeling but do not necessarily have prior experience with PyMC or Bayesian statistics.
Learning goals
Attendees will learn:
* How to frame a substantive question as a Bayesian generative model,
* How to use PreliZ for prior elicitation, PyMC for model building, and ArviZ for diagnostics and posterior predictive checks,
* How to interpret results in terms of latent traits vs. observed outcomes,
* How Bayesian models can provide a principled way to reason about confounding and bias.
Outline (approx. 30–40 minutes)
Introduction & background (5 min)
– The SAT math gap and the debate over its causes
– Why Bayesian inference is a good fit for this problem
Model construction (10 min)
– Latent efficacy distribution
– Selection mechanism (logistic link)
– Noise modeling for score perturbations
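The generative story above can be sketched as a small forward simulation. This is a minimal illustration, not the talk's actual model: the ability distributions, the logistic selection coefficients, and the noise scale below are all assumed values chosen to show the mechanism, namely that selection alone can produce a gap in observed scores even when the latent distributions are identical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Latent math ability for two groups with the SAME mean and spread,
# so any gap in observed scores comes from selection, not ability.
ability_m = rng.normal(500, 100, n)
ability_f = rng.normal(500, 100, n)

def take_test_prob(ability, alpha, beta):
    """Logistic selection: probability of taking the SAT as a function
    of standardized ability (assumed coefficients, for illustration)."""
    z = (ability - 500) / 100
    return 1 / (1 + np.exp(-(alpha + beta * z)))

# Assumed selection: test-taking depends more strongly on ability for
# men (steeper slope), so low-ability men drop out at a higher rate.
p_m = take_test_prob(ability_m, 0.0, 1.0)
p_f = take_test_prob(ability_f, 0.0, 0.5)

took_m = rng.random(n) < p_m
took_f = rng.random(n) < p_f

# Observed scores add measurement noise on top of latent ability.
score_m = ability_m[took_m] + rng.normal(0, 30, took_m.sum())
score_f = ability_f[took_f] + rng.normal(0, 30, took_f.sum())

gap = score_m.mean() - score_f.mean()
print(round(gap, 1))  # positive gap despite identical latent distributions
```

The Bayesian model in the talk runs this story in reverse: given only the observed scores of test takers, it infers how much of the gap the selection mechanism can account for.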
Workflow demonstration (15 min)
– Prior elicitation with PreliZ
– Sampling and diagnostics with PyMC and ArviZ
– Posterior predictive checks
Results & interpretation (5–7 min)
– Estimated contribution of selection bias to the observed gap
– Broader implications for educational testing and applied modeling
Takeaways (3–5 min)
– Lessons about Bayesian workflow
– Relevance to real-world problems of bias and confounding
Materials
All code and data preprocessing will be available in a public GitHub repository so attendees can reproduce the analysis and adapt it to their own work.
Speaker bio
Allen Downey is a professor emeritus at Olin College and Principal Data Scientist at PyMC Labs. He is the author of several books, including Think Python, Think Bayes, and Probably Overthinking It, and writes a blog about programming and data science. He is a consultant and instructor specializing in Bayesian statistics. He received a Ph.D. in computer science from the University of California, Berkeley, and Bachelor's and Master's degrees from MIT.