PyData London 2026

Model criticism through posterior predictive checks
2026-06-05, Hardwick Hub

Posterior predictive checks are a key step in Bayesian modeling workflows, in which we compare model predictions with the data used to fit the model. By focusing on distributional comparisons instead of point estimates, they show where our models fail and inform model improvements. Knowing that a model is not completely right is relatively easy; knowing why that is the case and how to fix it is a whole other question, and it will be the focus of this tutorial. The tutorial will give data scientists and researchers multiple strategies for posterior predictive checks that work with continuous, discrete or categorical data, and with homogeneous or heterogeneous data.


The main expected audience of this tutorial is practitioners, in academia or industry, who work with probabilistic models. It will also include multiple elements of interest to anyone working with any kind of statistical model or fitting data through simulations.

The material for the tutorial will be published on GitHub beforehand so attendees can download the data and prepare their environments. The tutorial will assume attendees are familiar with Python, Jupyter notebooks and basic statistical concepts. Knowledge of Bayesian inference and posterior predictive sampling will be helpful but is not required.

The tutorial will start with an introductory section of about 40 minutes, covering posterior predictive checks conceptually as well as usage examples with ArviZ. This will be followed by multiple hands-on exercises on provided example datasets to practice model criticism through posterior predictive checks. The main topics covered will be:

  • Understanding the need for distributional comparisons
  • Understanding how to adapt model criticism to the type of data
  • Diagnosing models of heterogeneous data at both the population and group level
  • Translating model criticism visualizations to model issues
  • Multiple uncertainty visualization designs
  • How to use ArviZ for predefined and custom posterior predictive checks
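To make the population versus group level distinction concrete, the sketch below fits a single pooled normal model to data from three groups and checks the mean at both levels. It is only an illustration under stated assumptions: the group data are invented, the prior is the standard noninformative one for a normal model, and the helper name `draw_posterior` is hypothetical. It uses the Python standard library rather than ArviZ.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical measurements from three groups with clearly different levels.
groups = {
    "a": [4.8, 5.1, 5.3, 4.9, 5.0, 5.2],
    "b": [7.9, 8.2, 8.0, 7.8, 8.1, 8.3],
    "c": [2.1, 1.9, 2.0, 2.2, 1.8, 2.0],
}
y_all = [value for values in groups.values() for value in values]
n = len(y_all)
y_bar = statistics.fmean(y_all)
s2 = statistics.variance(y_all)


def draw_posterior():
    """One (mu, sigma) draw for a pooled normal model, prior proportional to 1/sigma^2."""
    # Chi-squared with n - 1 degrees of freedom is Gamma(shape=(n-1)/2, scale=2).
    chi2 = random.gammavariate((n - 1) / 2, 2.0)
    sigma2 = (n - 1) * s2 / chi2
    mu = random.gauss(y_bar, math.sqrt(sigma2 / n))
    return mu, math.sqrt(sigma2)


n_draws = 2000
exceed = {name: 0 for name in groups}
exceed["all"] = 0
for _ in range(n_draws):
    mu, sigma = draw_posterior()
    rep_all = []
    for name, values in groups.items():
        # Replicate each group with its observed size, ignoring group identity.
        rep = [random.gauss(mu, sigma) for _ in values]
        rep_all.extend(rep)
        exceed[name] += statistics.fmean(rep) >= statistics.fmean(values)
    exceed["all"] += statistics.fmean(rep_all) >= y_bar

# Share of replicated means at least as large as the observed mean, per level.
p_values = {name: count / n_draws for name, count in exceed.items()}
for name, p in p_values.items():
    print(f"{name}: {p:.3f}")
```

In this setup the check on the overall mean looks unremarkable, while the checks for groups "b" and "c" land near 0 and 1 respectively, which points at the missing group structure rather than at the overall location of the model.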

Oriol is a computational statistician, working as a maintainer of the ArviZ and PyMC libraries and as Principal Data Scientist with PyMC Labs. He started in academia but left after some years in order to work more freely and collaboratively on open source, software and knowledge sharing. His main areas of interest are data visualization, model and inference diagnostics, model comparison, and prior elicitation. Within open source projects, he has also dedicated a large part of his work to documentation, governance and DEI.
