Chainsail: facilitating sampling of multimodal probability distributions
2024-09-26, Louis Armand 1 - Est

Markov chain Monte Carlo (MCMC) methods, a class of iterative algorithms that allow sampling from almost arbitrary probability distributions, have become increasingly popular and accessible to statisticians and scientists. However, they run into difficulties when applied to multimodal probability distributions: a sampler that explores the parameter space in small steps tends to get trapped in one mode, because the low-probability regions separating the modes are rarely crossed. Multimodal distributions occur, for example, in Bayesian data analysis when multiple regions in the parameter space explain the data equally well or when some parameters are redundant. Inaccurate sampling then results in incomplete and misleading parameter estimates.
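To make the pitfall concrete, here is a minimal sketch (not taken from Chainsail) of a plain random-walk Metropolis sampler applied to a hypothetical one-dimensional target: an equal-weight mixture of two unit-variance Gaussians centred at -6 and +6. The target, the step size and the function names are illustrative assumptions. By symmetry, the true probability of x > 0 is 0.5, but a chain started in the left mode practically never crosses the low-density valley around x = 0, so its estimate comes out close to 0.

```python
import numpy as np


def log_prob(x):
    # Hypothetical bimodal target: equal-weight mixture of two unit-variance
    # Gaussians centred at -6 and +6 (unnormalised log-density).
    return np.logaddexp(-0.5 * (x + 6.0) ** 2, -0.5 * (x - 6.0) ** 2)


def metropolis(log_prob, x0, n_steps, step_size, rng):
    """Random-walk Metropolis with symmetric Gaussian proposals."""
    samples = np.empty(n_steps)
    x, lp = x0, log_prob(x0)
    for i in range(n_steps):
        x_new = x + step_size * rng.normal()
        lp_new = log_prob(x_new)
        # Accept with probability min(1, p(x_new) / p(x)).
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = x_new, lp_new
        samples[i] = x
    return samples


rng = np.random.default_rng(42)
samples = metropolis(log_prob, x0=-6.0, n_steps=20_000, step_size=0.5, rng=rng)
# By symmetry the correct answer is 0.5; the stuck chain reports almost 0.
print("fraction with x > 0:", np.mean(samples > 0))
```

Tuning the step size does not really help here: small steps cannot climb out of the valley, and steps large enough to jump between modes are almost always rejected.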
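In this talk, intended for data scientists and statisticians with basic knowledge of MCMC and probabilistic programming, I present Chainsail, an open-source web service written entirely in Python. It implements Replica Exchange (also known as parallel tempering), an advanced MCMC method designed specifically to improve sampling of multimodal distributions: it runs several chains on progressively flattened versions of the target distribution and lets them exchange states, so that the easy mode hopping of the flattened chains propagates to the chain sampling the original distribution.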
Chainsail makes this algorithm easily accessible to users of probabilistic programming libraries by automatically tuning its key parameters and by provisioning, on demand, the additional computing resources that Replica Exchange requires.
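The core of Replica Exchange fits in a few dozen lines. The sketch below is a minimal, generic version written for illustration, not Chainsail's implementation: the bimodal target (two Gaussians at -6 and +6), the hand-picked geometric schedule of inverse temperatures `betas`, the per-replica step-size scaling and all function names are assumptions. Each replica k samples p(x)^beta_k with random-walk Metropolis; after every round of local updates, one randomly chosen pair of neighbouring replicas attempts to swap states with acceptance probability min(1, exp((beta_k - beta_{k+1}) (log p(x_{k+1}) - log p(x_k)))). Only the beta = 1 replica samples the target itself.

```python
import numpy as np


def log_prob(x):
    # Hypothetical bimodal target: equal-weight mixture of two unit-variance
    # Gaussians centred at -6 and +6 (unnormalised log-density).
    return np.logaddexp(-0.5 * (x + 6.0) ** 2, -0.5 * (x - 6.0) ** 2)


def replica_exchange(log_prob, betas, x0, n_steps, step_size, rng):
    """Minimal Replica Exchange: replica k targets p(x)**betas[k]; betas[0] == 1."""
    n = len(betas)
    xs = np.full(n, x0, dtype=float)
    lps = np.array([log_prob(x) for x in xs])
    # Flatter (hotter) replicas get proportionally larger proposal steps.
    step_sizes = step_size / np.sqrt(betas)
    samples = np.empty(n_steps)
    accepted = np.zeros(n - 1)
    attempted = np.zeros(n - 1)
    for t in range(n_steps):
        # 1) Local random-walk Metropolis update in every replica.
        for k in range(n):
            x_new = xs[k] + step_sizes[k] * rng.normal()
            lp_new = log_prob(x_new)
            if np.log(rng.uniform()) < betas[k] * (lp_new - lps[k]):
                xs[k], lps[k] = x_new, lp_new
        # 2) Attempt to swap the states of one random pair of neighbours.
        k = rng.integers(n - 1)
        attempted[k] += 1
        log_acc = (betas[k] - betas[k + 1]) * (lps[k + 1] - lps[k])
        if np.log(rng.uniform()) < log_acc:
            xs[[k, k + 1]] = xs[[k + 1, k]]
            lps[[k, k + 1]] = lps[[k + 1, k]]
            accepted[k] += 1
        # Keep only the beta = 1 replica, which samples the target itself.
        samples[t] = xs[0]
    return samples, accepted / np.maximum(attempted, 1)


rng = np.random.default_rng(42)
betas = np.geomspace(1.0, 0.02, 8)  # hand-picked schedule (assumption)
samples, swap_rates = replica_exchange(
    log_prob, betas, x0=-6.0, n_steps=50_000, step_size=0.5, rng=rng
)
print("fraction with x > 0:", np.mean(samples > 0))
print("swap acceptance rates:", np.round(swap_rates, 2))
```

The per-pair swap acceptance rates returned here are a common diagnostic for the temperature schedule: if neighbouring betas are too far apart, swaps are rarely accepted and the mode information from the flat replicas never reaches the beta = 1 chain. Choosing this schedule by hand, and paying for one chain per temperature, are exactly the two burdens the paragraph above says Chainsail takes off the user.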


The presentation starts with a brief refresher on MCMC and an introduction to the Replica Exchange algorithm. The second part focuses on the web service architecture, and finally, I show examples of how Chainsail improves MCMC sampling.
The audience will learn about a common pitfall when doing Bayesian data analysis and how an algorithm originally developed in the field of solid state physics can help. Furthermore, the architecture part of the presentation shows an example of how to set up a complex, multi-component web service using Kubernetes and other popular software.

I'm a data / software engineer at Modus Create, working on projects in GenAI and biotech domains. I have a background in physics and computational structural biology.