EuroSciPy 2026

setu: Bridging Simulators to Probabilistic Programming in JAX
2026-07-20, Room 1.38 (Ground Floor, Turing)

Many scientific models, from climate systems to neural circuits, are defined as simulators: computer programs that generate data from parameters but provide no tractable likelihood function. This makes them invisible to probabilistic programming languages (PPLs) like PyMC and NumPyro, which require explicit likelihoods for Bayesian inference. Practitioners are forced to choose: make simplifying assumptions about the simulator to use a PPL, or use the real simulator and give up the rich modeling capabilities PPLs offer, such as prior specification, uncertainty quantification and exploitation of hierarchical structures.

We present setu ("bridge"), a JAX-native Python package that closes this gap. setu uses generative neural networks trained on simulated data to learn a neural surrogate of the likelihood. This learned likelihood can then be exported directly into PPLs via a simple API: nle.to_pymc() or nle.to_numpyro(). Once inside a PPL, the full Bayesian toolbox becomes available: hierarchical models, custom priors, posterior predictive checks, and standard MCMC samplers, all running on a simulator that was previously out of reach.

The package follows a clean simulate, train, validate, export workflow, with built-in diagnostics to ensure the learned likelihood is trustworthy before it ever enters a PPL. In this talk, we walk through the motivation, design, and a real-world example showing how a black-box simulator gains full PPL capabilities.


Background and motivation

Simulation-based inference (SBI) has emerged as a powerful set of methods for performing Bayesian inference with simulator models that lack tractable likelihoods. Packages like sbi (PyTorch) have made these methods accessible to Python users. However, a key limitation remains: SBI methods typically produce standalone posterior approximations, disconnected from the broader probabilistic programming ecosystem.

Last year at EuroSciPy 2025, we presented work on bridging SBI to Pyro for hierarchical Bayesian inference, enabling flexible design of multi-level models for intractable simulators. While promising, this approach was tightly coupled to the sbi package's PyTorch ecosystem and Pyro's specific API, making it difficult for users of other PPLs (PyMC, NumPyro) to benefit.

This motivated a fundamental rethinking. Rather than building bridges from within existing packages, we built setu, a standalone, JAX-native package with a single purpose: learning neural likelihood (ratio) surrogates from simulators and exporting them to any PPL.

What setu does

  1. Simulate: Run your simulator to generate paired (parameter, data) samples. This usually happens on the user side, but setu provides utilities for parallelization.
  2. Train: Fit a normalizing flow (a masked autoregressive flow or a neural spline flow) to learn the conditional density p(data | parameters), that is, the likelihood.
  3. Validate: Before trusting the learned likelihood, run built-in diagnostic checks: classifier two-sample tests (C2ST), distribution shift detection, and training convergence monitoring.
  4. Export: Call .to_pymc() or .to_numpyro() to get a likelihood term you can drop into any PPL model. The learned log-probability integrates seamlessly with the PPL's inference engine.
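
The four steps can be illustrated with a deliberately tiny, self-contained sketch. It does not use setu's API, which this abstract does not spell out beyond the export calls. Instead, a linear-Gaussian model fitted by least squares stands in for the normalizing flow, and the names simulator, simulate_pairs, fit_surrogate, and log_likelihood are hypothetical.

```python
import math
import random


def simulator(theta, rng):
    """Toy stand-in for a black-box simulator: data = 2 * theta + noise."""
    return 2.0 * theta + rng.gauss(0.0, 0.5)


def simulate_pairs(n, rng):
    """Step 1 (simulate): draw parameters from a prior and run the simulator."""
    thetas = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    xs = [simulator(t, rng) for t in thetas]
    return thetas, xs


def fit_surrogate(thetas, xs):
    """Step 2 (train): fit p(x | theta) = Normal(a * theta + b, sigma).

    ASSUMPTION: a least-squares fit stands in for training a normalizing flow.
    """
    n = len(thetas)
    mean_t = sum(thetas) / n
    mean_x = sum(xs) / n
    cov = sum((t - mean_t) * (x - mean_x) for t, x in zip(thetas, xs))
    var = sum((t - mean_t) ** 2 for t in thetas)
    a = cov / var
    b = mean_x - a * mean_t
    resid = [x - (a * t + b) for t, x in zip(thetas, xs)]
    sigma = math.sqrt(sum(r * r for r in resid) / n)
    return a, b, sigma


def log_likelihood(x, theta, params):
    """Learned log p(x | theta): the quantity a PPL would consume."""
    a, b, sigma = params
    z = (x - (a * theta + b)) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


rng = random.Random(0)
thetas, xs = simulate_pairs(5000, rng)
params = fit_surrogate(thetas, xs)

# Step 3 (validate), crudely: data simulated at theta = 1 should be more
# likely under theta = 1 than under a distant parameter value.
x_obs = simulator(1.0, rng)
ll_true = log_likelihood(x_obs, 1.0, params)
ll_far = log_likelihood(x_obs, -2.0, params)

# Step 4 (export) is where setu's .to_pymc() / .to_numpyro() would come in;
# it is not reproduced here.
```

In this toy case the likelihood is known, so the fit can be checked against the truth (slope 2, noise scale 0.5). With a real simulator that check is unavailable, which is why a dedicated validation step is needed.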

Why JAX?

Building natively in JAX was a strategic choice:
- PyMC integration: PyMC's backend (PyTensor) has a direct JAX compilation path, making the bridge nearly zero-overhead.
- NumPyro: Already JAX-native, so the integration is trivial.
- Performance: JAX's JIT compilation and automatic differentiation provide significant speedups for both training and inference, especially in hierarchical models where the learned likelihood is evaluated many times.

Talk outline (25 minutes)

  1. The problem (5 min): Why most simulators cannot be used in PPLs, and why this matters for scientific inference. We introduce the running example: hierarchical modeling of tadpole survival across experimental tanks, a classic problem from ecology (Vonesh & Bolker, 2005), well known through the Statistical Rethinking lectures.
  2. The idea (5 min): Neural likelihood estimation in a nutshell, covering what normalizing flows learn and why this enables the use of PPLs. We validate setu by comparing its results with PyMC's on the fully tractable Binomial tadpole survival model.
  3. setu in practice (10 min): We extend the tadpole example to an individual-based mechanistic simulator with size-dependent predation and density-dependent competition (Vonesh & Bolker, 2005), which results in an intractable likelihood. Using setu, we train a neural likelihood, export it to PyMC, and perform hierarchical inference across all 48 tanks. This would be challenging with standard PPLs or standalone SBI.
  4. Validation matters (3 min): Why you must check your learned likelihood before trusting it, and how it is done in setu.
  5. Ecosystem and future (2 min): Current status, roadmap, and how to get involved.
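
For reference, the tractable baseline in item 2 has a closed-form likelihood. The sketch below assumes the usual textbook formulation: the number of survivors in a tank is Binomial(n, p), with n the initial number of tadpoles and p the logistic transform of a per-tank log-odds. The function names and the example numbers are illustrative and are not taken from the dataset.

```python
import math


def logistic(alpha):
    """Map log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-alpha))


def binomial_logpmf(k, n, p):
    """Log-probability of k survivors out of n tadpoles with survival prob p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def tank_log_likelihood(survivors, densities, alphas):
    """Sum of per-tank Binomial log-likelihoods, one log-odds value per tank."""
    return sum(
        binomial_logpmf(k, n, logistic(a))
        for k, n, a in zip(survivors, densities, alphas)
    )


# Illustrative numbers only (not the real data): three tanks.
survivors = [9, 20, 30]
densities = [10, 25, 35]
alphas = [2.0, 1.4, 1.8]
total = tank_log_likelihood(survivors, densities, alphas)
```

A mechanistic, individual-based simulator has no such closed form, which is the gap a learned likelihood is meant to fill.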

Key takeaways for the audience

  • Neural density estimation can turn a simulator into a likelihood function usable by PPLs.
  • The simulate, train, validate, export workflow makes this practical and safe.
  • setu is designed for scientists who already know PyMC or NumPyro and want to use their real simulators instead of simplified analytical models.
  • Validation is essential: approximate likelihoods need rigorous checking.
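
To make the validation point concrete, here is a minimal sketch of a classifier two-sample test (C2ST), one of the checks named in the workflow. A classifier is trained to distinguish simulator output from surrogate samples; held-out accuracy near 0.5 means the two are hard to tell apart. This is not setu's implementation: the one-dimensional logistic regression, the Gaussian toy data, and all function names are assumptions chosen for brevity.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, steps=300, lr=0.5):
    """Fit a 1-D logistic regression by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def c2st_accuracy(real, fake, rng):
    """Held-out accuracy of a classifier trained to tell real from fake."""
    data = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    w, b = train_logistic([x for x, _ in train], [y for _, y in train])
    hits = sum((sigmoid(w * x + b) > 0.5) == (y == 1.0) for x, y in test)
    return hits / len(test)


rng = random.Random(0)
simulated = [rng.gauss(0.0, 1.0) for _ in range(1000)]
good_surrogate = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # same distribution
bad_surrogate = [rng.gauss(1.5, 1.0) for _ in range(1000)]   # shifted mean

acc_good = c2st_accuracy(simulated, good_surrogate, rng)  # expected near 0.5
acc_bad = c2st_accuracy(simulated, bad_surrogate, rng)    # expected well above 0.5
```

A linear classifier on the raw value only detects shifts in location. Practical C2ST implementations typically use a more flexible classifier, such as a small neural network, so that differences in shape are caught as well.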

Relevance to EuroSciPy

This talk sits at the intersection of several EuroSciPy themes: scientific Python infrastructure, numerical simulation frameworks, and statistical/mathematical computing. It addresses a real and growing need across disciplines where scientists have sophisticated simulators but lack the statistical tools to perform proper Bayesian inference with them. The running example builds on the Reed Frogs dataset familiar to many from Statistical Rethinking, making the problem immediately accessible before we extend it beyond what textbook methods can handle. The package is open source, JAX-native, and integrates with the most widely used PPLs in the Python ecosystem.


Expected audience expertise: Domain: none
Expected audience expertise: Python: some
Your relationship with the presented work/project: Original author or co-author

Jan initially immersed himself in the realms of cognitive science and computational neuroscience. However, he couldn’t resist the siren call of Bayesian machine learning, and his PhD evolved into a mission to enhance the user-friendliness of this complex field. He set out to bridge cutting-edge methods with user-friendly software, making the world of simulation-based inference more accessible for practitioners. In 2024, he joined the appliedAI Institute for Europe, ready to continue his journey of making advanced methodologies approachable and transformative.