Pure functions + Separate I/O: Functional Python Pipelines for Reproducible Experiments EuroSciPy 2026

2026-07-20 15:20–15:50, Room 2.41 (First Floor, Turing)

Scaling data science pipelines in research and industry poses well-known maintainability challenges (big ball of mud). Research codebases must support rapid iteration as insights evolve, while industry systems must scale amid changing business needs and organizational complexity. Effective projects should remain maintainable without overhauling the entire code base for each change. Ideally, evolving from a notebook experiment to a production-grade application should feel natural, with minimal overhead.

In this talk, we show how data science projects benefit from established software engineering principles, particularly those inspired by functional programming, in Python. The first part of the talk outlines the design principles. The second part goes into our (brutally honest) insights from applying these in various research projects, ranging from master's student experiments to the applications in our R&D teams.

Scaling data science pipelines in research and industry poses well-known maintainability challenges. Research codebases must support rapid iterations as new insights and ideas emerge. Industry projects, meanwhile, need to scale amid ever-changing business needs and organizational complexity. Effective data science projects should remain maintainable without overhauling the entire code base for each change. Ideally, evolving from a notebook experiment to a production-grade application should feel natural, with minimal overhead.

We discuss how data science projects can benefit from established design patterns in software engineering, specifically from functional programming. We demonstrate how composition, side-effect isolation and separation of concerns can be embedded into projects from their outset, without adding heavy framework complexity. Guided by real-world use cases we argue that by adopting these principles early, projects can significantly improve reproducibility, testability, and refactorability.
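As a rough illustration of composition and side-effect isolation, here is a minimal sketch in plain Python. It does not use Ordeq and is not taken from the talk; every name in it (`drop_missing`, `to_float`, `mean_value`, `compose`, `load_rows`, `run`) and the sample data are hypothetical, chosen only to show pure transformation steps kept apart from the one function that reads data.

```python
import csv
import io
from typing import Callable, TextIO

Row = dict  # hypothetical row type: one CSV record as a dict


# Pure steps: the result depends only on the argument, nothing is read or written.
def drop_missing(rows: list[Row]) -> list[Row]:
    """Keep only rows that have a non-empty 'value' field."""
    return [row for row in rows if row["value"] != ""]


def to_float(rows: list[Row]) -> list[Row]:
    """Return new rows with 'value' parsed as float; inputs are not mutated."""
    return [{**row, "value": float(row["value"])} for row in rows]


def mean_value(rows: list[Row]) -> float:
    """Average of the 'value' field."""
    return sum(row["value"] for row in rows) / len(rows)


def compose(*steps: Callable) -> Callable:
    """Chain steps left to right into a single function."""
    def pipeline(data):
        for step in steps:
            data = step(data)
        return data
    return pipeline


# I/O lives at the edge: this is the only function that touches a stream.
def load_rows(stream: TextIO) -> list[Row]:
    return list(csv.DictReader(stream))


def run(stream: TextIO) -> float:
    pipeline = compose(drop_missing, to_float, mean_value)
    return pipeline(load_rows(stream))


# An in-memory stream stands in for a file, so the example needs no disk access.
sample = io.StringIO("id,value\n1,2.0\n2,\n3,4.0\n")
print(run(sample))  # prints 3.0
```

Because the steps never touch a file, each one can be tested with a literal list of dicts, and the data source can be swapped (file, database, in-memory stream) without changing the transformation logic.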

Our analysis builds on Ordeq, an open-source Python library for building maintainable data pipelines, actively used by data scientists, analysts, and engineers at ING. The framework was designed to bridge the gap between exploratory research and production systems, without forcing data scientists to abandon familiar workflows. It is currently used in production applications, as well as during exploratory research and experiments. This talk does not go in depth into the framework itself, which we covered at PyData Amsterdam 2025. If you're interested in learning more, you can have a look at the code and documentation.

Attendees will leave with practical design principles for structuring data projects that scale naturally, from prototype to production, regardless of whether they adopt Ordeq itself.

Ordeq GitHub: https://github.com/ing-bank/ordeq
Ordeq documentation: https://ing-bank.github.io/ordeq/

Expected audience expertise: Domain: some
Expected audience expertise: Python: some
Supporting material: Supporting material
Project homepage or Git: Project homepage or Git
Your relationship with the presented work/project: Original author or co-author, Active contributor, Developed the presented feature, Maintainer of the presented library/project, Developed original workshop or study course

Niels Neerhoff

Niels is a software engineer at Palm, the AI treasury startup. Simon is a data scientist and engineer, currently working as tech lead at ING Bank. Both have experience at the intersection of software engineering and data science within the fintech domain.

Simon Brugman

Simon has previously presented at SciPy 2022 (popmon) and EuroSciPy 2024 (pycodehash). This year, we'll have a meta talk on "Pure Functions + Separate I/O: Functional Python Pipelines for Reproducible Experiments". Rather than going into the details of the framework itself, which we did at PyData Amsterdam 2025 (ordeq), we will provide our brutally honest learnings from applying this design pattern to actual research and development.

Simon has actively developed various open-source projects, such as pandas-profiling, and has been an outside contributor to ruff and uv.

popmon: https://proceedings.scipy.org/articles/majora-212e5952-01d
pycodehash: https://pycodehash.github.io/pycodehash/
Ordeq at PyData Amsterdam: https://cfp.pydata.org/pydata-amsterdam-2025/talk/9WEFB3/
Ordeq code: https://github.com/ing-bank/ordeq
pandas-profiling: https://github.com/Data-Centric-AI-Community/fg-data-profiling
