EuroSciPy 2026

From prototype to production: scaling data science projects naturally in research and industry
2026-07-20, Room 2.41 (First Floor, Turing)

Scaling data science pipelines in research and industry poses well-known maintainability challenges. Research codebases must support rapid iteration as insights evolve, while industry systems must scale amid changing business needs and organizational complexity. Effective projects should remain maintainable without sacrificing time-to-market. Ideally, evolving from a notebook experiment to a production-grade application should feel natural, with minimal overhead.

In this talk, we present our journey developing Ordeq, an open-source Python library for building maintainable data pipelines, used by data scientists, analysts, and engineers at ING. Ordeq bridges exploratory research and production systems without forcing practitioners to abandon familiar workflows.

We show how data science projects benefit from established software engineering principles, particularly those inspired by functional programming. By embedding composition, side-effect isolation, and separation of concerns from the outset, teams can significantly improve reproducibility, testability, and refactorability without introducing heavy framework complexity.

Attendees will leave with practical design principles for structuring data projects that scale naturally from prototype to production, regardless of whether they adopt Ordeq itself.


Scaling data science pipelines in research and industry poses well-known maintainability challenges. Research codebases must support rapid iterations as new insights and ideas emerge.
Industry projects, meanwhile, need to scale amid ever-changing business needs and organizational complexity. Effective data science projects should remain maintainable without sacrificing time-to-market. Ideally, evolving from a notebook experiment to a production-grade application should feel natural, with minimal overhead.

In this talk, we present our journey developing Ordeq, an open-source Python library for building maintainable data pipelines, actively used by data scientists, analysts, and engineers at ING.
Ordeq was designed to bridge the gap between exploratory research and production systems without forcing data scientists to abandon familiar workflows. It is currently used both in production applications and in exploratory research and experiments.

We discuss how data science projects can benefit from established software engineering design patterns, specifically those from functional programming. We demonstrate how composition, side-effect isolation, and separation of concerns can be embedded into projects from the outset without adding heavy framework complexity. Guided by real-world use cases, we argue that adopting these principles early can significantly improve reproducibility, testability, and refactorability.
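To make the three principles concrete, here is a minimal sketch in plain Python. It does not use Ordeq and is not its API; every name in it (`drop_missing`, `to_upper`, `compose`, `load_csv`, `save_csv`, `run`, the `country` column) is hypothetical and chosen only for illustration. Transformations are pure functions that can be tested on in-memory data, they are combined by composition, and reading and writing are isolated in separate functions at the edges of the pipeline.

```python
import csv
import io
from functools import reduce
from typing import Callable, TextIO

# Hypothetical row and step types for this sketch (not Ordeq's API).
Row = dict[str, str]
Transform = Callable[[list[Row]], list[Row]]


# Pure transformations: no I/O and no mutation of their input.
def drop_missing(column: str) -> Transform:
    """Return a step that keeps only rows with a non-empty value in `column`."""
    def step(rows: list[Row]) -> list[Row]:
        return [row for row in rows if row.get(column)]
    return step


def to_upper(column: str) -> Transform:
    """Return a step that upper-cases `column`, building new row dicts."""
    def step(rows: list[Row]) -> list[Row]:
        return [{**row, column: row[column].upper()} for row in rows]
    return step


def compose(*steps: Transform) -> Transform:
    """Chain steps left to right into a single transformation."""
    return lambda rows: reduce(lambda acc, step: step(acc), steps, rows)


# Separation of concerns: the pipeline logic knows nothing about storage.
clean_countries = compose(drop_missing("country"), to_upper("country"))


# Side effects isolated at the edges: loading and saving are their own functions.
def load_csv(source: TextIO) -> list[Row]:
    return list(csv.DictReader(source))


def save_csv(rows: list[Row], sink: TextIO) -> None:
    if not rows:
        return
    writer = csv.DictWriter(sink, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


def run(source: TextIO, sink: TextIO) -> None:
    """The only function that touches I/O: load, transform, save."""
    save_csv(clean_countries(load_csv(source)), sink)


if __name__ == "__main__":
    # In-memory buffers stand in for files, so the demo needs no disk access.
    out = io.StringIO()
    run(io.StringIO("name,country\na,nl\nb,\n"), out)
    print(out.getvalue())
```

Because `clean_countries` is pure, a unit test only needs a list of dicts, and swapping CSV for another storage format touches `load_csv` and `save_csv` but not the transformation logic.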

Finally, we demonstrate how the framework integrates with popular tools in the Python ecosystem, such as Polars, Spark and Matplotlib, and discuss how its design philosophy compares to tools like Airflow and dlt. Attendees will leave with practical design principles for structuring data projects that scale naturally, from prototype to production, regardless of whether they adopt Ordeq itself.


Expected audience expertise (Domain): some
Expected audience expertise (Python): some
Your relationship with the presented work/project: Original author or co-author, Active contributor, Developed the presented feature, Maintainer of the presented library/project, Developed original workshop or study course

Niels is a software engineer, currently working for a treasury startup in Amsterdam. Simon is a data scientist and engineer, currently working as tech lead at ING Bank. Both have experience at the intersection of software engineering and data science within the fintech domain.