From prototype to production: scaling data science projects naturally in research and industry
Scaling data science pipelines in research and industry poses well-known maintainability challenges. Research codebases must support rapid iteration as insights evolve, while industry systems must scale amid changing business needs and organizational complexity. Effective projects should remain maintainable without sacrificing time-to-market. Ideally, evolving from a notebook experiment to a production-grade application should feel natural, with minimal overhead.
In this talk, we present our journey developing Ordeq, an open-source Python library for building maintainable data pipelines, used at ING by data scientists, analysts, and engineers. Ordeq bridges exploratory research and production systems without forcing practitioners to abandon familiar workflows.
We show how data science projects benefit from established software engineering principles, particularly those inspired by functional programming. By embedding composition, side-effect isolation, and separation of concerns from the outset, teams can significantly improve reproducibility, testability, and refactorability without introducing heavy framework complexity.
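As a rough illustration of these principles, here is a minimal, framework-free Python sketch. It is not Ordeq's API. The function names (`drop_incomplete`, `add_total`, `compose`, `load_csv`, `save_csv`, `run`) and the CSV columns are hypothetical and exist only for this example. The transformations are pure functions, composition chains them into one step, and all I/O is confined to the load and save functions at the pipeline's edges.

```python
import csv
import io
from functools import reduce
from typing import Callable

# Hypothetical illustration only; not Ordeq's API.
Row = dict[str, str]
Step = Callable[[list[Row]], list[Row]]


# Pure transformations: no file or network access, so they are easy to unit-test.
def drop_incomplete(rows: list[Row]) -> list[Row]:
    """Keep only rows in which every field is present and non-empty."""
    return [r for r in rows if all(v and v.strip() for v in r.values())]


def add_total(rows: list[Row]) -> list[Row]:
    """Return new rows with a 'total' field (price * quantity); inputs are not mutated."""
    return [{**r, "total": f"{float(r['price']) * int(r['quantity']):.2f}"} for r in rows]


def compose(*steps: Step) -> Step:
    """Chain steps left to right into a single transformation."""
    return lambda rows: reduce(lambda acc, step: step(acc), steps, rows)


transform = compose(drop_incomplete, add_total)


# Side effects isolated at the edges: only these functions touch streams.
def load_csv(stream: io.TextIOBase) -> list[Row]:
    return list(csv.DictReader(stream))


def save_csv(rows: list[Row], stream: io.TextIOBase) -> None:
    if not rows:
        return
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


def run(source: io.TextIOBase, sink: io.TextIOBase) -> None:
    """Wire I/O and logic together in one place."""
    save_csv(transform(load_csv(source)), sink)


if __name__ == "__main__":
    source = io.StringIO("item,price,quantity\na,2.50,4\nb,,3\n")
    sink = io.StringIO()
    run(source, sink)
    print(sink.getvalue())
```

Because `transform` never touches files, it can be tested with plain in-memory lists, and moving from a notebook's local CSV to production storage would only change `load_csv` and `save_csv`, not the logic in between.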
Attendees will leave with practical design principles for structuring data projects that scale naturally from prototype to production, regardless of whether they adopt Ordeq itself.
Computational Tools and Scientific Python Infrastructure
Room 2.41 (First Floor, Turing)