PyCon DE & PyData 2025

Daniel Elsner

I am currently a software engineer at QuantCo. Previously, I worked as a researcher in program analysis and software testing at the Technical University of Munich.


Session

04-24
14:20
30min
Dataframely — A declarative, 🐻‍❄️-native data frame validation library
Daniel Elsner, Oliver Borchert

Understanding the structure and content of data frames is crucial when working with tabular data — a core requirement for the robust pipelines we build at QuantCo.

Libraries such as pandera or patito already exist to ease the process of defining data frame schemas and validating that data frames comply with them. However, when building production-ready data pipelines, we ran into limitations of these libraries. Specifically, we were missing support for strict static type checking, validation of interdependent data frames, and graceful validation including introspection of failures.

To remedy the shortcomings of these libraries, we started building dataframely at the beginning of last year. Dataframely is a declarative data frame validation library with first-class support for polars data frames.

Over the last year, we have gained experience in using dataframely for both analytical and production code across several projects. The result was a drastic improvement in the legibility of our pipeline code and in our confidence in its correctness. To enable the wider data engineering community to benefit from similar effects, we have recently open-sourced dataframely and are keen to introduce it in this talk.

PyData: Data Handling & Engineering
Platinum3