PyCon DE & PyData 2025

Dataframely — A declarative, 🐻‍❄️-native data frame validation library
2025-04-24, Platinum3

Understanding the structure and content of data frames is crucial when working with tabular data — a core requirement for the robust pipelines we build at QuantCo.

Libraries such as pandera or patito already exist to ease the process of defining data frame schemas and validating that data frames comply with these schemas. However, when building production-ready data pipelines, we encountered limitations of these libraries. Specifically, we were missing support for strict static type checking, validation of interdependent data frames, and graceful validation including introspection of failures.

To remedy the shortcomings of these libraries, we started building dataframely at the beginning of last year. Dataframely is a declarative data frame validation library with first-class support for polars data frames.
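Below is a minimal, stdlib-only sketch of the general idea behind *declarative* validation with *graceful* failure introspection. It is not dataframely's API and does not use polars. `Column`, `Schema`, `filter`, `validate`, `HouseSchema`, and its rule are hypothetical names chosen for illustration, and rows are plain dicts instead of data frame rows. The schema is declared as class attributes plus named rules. `validate` fails hard on any violation, while `filter` keeps the valid rows and counts which check each invalid row failed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, ClassVar

Row = dict[str, Any]  # one record; stands in for a data frame row in this sketch


@dataclass(frozen=True)
class Column:
    """Hypothetical declarative column spec: expected Python type and nullability."""
    dtype: type
    nullable: bool = False

    def check(self, value: Any) -> bool:
        if value is None:
            return self.nullable
        return isinstance(value, self.dtype)


class Schema:
    """Hypothetical base class: subclasses declare Column attributes and named rules."""
    rules: ClassVar[dict[str, Callable[[Row], bool]]] = {}

    @classmethod
    def columns(cls) -> dict[str, Column]:
        # Collect the Column attributes declared directly on the subclass.
        return {n: c for n, c in vars(cls).items() if isinstance(c, Column)}

    @classmethod
    def failures(cls, row: Row) -> list[str]:
        # Column checks run first; rules only run on rows whose columns are well-typed.
        bad = [f"column:{n}" for n, c in cls.columns().items() if not c.check(row.get(n))]
        if bad:
            return bad
        return [f"rule:{n}" for n, rule in cls.rules.items() if not rule(row)]

    @classmethod
    def filter(cls, rows: list[Row]) -> tuple[list[Row], Counter]:
        """Graceful mode: keep valid rows and count how often each check failed."""
        valid: list[Row] = []
        counts: Counter = Counter()
        for row in rows:
            reasons = cls.failures(row)
            if reasons:
                counts.update(reasons)
            else:
                valid.append(row)
        return valid, counts

    @classmethod
    def validate(cls, rows: list[Row]) -> list[Row]:
        """Strict mode: raise if any row fails any check."""
        valid, counts = cls.filter(rows)
        if counts:
            raise ValueError(f"validation failed: {dict(counts)}")
        return valid


class HouseSchema(Schema):
    zip_code = Column(str)
    num_bedrooms = Column(int)
    num_bathrooms = Column(int)
    price = Column(float)
    rules = {
        # Hypothetical rule: at most 3 bathrooms per bedroom and vice versa.
        "reasonable_bathroom_ratio": lambda r: (
            r["num_bedrooms"] > 0
            and 1 / 3 <= r["num_bathrooms"] / r["num_bedrooms"] <= 3
        ),
    }


houses = [
    {"zip_code": "80331", "num_bedrooms": 2, "num_bathrooms": 1, "price": 450_000.0},
    {"zip_code": None, "num_bedrooms": 3, "num_bathrooms": 2, "price": 600_000.0},
    {"zip_code": "10115", "num_bedrooms": 1, "num_bathrooms": 5, "price": 300_000.0},
]

valid, failures = HouseSchema.filter(houses)
print(len(valid))      # 1
print(dict(failures))  # {'column:zip_code': 1, 'rule:reasonable_bathroom_ratio': 1}
```

Declaring the schema as a class keeps the contract for a data frame in one readable place. A data frame library would typically evaluate such checks as vectorized column expressions rather than looping over rows in Python as this sketch does.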

Over the last year, we have gained experience using dataframely in both analytical and production code across several projects. The result was a drastic improvement in the legibility of our pipeline code and in our confidence in its correctness. To let the wider data engineering community benefit in the same way, we have recently open-sourced dataframely and are keen to introduce it in this talk.


In this talk, we will explain the motivation behind building dataframely in more detail and walk the audience through its key features. We will also share what we have learned about developing robust data pipelines that establish clear contracts for the design of data transformations. In our experience, such contracts significantly improve communication among developers and the comprehensibility of the entire pipeline.


Expected audience expertise (domain): Intermediate

Expected audience expertise (Python): Intermediate

I am currently a software engineer at QuantCo. Previously, I worked as a researcher in program analysis and software testing at the Technical University of Munich.

For the past 3 years, I have been working on machine learning and data engineering at QuantCo. Previously, I studied computer science at the Technical University of Munich, focusing on machine learning and deep learning.