PyCon DE & PyData 2026

Building reliable data pipelines with polars and dataframely
Ferrum [2nd Floor]

If you have worked with real-world data before, you know that processing it can be challenging. Data often comes scattered across tables, in inconsistent encodings, with duplicated rows, and is generally dirty. In this tutorial, you will learn how to process large amounts of data reliably and quickly using polars and dataframely.

What we love about polars is that it's easy to use, fast, and elegant — it allows us to build and compose complex transformations with ease. On this basis, we built dataframely: a library for defining and validating the contents of polars data frames. With dataframely, we can build pipelines without ever getting confused about what's in our data frames. We document and validate our expectations and assumptions clearly, which makes our pipeline code simpler and easier to understand. "Is this join correct?" and "Where did this column come from?" are questions you will no longer have to worry about.

In this tutorial, you will become familiar with polars basics by writing a simple pipeline: you will read data, transform it to make it ready for use, and you will learn how to do that fast. With dataframely schemas, you will upgrade your code from "it works" to "it's beautiful!", and along the way, dataframely will help you eliminate entire classes of bugs you will never have to think about again. After the tutorial, you will be all set to use these tools in your own work.


Expected audience expertise in the talk's domain: Novice
Expected audience expertise in Python: Intermediate

For the past 4 years, I have been working on machine learning and data engineering at QuantCo. Previously, I studied computer science at the Technical University of Munich, focusing on machine learning and deep learning.

I am a software and data engineer working on data pipelines at QuantCo. In a previous life, I looked for dark matter in particle collisions.