PyCon DE & PyData 2026

Oliver Borchert

For the past 4 years, I have been working on machine learning and data engineering at QuantCo. Previously, I studied computer science at the Technical University of Munich, focusing on machine learning and deep learning.


Session

04-16
10:15
90min
Building reliable data pipelines with polars and dataframely
Oliver Borchert, Andreas Albert

If you have worked with real-world data before, you know that processing it can be challenging. Data often comes scattered across tables, in inconsistent encodings, and riddled with duplicated rows; in short, it is dirty. In this tutorial, you will learn how to process large amounts of data reliably and quickly using polars and dataframely.

What we love about polars is that it's easy to use, fast and elegant — it allows us to build and compose complex transformations with ease. On this basis, we built dataframely: a library for defining and validating the contents of polars data frames. With dataframely, we can build pipelines without ever getting confused about what's in our data frames. We document and validate our expectations and assumptions explicitly, which makes our pipeline code simpler and easier to understand. "Is this join correct?" and "Where did this column come from?" are questions you will no longer have to worry about.

In this tutorial, you will become familiar with polars basics by writing a simple pipeline: you will read data, transform it to make it ready for use, and you will learn how to do that fast. With dataframely schemas, you will upgrade your code from "it works" to "it's beautiful!", and along the way, dataframely will help you eliminate entire classes of bugs you will never have to think about again. After the tutorial, you will be all set to use these tools in your own work.

PyData: Data Handling & Data Engineering
Ferrum [2nd Floor]