PyCon DE & PyData 2026

To nest, or not to nest? Nested data types in Polars with big data
Helium [3rd Floor]

Do you find yourself weighing up the pros and cons of using nested types in the Polars library - pondering whether you should encode your variables in structures using lists or arrays, or opt for a flat format without a complex hierarchy? This talk focuses on the crucial design choices available, their performance implications, and how they affect the logic of your queries, as well as code readability, when deciding how to implement your big data pipeline in Polars. The methods available for nested types in Polars have seen some significant additions over the last year, with powerful functionality, such as filtering and aggregation, released in the latest versions of the library. These provide much-needed shortcuts for queries interrogating complex nested structures that previously required sophisticated user-defined functions. They make the use of nested types much easier and more intuitive, but does this mean you should nest your data? Through practical examples you’ll learn some guidelines to help you decide.


If you’ve ever designed or used SQL databases in your data science projects, perhaps you’ve cringed at the lack of relational structure and the data duplication in the design of big data storage and processing. On the other hand, if you’ve spent any considerable time getting dirty with Polars’ vectorized and columnar processing, you’ll also know that this can be somewhat of a moot point. So why bother?

Outline of the talk:

5 minutes: Introduction & origin story. What are Polars nested types? How do they work? Why do they matter?
5 minutes: Back to the future. Advanced queries on nested types, past & present.
5 minutes: Query structure - “Group by” forever baby, versus element-wise.
5 minutes: Storage comparison and the gigabyte scrooge - how a miser decides on a nested Polars structure.
5 minutes: Time is money – How performance stacks up.
5 minutes: Q&A

By the end of the talk, participants will have seen several straightforward examples, as well as more advanced illustrations of nested structures in Polars using real-world data. They will be able to identify some key considerations informing their use of nested structures, including query logic, storage and performance.


Expected audience expertise in the talk's domain: Intermediate
Expected audience expertise in Python: Intermediate

Daniel Finnan is a second-year PhD candidate at the Lirsa laboratory, Conservatoire national des arts et métiers (CNAM), in Paris. His thesis focuses on decentralized finance, specifically decentralized exchanges, applying a quantitative methodology that combines blockchain data, data science techniques, and time series econometrics. He codes in Python, R, and occasionally Rust and JavaScript, using Python in particular to manage data pipelines. He has a professional certification in full-stack development and holds a Master’s degree in Economics, with a specialization in Economic, Digital and Data strategies from CNAM’s department of Economics, Finance, Insurance and Banking.