2026-06-07, Doddington Forum
Polars is a dataframe library that has taken the world by storm over the last 4-5 years. Because people love benchmarks, it is often compared with SQL-like engines such as DuckDB, PySpark, and Daft. But what if, instead of comparing performance, we compared semantics?
This talk will make no mention whatsoever of performance differences. Instead, it will focus entirely on the semantic differences between Polars and SQL, which don't get nearly enough attention. Attendees will leave with a heightened appreciation for how the Polars and SQL models differ, and an understanding of the consequences for their code.
Polars is a dataframe library that started gaining significant traction in the data science community around 2022/2023. It is now generally regarded as a safer and more performant alternative to its extremely popular counterpart, pandas. As such, it has attracted several performance comparisons with SQL-like engines such as DuckDB, PySpark, and Daft. What's typically missing from these comparisons is an explanation of the semantic differences.
For example:
- Why does Polars let me do pl.col('price') - pl.col('price').mean(), but SQL doesn't?
- Why does Polars let me filter using window functions, and how can I get SQL to?
- Are there operations that are more dangerous in Polars than in SQL?
- How do they differ when working with time zones?
- Why did SQL reorder my rows when Polars didn't?
Outline of the talk:
- Motivation: why care about Polars or about SQL?
- Relational model background, row order
- Polars model, how it differs from the relational model, and what this means for you
- Abstracting the Polars and SQL differences away in Narwhals, and advice for non-Narwhals users
- Q&A
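As a small preview of the relational side of the outline, the sketch below shows one common way to filter on a window function in SQL. It uses SQLite through Python's standard library purely as a convenient stand-in (it is not one of the engines named above), and the table name sales and its columns are invented for illustration. The window function is computed in a subquery and filtered in the outer query, and an explicit ORDER BY is added because SQL makes no promise about row order without one.

```python
import sqlite3

# In-memory SQLite database as a stand-in SQL engine (an assumption for
# illustration; window functions need SQLite 3.25 or newer).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (store TEXT, price REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("a", 10.0), ("a", 20.0), ("b", 30.0), ("b", 50.0)],
)

# Using a window function directly in WHERE is rejected by the engine.
try:
    con.execute(
        "SELECT store, price FROM sales "
        "WHERE price > AVG(price) OVER (PARTITION BY store)"
    )
    window_in_where_allowed = True
except sqlite3.OperationalError:
    window_in_where_allowed = False

# Workaround: compute the window function in a subquery, filter outside,
# and state the desired row order explicitly.
rows = con.execute(
    """
    SELECT store, price
    FROM (
        SELECT store, price,
               AVG(price) OVER (PARTITION BY store) AS store_mean
        FROM sales
    )
    WHERE price > store_mean
    ORDER BY store, price
    """
).fetchall()

print(window_in_where_allowed)
print(rows)
```

Some engines, DuckDB among them, also provide a QUALIFY clause that filters on window functions without the subquery.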
This is a technical but accessible talk aimed at data practitioners. Data engineers, data scientists, data analysts, and anyone else working with data will leave the talk with stronger theoretical foundations regarding the Polars and SQL data models. Most importantly, they will learn what this means for them, and what they can do about it.
Marco is the author of Narwhals and a heavy contributor to pandas, Polars, and NumPy (stubs). He works as a Senior Software Engineer at Quansight Labs. His background is in Mathematics. Outside of work, he can most likely be spotted at Celtic Folk Sessions.