PyCon Lithuania 2024

DataFrame interoperability - what's been achieved, and what comes next?
2024-04-05 , Room 111

In 2023, we saw several libraries - which had previously only supported pandas - add support for other dataframe libraries such as Polars, Modin, and cuDF.

  • How did they do it?
  • Are there any drawbacks to how they did it?
  • What comes next, and what other solutions are there?

This talk could be of interest to anyone working with dataframes. In particular, those maintaining or contributing to libraries which use dataframes will learn how they can best support multiple dataframe libraries.


In 2023, we saw several libraries - which had previously only supported pandas - add support for other dataframe libraries, with a particular emphasis on Polars. They typically did this in one of three ways:
1) convert to pandas (either with to_pandas or via the DataFrame Interchange Protocol)
2) convert to something else (e.g. NumPy or PyArrow)
3) write parallel pandas/Polars logic
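As a rough illustration of option 1, here is a minimal sketch assuming pandas 1.5 or later, where pandas.api.interchange.from_dataframe is available. The helper name ensure_pandas is hypothetical. It accepts any object implementing the __dataframe__ method of the DataFrame Interchange Protocol (Polars dataframes do, for example) and returns a pandas DataFrame. Note that pandas must still be installed for this to work.

```python
import pandas as pd
from pandas.api.interchange import from_dataframe


def ensure_pandas(df) -> pd.DataFrame:
    """Hypothetical helper: convert any interchange-compatible dataframe to pandas."""
    if isinstance(df, pd.DataFrame):
        # Already pandas: nothing to convert.
        return df
    if not hasattr(df, "__dataframe__"):
        raise TypeError(f"{type(df).__name__} does not implement __dataframe__")
    # Conversion goes through the DataFrame Interchange Protocol.
    return from_dataframe(df)
```

The rest of the library then keeps operating on pandas objects, which is why this route inherits pandas as a dependency along with its limitations.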

These all represented a quality-of-life improvement for Polars users. But were there drawbacks?
- Solution 1) requires users to still have pandas as a dependency, and doesn't overcome pandas' limitations.
- Solution 2) is only workable if a library doesn't do any dataframe operations to begin with.
- Solution 3) comes with a heavy maintenance load, and requires the library to add a third level of parallel logic if it wants to support yet another dataframe library.
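To make the maintenance cost of solution 3 concrete, here is a minimal sketch of parallel pandas/Polars logic for a single operation (a group-by mean). The function name mean_by_group is hypothetical. The Polars branch assumes Polars 0.19 or later, where group_by replaced groupby. Every operation the library performs needs one branch per supported dataframe library.

```python
def mean_by_group(df, key: str, value: str):
    """Hypothetical helper: mean of `value` per `key`, written once per backend."""
    # Dispatch on the top-level module that defines the dataframe's class.
    backend = type(df).__module__.split(".")[0]
    if backend == "pandas":
        return df.groupby(key, as_index=False)[value].mean().sort_values(key)
    if backend == "polars":
        import polars as pl  # imported lazily so pandas-only users don't need Polars

        return df.group_by(key).agg(pl.col(value).mean()).sort(key)
    # A third library (Modin, cuDF, ...) would need yet another branch here.
    raise TypeError(f"Unsupported dataframe type: {type(df).__name__}")
```

Multiply this by every function in a library, and adding one more backend means revisiting all of them.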

So what can be done instead? A solution I'll present is to use Narwhals, which guarantees that your code will work the same way across dataframe libraries - even ones that don't exist yet - all using familiar Polars syntax.

The format will roughly be:
- 2-3 mins: an overview of the dataframe landscape
- 2-3 mins: what happened in 2023, which libraries started supporting Polars instead of just pandas
- 5 mins: what's the interchange protocol? What are the downsides of using it? Why couldn't we just have standardised to_pandas?
- 5 mins: what's Narwhals?
- 7-8 mins: how can you use Narwhals to support multiple dataframe libraries? How can you get Narwhals to support your dataframe library?
- 2-3 mins: what comes next

By the end of the talk, attendees will have learned about the dataframe ecosystem, and those involved with dataframe-consuming libraries will know all they need in order to effectively support multiple dataframe libraries.
Library maintainers and contributors will get the most out of the talk, but anyone regularly using dataframes will also learn a lot about the tools they use.

Marco is a core dev of pandas and Polars and works at Quansight Labs as a Senior Software Engineer.
He also consults for and trains clients on Polars professionally. He wrote the first Polars Plugins Tutorial and has taught Polars Plugins to clients.

He has a background in Mathematics and holds an MSc from the University of Oxford, and was one of the prize winners in the M6 Forecasting Competition (2nd place overall Q1).
