DataFrame-agnostic code: are we there yet?
2023-08-16, Aula

Have you ever wanted to write a DataFrame-agnostic function, one that performs the same operation whether the input comes from pandas, polars, or some other library? Did you get stuck special-casing your way around all the different APIs? Good news: the DataFrame Standard is here to help!


If you want to write a DataFrame-agnostic function, you currently have three choices:
- convert the input DataFrame to pandas (say), perform operations, then convert back to the original DataFrame library;
- write the same code multiple times, with if-then statements to deal with the differences between APIs;
- give up, and only support a single DataFrame library (usually pandas).
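
To make the second option concrete, here is a minimal sketch of the kind of special-casing it leads to. The helper name `center_column` is hypothetical and exists only for this illustration; the branches rely on pandas' `DataFrame.assign` and polars' `DataFrame.with_columns`, and each library is imported only if the caller actually passes one of its DataFrames.

```python
def center_column(df, name):
    """Subtract the column mean from column `name`, in pandas or polars.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Dispatch on the top-level package that defines the input's class,
    # so neither library has to be installed unless it is actually used.
    library = type(df).__module__.split(".")[0]

    if library == "pandas":
        # pandas: build the new column eagerly and attach it with assign().
        return df.assign(**{name: df[name] - df[name].mean()})

    if library == "polars":
        # polars: describe the same operation as an expression instead.
        import polars as pl

        return df.with_columns(pl.col(name) - pl.col(name).mean())

    # Every additional DataFrame library would need its own branch here.
    raise TypeError(f"unsupported DataFrame type: {type(df).__name__}")
```

Both branches do the same thing, yet they share no code, and the function grows with every library it has to support.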

However, there's a new solution in town: use the DataFrame Standard. The DataFrame Standard provides you with a minimal, strict, and predictable API. It allows you to develop with confidence, knowing that your code will work regardless of whether the caller uses pandas, polars, or some other DataFrame library.
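
As a rough sketch of what opting in could look like: the draft specification describes an entry-point method named `__dataframe_consortium_standard__` that a DataFrame library can expose. The helper `to_standard` below is hypothetical, and whether a given pandas or polars version ships the entry point is an assumption to check against that library's documentation.

```python
def to_standard(df):
    """Return the Standard-compliant view of `df`, if its library offers one.

    Hypothetical helper written for this sketch; not part of the Standard.
    """
    # Assumption: the library exposes the draft Standard's entry-point method.
    entry_point = getattr(df, "__dataframe_consortium_standard__", None)
    if entry_point is None:
        raise TypeError(
            f"{type(df).__name__} does not expose the DataFrame Standard entry point"
        )
    # Called without arguments here. What the returned object offers is
    # defined by the draft specification and is deliberately not shown.
    return entry_point()
```

Under these assumptions, a DataFrame-agnostic function would call `to_standard(df)` once at the top and write everything after that against the Standard's API, with no per-library branches.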

Rough talk outline:
- 5 mins: motivation - why do we even need this?
- 5 mins: demo - let's write a DataFrame-agnostic function!
- 5 mins: stability, usability, future plans


Public link to supporting material:

https://data-apis.org/dataframe-api/draft/index.html

Project Homepage / Git:

https://github.com/data-apis/dataframe-api

Abstract as a tweet:

Learn how to write your code in such a way that it will support pandas, polars, and more - all without special-casing or data conversions!

Category [Data Science and Visualization]:

Data Analysis and Data Engineering

Expected audience expertise: Domain:

expert

Expected audience expertise: Python:

expert

Marco works as a Senior Software Engineer at Quansight Labs. He mainly works on pandas and the DataFrame Consortium (as part of his job) and on polars (as a volunteer).