2023-08-16, Aula
Have you ever wanted to write a DataFrame-agnostic function, one that performs the same operation regardless of whether the input is a pandas, polars, or some other DataFrame? Did you get stuck special-casing to handle all the different APIs? Don't worry: the DataFrame Standard is here to help!
If you want to write a DataFrame-agnostic function, you currently have three choices:
- convert the input DataFrame to pandas (say), perform the operations, then convert back to the original DataFrame library;
- write the same code multiple times, with if-then statements to deal with the differences between APIs;
- give up and only support a single DataFrame library (usually pandas).
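To make the second option concrete, here is a minimal sketch of what special-casing looks like for a simple row filter. It assumes recent pandas and polars versions; `filter_greater_than` is a hypothetical helper, and detecting the library from the type's module name is just one possible dispatch strategy.

```python
def filter_greater_than(df, column, threshold):
    """Keep rows where `column` > `threshold` (hypothetical helper).

    Each supported library needs its own branch, because the APIs differ.
    """
    # e.g. "pandas.core.frame" -> "pandas", "polars.dataframe.frame" -> "polars"
    library = type(df).__module__.split(".")[0]
    if library == "pandas":
        # pandas: boolean-mask indexing
        return df[df[column] > threshold]
    if library == "polars":
        import polars as pl  # only imported when a polars DataFrame is passed
        # polars: expression-based filtering
        return df.filter(pl.col(column) > threshold)
    # Every additional library means another branch here.
    raise TypeError(f"Unsupported DataFrame type: {type(df)!r}")
```

Each branch has to be tested separately and kept in sync as the underlying APIs evolve, which is exactly the maintenance burden described above.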
However, there's a new solution in town: use the DataFrame Standard. The DataFrame Standard provides you with a minimal, strict, and predictable API. It allows you to develop with confidence, knowing that your code will work regardless of whether the caller uses pandas, polars, or some other DataFrame library.
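The core idea, writing a function once against a single shared interface, can be sketched in plain Python. Everything below is hypothetical: `MinimalFrame`, `RowStore`, `ColumnStore`, and `column_means` are toy stand-ins, not the actual DataFrame Standard API (which is much richer). Two toy "libraries" store their data differently but expose the same two methods, so the function needs no special-casing.

```python
from typing import Protocol


class MinimalFrame(Protocol):
    """Hypothetical minimal interface; NOT the real DataFrame Standard API."""

    def column_names(self) -> list[str]: ...

    def column_values(self, name: str) -> list[float]: ...


class RowStore:
    """Toy 'library' #1: stores data row by row."""

    def __init__(self, rows: list[dict[str, float]]) -> None:
        self._rows = rows

    def column_names(self) -> list[str]:
        return list(self._rows[0]) if self._rows else []

    def column_values(self, name: str) -> list[float]:
        return [row[name] for row in self._rows]


class ColumnStore:
    """Toy 'library' #2: stores data column by column."""

    def __init__(self, columns: dict[str, list[float]]) -> None:
        self._columns = columns

    def column_names(self) -> list[str]:
        return list(self._columns)

    def column_values(self, name: str) -> list[float]:
        return list(self._columns[name])


def column_means(df: MinimalFrame) -> dict[str, float]:
    """Written once against the shared interface: no per-library branches.

    Assumes every column is non-empty.
    """
    means = {}
    for name in df.column_names():
        values = df.column_values(name)
        means[name] = sum(values) / len(values)
    return means


rows = RowStore([{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 4.0}])
cols = ColumnStore({"a": [1.0, 3.0], "b": [2.0, 4.0]})
print(column_means(rows))  # {'a': 2.0, 'b': 3.0}
print(column_means(cols))  # {'a': 2.0, 'b': 3.0}
```

Supporting a third backend would only require implementing the two interface methods; `column_means` itself stays unchanged.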
Talk outline (roughly):
- 5 mins: motivation - why do we even need this?
- 5 mins: demo - let's write a DataFrame-agnostic function!
- 5 mins: stability, usability, future plans
Learn how to write code that supports pandas, polars, and more, all without special-casing or data conversions!
Category [Data Science and Visualization]: Data Analysis and Data Engineering
Expected audience expertise: Domain: expert
Expected audience expertise: Python: expert
Marco works as a Senior Software Engineer at Quansight Labs. He mainly works on pandas and the DataFrame Consortium (as part of his job) and on polars (as a volunteer).