EuroSciPy 2024

Dispatching, Backend Selection, and Compatibility APIs
2024-08-29, Room 5

Scientific Python libraries struggle with the existence of several array and dataframe providers. Many important libraries currently support mainly NumPy arrays or pandas dataframes.
However, as library authors, we wish to let users smoothly use other array providers and, for example, simplify the use of GPUs without requiring explicit use of CUDA-enabled libraries.

This session will be split into three related discussions around efforts to tackle this situation:
* Dispatching and backend selection discussion
* Array API adoption progress and discussion
* Dataframe compatibility layer discussion


Dispatching and Backend Selection Discussion

In the first part, we would like to briefly review NetworkX's successful backend selection system and work towards spatch, a possible future dispatching project under the Scientific Python umbrella.

Many projects implement multiple dispatch based on the types of the arguments. Other projects have experimented with backend selection, which goes beyond type dispatching and allows swapping in a different algorithm.
NetworkX supports both and has also added features such as including installed backends in its documentation.
An older experiment with a dispatching system was uarray.
Projects such as scikit-learn currently focus on a hybrid approach: dispatching via the array API where possible, otherwise using a backend selection system.
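To make the difference between type dispatching and backend selection concrete, here is a minimal pure-Python sketch. It is a hypothetical toy, not the API of spatch, NetworkX, or uarray: `register_backend`, `dispatchable`, `GPUVector`, and `GPUScalar` are invented for illustration, and only the per-call `backend=` keyword loosely mirrors how NetworkX lets users pick a backend.

```python
import functools
import math
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical registry: backend name -> (types the backend owns, {function name: implementation}).
# Illustrative toy only; not the API of spatch, NetworkX, or uarray.
_BACKENDS: Dict[str, Tuple[Tuple[type, ...], Dict[str, Callable[..., Any]]]] = {}


def register_backend(name: str, types: Tuple[type, ...],
                     implementations: Dict[str, Callable[..., Any]]) -> None:
    """Register a backend that owns `types` and overrides the functions in `implementations`."""
    _BACKENDS[name] = (types, implementations)


def dispatchable(func: Callable[..., Any]) -> Callable[..., Any]:
    """Allow registered backends to take over calls to `func`."""

    @functools.wraps(func)
    def wrapper(*args: Any, backend: Optional[str] = None, **kwargs: Any) -> Any:
        if backend is not None:
            # Backend selection: the caller explicitly picks an implementation,
            # which may use a different algorithm even for ordinary inputs.
            _, impls = _BACKENDS[backend]
            return impls[func.__name__](*args, **kwargs)
        for types, impls in _BACKENDS.values():
            # Type dispatch: defer to the first backend that owns an argument's type.
            if func.__name__ in impls and any(isinstance(a, types) for a in args):
                return impls[func.__name__](*args, **kwargs)
        # No backend applies: run the library's default implementation.
        return func(*args, **kwargs)

    return wrapper


@dispatchable
def total(values):
    """Default implementation: plain left-to-right summation."""
    return sum(values)


class GPUVector(list):
    """Stand-in for an array type owned by another library (e.g. a GPU array)."""


class GPUScalar(float):
    """Stand-in for the scalar type such a library would return."""


# Type-dispatched backend: takes over whenever a GPUVector is passed in.
register_backend("gpu", (GPUVector,), {"total": lambda v: GPUScalar(sum(v))})
# Selection-only backend: owns no types, swaps in compensated summation on request.
register_backend("fsum", (), {"total": math.fsum})
```

With this sketch, `total([0.1] * 10)` uses the default and returns `0.9999999999999999`, while `total([0.1] * 10, backend="fsum")` swaps in a different algorithm and returns `1.0`; passing a `GPUVector` routes the call to the `gpu` backend without any keyword. The toy simply lets the first matching backend win, which is the kind of requirement (priorities, conflicts, fallbacks) a real dispatching library has to settle.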

Dispatching and backend selection form a complex field with various possible implementations. We would like to discuss requirements, specifically with a project like scikit-image in mind, and how these requirements can be met.
This session is meant to be an open discussion to push forward spatch, a new library providing an example implementation of such a dispatching system.

Array API adoption progress and discussion

In the second part, we will discuss adoption of the Array API in libraries such as SciPy and scikit-learn: how has support developed over the past year, and what issues remain?
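The core idea behind Array API adoption is that library code asks the input array for its namespace and then calls only standardized functions from it, instead of hard-coding `numpy`. Below is a minimal sketch, assuming NumPy is installed; `get_namespace` and `standardize` are hypothetical names, and real projects typically rely on a more complete helper such as `array_namespace` from the array-api-compat package.

```python
import numpy as np


def get_namespace(*arrays):
    """Return the single array namespace shared by all inputs (simplified sketch)."""
    namespaces = set()
    for x in arrays:
        if hasattr(x, "__array_namespace__"):
            # Array API protocol: the array itself reports which namespace to use.
            namespaces.add(x.__array_namespace__())
        else:
            # Assumption: fall back to NumPy for plain Python sequences
            # (and for NumPy < 2.0, whose arrays lack __array_namespace__).
            namespaces.add(np)
    if len(namespaces) != 1:
        raise TypeError("inputs come from different array libraries")
    return namespaces.pop()


def standardize(x):
    """Center and scale x using only functions defined by the Array API standard."""
    xp = get_namespace(x)
    x = xp.asarray(x)
    return (x - xp.mean(x)) / xp.std(x)
```

Any array whose `__array_namespace__()` returns a namespace implementing `asarray`, `mean`, and `std` would flow through `standardize` without the function ever naming that library, and mixing arrays from different libraries raises `TypeError`.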

Dataframe compatibility layers

Finally, in a similar spirit to the Array API, solutions such as narwhals provide a compatibility layer for libraries working with dataframes, so that library code can be written once and accept several dataframe implementations.
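The same idea can be sketched for dataframes: library code targets one thin common API, and the layer translates each call to whatever native object the user passed in, handing back that same native type. This is a toy pure-Python sketch with two hypothetical "native" formats standing in for real dataframe libraries; `Frame`, `ColumnTable`, `RowTable`, and `add_zscore` are invented for illustration and are not the narwhals API, although narwhals follows a similar round trip via `narwhals.from_native` and `.to_native()`.

```python
from statistics import mean, pstdev


class ColumnTable(dict):
    """Hypothetical native format A: {column name: list of values}."""


class RowTable(list):
    """Hypothetical native format B: [{column name: value}, ...]."""


class Frame:
    """Toy compatibility layer exposing one API over both native formats."""

    def __init__(self, native):
        if not isinstance(native, (ColumnTable, RowTable)):
            raise TypeError(f"unsupported dataframe type: {type(native).__name__}")
        self._native = native

    def column(self, name):
        """Return one column as a plain list, whatever the native layout."""
        if isinstance(self._native, ColumnTable):
            return list(self._native[name])
        return [row[name] for row in self._native]

    def with_column(self, name, values):
        """Return a new Frame with an added column, backed by the same native type."""
        if isinstance(self._native, ColumnTable):
            return Frame(ColumnTable({**self._native, name: list(values)}))
        return Frame(RowTable({**row, name: v} for row, v in zip(self._native, values)))

    def to_native(self):
        """Hand the underlying native object back to the caller."""
        return self._native


def add_zscore(native_df, name):
    """Library code written once against Frame, usable with either native format."""
    df = Frame(native_df)
    values = df.column(name)
    mu, sigma = mean(values), pstdev(values)
    return df.with_column(f"{name}_z", [(v - mu) / sigma for v in values]).to_native()
```

In this sketch, `add_zscore` returns a `ColumnTable` when given a `ColumnTable` and a `RowTable` when given a `RowTable`, so users keep working with the type they started with.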


Public link to supporting material:

https://networkx.org/documentation/latest/reference/backends.html

Project Homepage / Git:

https://github.com/scientific-python/spatch/

Abstract as a tweet:

Dispatching and Backend Selection Discussion: How to level up from NetworkX to scikit-image and "spatch"?

Category [High Performance Computing]:

Other

Expected audience expertise: Domain:

expert

Expected audience expertise: Python:

expert

I'm an open source software engineer at :probabl. I'm a core developer of scikit-learn and imbalanced-learn.

I am a core contributor to pandas and Apache Arrow, and maintainer of GeoPandas. I did a PhD at Ghent University and VITO in air quality research and worked at the Paris-Saclay Center for Data Science. Currently, I work at Voltron Data, contributing to Apache Arrow, and am a freelance teacher of Python (pandas).

I am a scikit-learn core maintainer and work at NVIDIA.

Before working on scikit-learn I helped build mybinder.org and worked on JupyterHub.

Many years ago I was a particle physicist at CERN in Geneva.

Erik Welch is a senior system software engineer on the RAPIDS cuGraph team at NVIDIA and a core NetworkX developer. He has 20 years' experience using Python as a scientist, engineer, and open-source developer on a wide range of data and high-performance computing problems. He primarily works on nx-cugraph, an accelerated backend to NetworkX, and is the primary maintainer of the popular toolz library.

Marco is a core dev of pandas and Polars and works at Quansight Labs as a Senior Software Engineer. He also consults and trains clients professionally on Polars. He has also written the first Polars Plugins Tutorial and has taught Polars Plugins to clients.

He has a background in Mathematics and holds an MSc from the University of Oxford, and was one of the prize winners in the M6 Forecasting Competition (2nd place overall Q1).

Sebastian has been a NumPy developer for about 10 years now. After a PhD in physics, he worked as a postdoc at the Berkeley Institute for Data Science on NumPy, funded by grants from the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation. Since 2022 he has been a software engineer at NVIDIA, where he continues to contribute to NumPy.

Hi, I'm Aditi. I am currently part of NetworkX's core developer team. NetworkX is a Python library used for graph analysis. I've been working mainly on NetworkX's dispatching side and the nx-parallel backend, previously as an independent contractor and currently as a GSoC contributor. I also presented my work on nx-parallel as a poster at the SciPy conference this year. I am currently pursuing a bachelor's in Data Science and Applications at the Indian Institute of Technology, Madras, and another bachelor's in Computer Science at Delhi University.

I work at the intersection of computation and research, with a focus on improving open source tooling and supporting the community of developers. I've been involved with scientific Python since the early 2000s, and founded scikit-image in 2009. I am a co-author of Elegant SciPy, a community leader in the Scientific Python project (https://scientific-python.org), and thoroughly enjoy my collaborations with members of this community. I am originally from South Africa, and was privileged to attend my first EuroSciPy (which I loved!) in 2011.
