2024-08-29 –, Room 5
Scientific Python libraries struggle with the existence of several array and dataframe providers. Many important libraries currently mainly support NumPy arrays or pandas dataframes.
However, as library authors we wish to allow users to smoothly use other array providers and, for example, simplify the use of GPUs without requiring explicit use of CUDA-enabled libraries.
This session will be split into three related discussions around efforts to tackle this situation:
* Dispatching and backend selection discussion
* Array API adoption progress and discussion
* Dataframe compatibility layer discussion
Dispatching and Backend Selection Discussion
In the first part, we would like to briefly review the successful NetworkX backend selection and work towards a possible future dispatching project under the Scientific Python umbrella, spatch.
Many projects implement multiple dispatching based on types. Other projects have experimented with backend selection that goes beyond type dispatching and allows swapping in a different algorithm.
NetworkX has both, and has also added features such as including installed backends in its documentation.
An older experiment with a dispatching system was uarray.
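The difference between the two approaches can be illustrated with a minimal, standard-library-only sketch. It is not the NetworkX, uarray, or spatch implementation: the registry `_BACKENDS`, the helpers `register_backend` and `dispatchable`, and the `"insertion"` backend are hypothetical names chosen for illustration. The first half picks an implementation from the argument type; the second half lets the caller swap in a different algorithm by name.

```python
from functools import singledispatch, wraps


# --- Type-based dispatch: the implementation is chosen by argument type ---
@singledispatch
def total(data):
    raise TypeError(f"unsupported type: {type(data).__name__}")


@total.register
def _(data: list):
    return sum(data)


@total.register
def _(data: dict):
    return sum(data.values())


# --- Backend selection: the implementation is chosen by backend name ---
# Hypothetical registry: backend name -> {function name -> implementation}
_BACKENDS = {}


def register_backend(name, implementations):
    """Register a mapping of function names to alternative implementations."""
    _BACKENDS[name] = implementations


def dispatchable(func):
    """Route a call to a named backend if `backend=` is given, else run `func`."""

    @wraps(func)
    def wrapper(*args, backend=None, **kwargs):
        if backend is None:
            return func(*args, **kwargs)
        impl = _BACKENDS[backend].get(func.__name__)
        if impl is None:
            raise NotImplementedError(
                f"backend {backend!r} does not implement {func.__name__}"
            )
        return impl(*args, **kwargs)

    return wrapper


@dispatchable
def sort_values(values):
    # Default implementation
    return sorted(values)


def _insertion_sort(values):
    # A different algorithm producing the same result as the default
    result = []
    for v in values:
        i = len(result)
        while i > 0 and result[i - 1] > v:
            i -= 1
        result.insert(i, v)
    return result


register_backend("insertion", {"sort_values": _insertion_sort})

print(total([1, 2, 3]), total({"a": 1, "b": 2}))
print(sort_values([3, 1, 2]), sort_values([3, 1, 2], backend="insertion"))
```

In this sketch the type-dispatched function needs no opt-in from the caller, while the backend-selected function keeps its input types fixed and only changes the algorithm that runs.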
Projects such as scikit-learn currently focus on a hybrid approach: dispatching via the array API where possible, otherwise using a backend selection system.
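A rough sketch of such a hybrid, under stated assumptions: `__array_namespace__` is the method defined by the array API standard for obtaining an array's namespace, but `ToyArray`, `toy_xp`, and `center` are hypothetical stand-ins written for this example, and the plain-Python fallback only stands in for whatever a real backend selection system would do. This is not how scikit-learn implements it.

```python
from types import SimpleNamespace


class ToyArray:
    """Hypothetical stand-in for an array type following the array API standard."""

    def __init__(self, values):
        self.values = list(values)

    def __sub__(self, scalar):
        return ToyArray(v - scalar for v in self.values)

    def __array_namespace__(self, api_version=None):
        # The standard uses this method to hand back the array's namespace.
        return toy_xp


def _toy_mean(x):
    # Simplification: returns a Python float rather than a 0-d array.
    return sum(x.values) / len(x.values)


# Hypothetical namespace exposing only the one function this example needs
toy_xp = SimpleNamespace(mean=_toy_mean)


def center(x):
    """Subtract the mean, using the array's own namespace when it offers one."""
    if hasattr(x, "__array_namespace__"):
        xp = x.__array_namespace__()
        return x - xp.mean(x)
    # Fallback path: stands in for a separate backend selection mechanism.
    m = sum(x) / len(x)
    return [v - m for v in x]


print(center(ToyArray([1.0, 2.0, 3.0])).values)
print(center([1.0, 2.0, 3.0]))
```

The library code in `center` never names a concrete array library on the first path; it only uses what the input's namespace provides.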
Dispatching and backend selection is a complex field with various possible implementations. We would like to discuss requirements, specifically with a project like scikit-image in mind, and discuss how these requirements can be achieved.
This session is meant to be an open discussion to push forward a new spatch library: an example implementation of such a dispatching system.
Array API adoption progress and discussion
In the second part, we will discuss adoption of the Array API in libraries such as SciPy and scikit-learn. How did support develop over the past year, and what issues remain?
Dataframe compatibility layers
Finally, similarly to the Array API, solutions such as narwhals provide a compatibility layer for libraries working with dataframes.
Project Homepage / Git: https://networkx.org/documentation/latest/reference/backends.html
Abstract as a tweet: Dispatching and Backend Selection Discussion: How to level up from NetworkX to scikit-image and "spatch"?
Category [High Performance Computing]: Other
Expected audience expertise: Domain: expert
Expected audience expertise: Python: expert
I'm an open source software engineer at :probabl. I'm a core developer of scikit-learn and imbalanced-learn.
I am a core contributor to pandas and Apache Arrow, and maintainer of GeoPandas. I did a PhD at Ghent University and VITO in air quality research and worked at the Paris-Saclay Center for Data Science. Currently, I work at Voltron Data, contributing to Apache Arrow, and am a freelance teacher of Python (pandas).
I am a scikit-learn core maintainer and work at NVIDIA.
Before working on scikit-learn I helped build mybinder.org and worked on JupyterHub.
Many years ago I was a particle physicist at CERN in Geneva.
Erik Welch is a senior system software engineer on the RAPIDS cuGraph team at NVIDIA and a core NetworkX developer. He has 20 years' experience using Python as a scientist, engineer, and open-source developer on a wide range of data and high-performance computing problems. He primarily works on nx-cugraph, an accelerated backend to NetworkX, and is the primary maintainer of the popular toolz library.
Marco is a core dev of pandas and Polars and works at Quansight Labs as a Senior Software Engineer. He also consults and trains clients professionally on Polars. He has written the first Polars Plugins Tutorial and has taught Polars Plugins to clients.
He has a background in Mathematics and holds an MSc from the University of Oxford, and was one of the prize winners in the M6 Forecasting Competition (2nd place overall Q1).
Sebastian has been a NumPy developer for about 10 years now. After a PhD in physics he worked as a postdoc at the Berkeley Institute for Data Science on NumPy, funded by grants from the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation. Since 2022 he has been a software engineer at NVIDIA, where he continues to contribute to NumPy.
Hi, I'm Aditi. I am currently part of NetworkX's Core Developer team. NetworkX is a Python library used for graph analysis. I've been working mainly on NetworkX's dispatching side and the nx-parallel backend, previously as an independent contractor and currently as a GSoC contributor. I also presented my work on nx-parallel as a poster at SciPy Con this year. I am currently pursuing a bachelor's in Data Science and Application from Indian Institute of Technology, Madras and another bachelor's in Computer Science from Delhi University.
I work at the intersection of computation and research, with a focus on improving open source tooling and supporting the community of developers. I've been involved with scientific Python since the early 2000s, and founded scikit-image in 2009. I am a co-author of Elegant SciPy, a community leader in the Scientific Python project (https://scientific-python.org), and thoroughly enjoy my collaborations with members of this community. I am originally from South Africa, and was privileged to attend my first EuroSciPy (which I loved!) in 2011.