Sparse Data in the Scientific Python Ecosystem: Current Needs, Recent Work, and Future Improvements
2023-08-16, HS 119 - Maintainer track

This maintainer track aims to lead discussions about the current needs for sparse data in the scientific Python ecosystem. It will present the achievements of, and the follow-up to, the work initiated at the first Scientific Python Developer Summit, which took place from 22nd May to 28th May 2023.


Sparse data refers to datasets where a high percentage of the values are zero or empty. Sparse arrays are one possible data structure for efficiently handling such datasets. SciPy's sparse matrices have been used extensively within the scientific Python ecosystem since its beginning.
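As a rough illustration of why a sparse representation helps, the following minimal sketch compares the memory used by a dense NumPy array with that of the same data stored in Compressed Sparse Row (CSR) format. The array size, the diagonal pattern, and the variable names are assumptions chosen for the example, and `scipy.sparse.csr_array` is assumed to be available (SciPy 1.8 or later).

```python
import numpy as np
from scipy import sparse

# Illustrative data (an assumption for this sketch): a 1000 x 1000 array
# in which only the diagonal holds non-zero values.
dense = np.zeros((1000, 1000), dtype=np.float64)
np.fill_diagonal(dense, 1.0)

# CSR keeps only the non-zero values plus two index arrays describing
# where those values sit.
compact = sparse.csr_array(dense)

# Memory used by each representation, in bytes.
dense_bytes = dense.nbytes
sparse_bytes = (
    compact.data.nbytes + compact.indices.nbytes + compact.indptr.nbytes
)

# Fraction of entries that are actually stored.
density = compact.nnz / (dense.shape[0] * dense.shape[1])

print(f"stored entries: {compact.nnz}, density: {density:.4f}")
print(f"dense: {dense_bytes} bytes, CSR: {sparse_bytes} bytes")
```

The saving depends entirely on how sparse the data is: for arrays that are mostly non-zero, the extra index arrays can make a sparse format larger than the dense one.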

While those foundational representations are still relevant for most use cases, edge cases and the needs of recent downstream libraries remain to be considered. Moreover, the generalization of sparse matrices to sparse arrays requires a significant refactoring whose changes affect existing workflows as well as historical decisions and implementations.
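One example of how the move from sparse matrices to sparse arrays can affect existing code is the meaning of the `*` operator. The sketch below is not taken from the talk; it assumes SciPy 1.8 or later and uses a small made-up array to contrast `scipy.sparse.csr_matrix` (which follows `numpy.matrix` semantics) with `scipy.sparse.csr_array` (which follows `numpy.ndarray` semantics).

```python
import numpy as np
from scipy import sparse

# Small illustrative input (an assumption for this sketch).
dense = np.array([[1, 2], [0, 3]])

as_matrix = sparse.csr_matrix(dense)  # historical sparse matrix interface
as_array = sparse.csr_array(dense)    # newer sparse array interface

# With sparse matrices, `*` is matrix multiplication.
matrix_star = (as_matrix * as_matrix).toarray()

# With sparse arrays, `*` is elementwise multiplication, as for ndarrays.
array_star = (as_array * as_array).toarray()

# `@` is matrix multiplication for both interfaces.
array_matmul = (as_array @ as_array).toarray()

print(matrix_star)
print(array_star)
print(array_matmul)
```

Code that relies on `*` for matrix products therefore needs to switch to `@` when migrating to the array interface, which is one reason such a refactoring touches downstream workflows.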


Expected audience expertise: Domain:

some

Expected audience expertise: Python:

some

Category [High Performance Computing]:

Vector and Array Manipulation

Abstract as a tweet:

Sparse Data in the Scientific Python Ecosystem: Current Needs, Recent Work, and Future Improvements

Public link to supporting material:

https://docs.google.com/presentation/d/1VVj-jOYulBC6h8vMURA-fiQHfYyKi6PUzfFL9K8QLoY/edit?usp=sharing

Project Homepage / Git:

https://scientific-python.org/summits/sparse/

Julien is a Scientific Software Engineer at QuantStack. He holds an MSc in Computer Science & Engineering from Université de Technologie de Compiègne and an MSc in Applied Mathematics, Computer Vision and Machine Learning from École Normale Supérieure Paris-Saclay.

Julien is involved in the Scientific Python ecosystem and co-maintains scikit-learn.

Prior to joining QuantStack, Julien worked as a Research Software Engineer at Inria.