Exploring GPU-powered backends for scikit-learn
2023-08-17, Aula

Could scikit-learn's future be GPU-powered? This talk will discuss the performance improvements that GPU computing could bring to existing scikit-learn algorithms, and will describe a plugin-based design that is being envisioned to open up scikit-learn compatibility to faster compute backends, with special concern for user-friendliness, ease of installation, and interoperability.


GPUs are known to be the preferred hardware for deep-learning applications, but they have also proven relevant for a wide range of other algorithms: k-means, random forests, nearest neighbors search, and more. CPU-based implementations can be outperformed, especially when the data is plentiful to the point where the time to train an estimator becomes a bottleneck. But at what point does it really start to matter, and can it really be a concern for scikit-learn users? We explore a few use cases to try to highlight what is at stake.
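To make the question concrete, here is a minimal sketch, under assumed (illustrative) dataset sizes rather than benchmarks from the talk, that measures how the training time of scikit-learn's CPU-based KMeans grows with the amount of data:

```python
# Minimal sketch: time CPU-based KMeans training on synthetic datasets of
# increasing size. Sizes and parameters are illustrative assumptions only.
from time import perf_counter

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

for n_samples in (10_000, 100_000, 1_000_000):
    X, _ = make_blobs(n_samples=n_samples, n_features=100, centers=10,
                      random_state=0)
    kmeans = KMeans(n_clusters=10, n_init=1, random_state=0)
    tic = perf_counter()
    kmeans.fit(X)
    print(f"n_samples={n_samples:>9,}: fit in {perf_counter() - tic:.2f}s")
```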

But bringing more options for accelerated compute backends could challenge the principles of ease of use, ease of installation, and user-friendliness that are at the core of scikit-learn's design. As of today, GPU computing software does not benefit from seamless cross-vendor portability the way CPU software does. As a result, end users must carefully match their hardware with compatible libraries, some of which may be proprietary, and face a high interoperability cost if changing the hardware requires changing the software stack. The talk introduces the open-source SYCL-based software toolchain that aims at unlocking interoperability across hardware accelerators from all manufacturers.
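As a minimal sketch of that portability, assuming the open-source dpctl package (Python bindings for the SYCL runtime in the oneAPI stack) is installed, the devices exposed by the toolchain can be enumerated uniformly, whatever the vendor:

```python
# Hedged sketch, assuming dpctl (part of the open-source oneAPI stack)
# is installed alongside a SYCL runtime.
import dpctl

# List every SYCL device visible to the runtime: CPUs, GPUs or other
# accelerators, potentially from different vendors, behind one API.
for device in dpctl.get_devices():
    print(device.name, "|", device.vendor)
```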

The scikit-learn library furthermore envisions a plugin-based system that enables external projects to provide alternative compute backends for existing estimators. The plugin system eases the development and distribution of backends that could be maintained under the umbrella of the scikit-learn project, while also opening up to third-party providers. Plugins should be easily pip- or conda-installable, seamlessly unlock better performance for scikit-learn estimators, conform to the same specifications and quality standards as scikit-learn's default engines, and be swappable, so that all users can keep porting and sharing their estimators regardless of the compute backend they were trained with. Several plugins are currently being experimented with, such as the sklearn_numba_dpex project that uses the oneAPI-based toolchain.
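As a hypothetical sketch of the intended user experience: the `engine_provider` option below reflects the experimental plugin design and is an assumption, not a released scikit-learn API.

```python
# Hypothetical sketch: activating a third-party compute backend. The
# `engine_provider` option is an assumption based on the experimental
# plugin design, NOT a released scikit-learn API.
import sklearn
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, n_features=100, random_state=0)

# Route supported computations to the plugin's engine (here, the
# sklearn_numba_dpex project); unsupported ones fall back to the defaults.
with sklearn.config_context(engine_provider="sklearn_numba_dpex"):
    kmeans = KMeans(n_clusters=10, n_init=1).fit(X)

# The fitted estimator stays a regular KMeans object: it can be shared
# and used for prediction without the plugin installed.
print(kmeans.predict(X[:5]))
```

The design point this illustrates is that a plugin only changes where the computation runs, not what the estimator is.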


Category [High Performance Computing]

Other

Expected audience expertise: Domain

some

Expected audience expertise: Python

some

Project Homepage / Git

https://github.com/soda-inria/sklearn-numba-dpex

Abstract as a tweet

Could scikit-learn's future be GPU-powered? This talk discusses the performance improvements that GPU computing could bring to existing scikit-learn algorithms.

Machine learning software engineer at Inria and member of the maintainer team of the scikit-learn open-source project.


I graduated as a machine learning research engineer in 2016, with a specialization in NLP. I co-founded Sancare, a start-up that brings NLP-based solutions for medical data analysis to hospitals and has made a place for itself in the market with a performant NLP-powered billing assistant for medical stays. I now work at Inria, France, as a machine learning research engineer, focused on high-performance computing.
