High performance machine learning with dislib
2019-09-05 , Track 2 (Baroja)

This talk will present dislib, a distributed machine learning library built on top of the PyCOMPSs programming model. One of the main focuses of dislib is solving large-scale scientific problems on high-performance computing clusters.


PyCOMPSs is a distributed programming model and runtime for Python. Its main goal is to make distributed computing accessible to non-expert developers by providing a simple programming model and a runtime that automates many aspects of parallel execution. PyCOMPSs is also infrastructure-agnostic and can run on a wide range of platforms, from HPC clusters to clouds, and from GPUs to FPGAs.
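To give a rough idea of what this programming model looks like in practice, here is a minimal sketch. It assumes the usual PyCOMPSs entry points (the `task` decorator and `compss_wait_on`); the function names `partial_sum`, `add` and `distributed_sum` are hypothetical and chosen only for illustration. If PyCOMPSs is not installed, the sketch falls back to plain sequential stand-ins, so it shows the shape of the code rather than real parallel execution.

```python
# Sketch only: falls back to sequential stand-ins when PyCOMPSs is absent.
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    def task(**kwargs):
        # Stand-in decorator (assumption): runs the function as-is.
        def decorator(func):
            return func
        return decorator

    def compss_wait_on(obj):
        # Stand-in synchronisation (assumption): the value is already local.
        return obj


@task(returns=1)
def partial_sum(block):
    # Each call can become an independent task handled by the runtime.
    return sum(block)


@task(returns=1)
def add(a, b):
    # Combining two partial results is a task that depends on both inputs.
    return a + b


def distributed_sum(blocks):
    # Launch one task per block, then chain the reductions.
    partials = [partial_sum(block) for block in blocks]
    total = partials[0]
    for partial in partials[1:]:
        total = add(total, partial)
    # Synchronise: bring the final value back to the main program.
    return compss_wait_on(total)


if __name__ == "__main__":
    print(distributed_sum([[1, 2, 3], [4, 5], [6]]))
```

The main program reads like sequential Python; the decorators are what would let a runtime discover the tasks and the data dependencies between them.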

This talk will present dislib, a distributed machine learning library built on top of PyCOMPSs. Inspired by scikit-learn, dislib's programming interface is based on the concept of estimators. This provides a clean and easy-to-use API that greatly increases productivity when building large-scale machine learning pipelines. Thanks to PyCOMPSs, dislib can run on multiple distributed platforms without changes to the source code, and can handle up to billions of input samples using thousands of CPU cores. This makes dislib well suited to scientists and other users who are not machine learning experts but still want to extract useful knowledge from extremely large data sets.
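The estimator concept can be illustrated with a small, self-contained sketch. This is not dislib's API: the class `NearestCentroid` and its block-based input format are hypothetical, written in plain Python to show the scikit-learn-style `fit`/`predict` pattern applied to data that is split into partitions. Each partition produces partial statistics that are merged afterwards, which is the kind of work a distributed runtime could run in parallel.

```python
class NearestCentroid:
    """Toy estimator (hypothetical) with a scikit-learn-style interface."""

    def fit(self, blocks):
        # blocks: list of (samples, labels) pairs, one pair per data partition.
        sums, counts = {}, {}
        for block_samples, block_labels in blocks:
            # "Map" step: statistics computed independently for each block.
            block_sums, block_counts = self._partial_stats(
                block_samples, block_labels
            )
            # "Reduce" step: merge the block statistics into the global ones.
            for label, vec in block_sums.items():
                if label in sums:
                    sums[label] = [a + b for a, b in zip(sums[label], vec)]
                    counts[label] += block_counts[label]
                else:
                    sums[label] = list(vec)
                    counts[label] = block_counts[label]
        # One centroid per class: the mean of all samples with that label.
        self.centroids_ = {
            label: [value / counts[label] for value in vec]
            for label, vec in sums.items()
        }
        return self

    @staticmethod
    def _partial_stats(samples, labels):
        # Per-class feature sums and sample counts for a single block.
        sums, counts = {}, {}
        for row, label in zip(samples, labels):
            if label in sums:
                sums[label] = [a + b for a, b in zip(sums[label], row)]
                counts[label] += 1
            else:
                sums[label] = list(row)
                counts[label] = 1
        return sums, counts

    def predict(self, samples):
        # Assign each sample to the class with the closest centroid.
        def closest(row):
            return min(
                self.centroids_,
                key=lambda label: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids_[label])
                ),
            )
        return [closest(row) for row in samples]


if __name__ == "__main__":
    blocks = [
        ([[0.0, 0.0], [10.0, 10.0]], ["a", "b"]),
        ([[2.0, 0.0], [10.0, 12.0]], ["a", "b"]),
    ]
    model = NearestCentroid().fit(blocks)
    print(model.predict([[1.0, 1.0], [9.0, 9.0]]))
```

Because training and prediction sit behind the same two methods regardless of the algorithm, user code can stay the same when one estimator is swapped for another.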


Project Homepage / Git:

https://dislib.bsc.es

Python Skill Level:

basic

Domain Expertise:

some

Domains:

Big Data, Machine Learning, Parallel computing / HPC

Javier Álvarez is a researcher at the Workflows and Distributed Computing group of the Barcelona Supercomputing Center. His research interests include parallel programming models for distributed infrastructures and large-scale distributed machine learning. Javier received his Ph.D. in computer science from the University of Adelaide in 2018.