High performance machine learning with dislib
2019-09-05, 15:45–16:00, Track 2 (Baroja)

This talk will present dislib, a distributed machine learning library built on top of the PyCOMPSs programming model. One of the main focuses of dislib is solving large-scale scientific problems on high performance computing clusters.


PyCOMPSs is a distributed programming model and runtime for Python. Its main goal is to make distributed computing accessible to non-expert developers by providing a simple programming model and a runtime that automates many aspects of parallel execution. In addition, PyCOMPSs is infrastructure-agnostic and can run on a wide range of platforms, from HPC clusters to clouds, and from GPUs to FPGAs.
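To give a rough idea of what a task-based programming model looks like in practice, here is a minimal sketch. It uses the `task` decorator and `compss_wait_on` from PyCOMPSs when the package is available, and otherwise falls back to sequential stand-ins defined in the snippet itself. The function names `partial_sum` and `total` and the sample data are made up for illustration.

```python
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    # Assumption: PyCOMPSs is not installed, so use sequential stand-ins.
    def task(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

    def compss_wait_on(obj):
        return obj


@task(returns=1)
def partial_sum(block):
    # Each call is an independent unit of work that a runtime could schedule.
    return sum(block)


def total(blocks):
    # Launch one task per block, then synchronize before combining results.
    partials = [partial_sum(block) for block in blocks]
    partials = compss_wait_on(partials)
    return sum(partials)


result = total([[1, 2, 3], [4, 5], [6]])
```

The point of the sketch is that the code reads like ordinary sequential Python; deciding where and when each decorated call runs is left to the runtime.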

This talk will present dislib, a distributed machine learning library built on top of PyCOMPSs. Inspired by scikit-learn, dislib's programming interface is based on the concept of estimators. This provides a clean and easy-to-use API that greatly increases productivity when building large-scale machine learning pipelines. Thanks to PyCOMPSs, dislib can run on multiple distributed platforms without changes to the source code, and can handle up to billions of input samples using thousands of CPU cores. This makes dislib a perfect tool for scientists (and other users) who are not machine learning experts but still want to extract useful knowledge from extremely large data sets.
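The estimator concept mentioned above can be illustrated with a small pure-Python sketch. This is not dislib's API: the class `BlockNearestCentroid`, its methods, and the sample data are hypothetical and only show the scikit-learn-style `fit`/`predict` pattern applied to data that is already split into blocks, with per-block partial results merged afterwards.

```python
from collections import defaultdict


class BlockNearestCentroid:
    """Hypothetical estimator (not part of dislib) for 1-D features."""

    def fit(self, x_blocks, y_blocks):
        # Per-block partial sums are independent, so a distributed
        # runtime could compute them in parallel before the merge.
        partials = [self._partial(xb, yb) for xb, yb in zip(x_blocks, y_blocks)]
        sums, counts = defaultdict(float), defaultdict(int)
        for block_sums, block_counts in partials:
            for label, value in block_sums.items():
                sums[label] += value
                counts[label] += block_counts[label]
        self.centroids_ = {label: sums[label] / counts[label] for label in sums}
        return self

    def predict(self, x_blocks):
        # Predictions keep the block layout of the input.
        return [[self._closest(v) for v in block] for block in x_blocks]

    @staticmethod
    def _partial(x_block, y_block):
        sums, counts = defaultdict(float), defaultdict(int)
        for value, label in zip(x_block, y_block):
            sums[label] += value
            counts[label] += 1
        return sums, counts

    def _closest(self, value):
        return min(self.centroids_, key=lambda c: abs(value - self.centroids_[c]))


x_blocks = [[1.0, 2.0, 9.0], [10.0, 3.0, 11.0]]
y_blocks = [["low", "low", "high"], ["high", "low", "high"]]
model = BlockNearestCentroid().fit(x_blocks, y_blocks)
predictions = model.predict([[0.5, 12.0], [4.0]])
```

Because every estimator exposes the same two methods, swapping one model for another in a pipeline does not change the calling code.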


Domains – Big Data, Machine Learning, Parallel computing / HPC
Domain Expertise – some
Python Skill Level – basic
Project Homepage / Git – https://dislib.bsc.es
Abstract as a tweet – This talk will present dislib, a distributed machine learning library built on top of the PyCOMPSs programming model. One of the main focuses of dislib is solving large-scale scientific problems on high performance computing clusters.