Histogram-based Gradient Boosting in scikit-learn 0.21 EuroSciPy 2019


2019-09-05 11:00–11:30, Track 2 (Baroja)

This presentation covers some recently introduced features of the scikit-learn machine learning library, with particular emphasis on the new implementation of Gradient Boosted Trees.

scikit-learn 0.21 was recently released. This presentation will give an overview of its main new features and present the new implementation of Gradient Boosted Trees.

Gradient Boosted Trees (also known as Gradient Boosting Machines) are very competitive supervised machine learning models, especially on tabular data.

Scikit-learn has offered a traditional implementation of this family of methods for many years. However, its computational performance was no longer competitive, and it was dramatically outperformed by specialized state-of-the-art libraries such as XGBoost and LightGBM. The new implementation in version 0.21 uses histograms of binned features to evaluate the tree node split candidates. This implementation can efficiently leverage multi-core CPUs and is competitive with XGBoost and LightGBM.
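
The idea of evaluating split candidates from histograms of binned features can be illustrated with a minimal NumPy sketch. This is not the scikit-learn code: the function names `bin_feature` and `best_split_bin` are hypothetical, the sketch handles a single feature, and it assumes a squared-error loss (so every sample has a unit hessian) with no regularization.

```python
import numpy as np


def bin_feature(x, max_bins=16):
    """Map continuous values to small integer bin indices (hypothetical helper).

    Thresholds are taken at evenly spaced quantiles, so each bin holds
    roughly the same number of samples.
    """
    quantiles = np.linspace(0, 1, max_bins + 1)[1:-1]
    thresholds = np.unique(np.quantile(x, quantiles))
    # A value lands in bin b when thresholds[b - 1] < x <= thresholds[b];
    # the first and last bins are open-ended.
    binned = np.searchsorted(thresholds, x, side="left")
    return binned, thresholds


def best_split_bin(binned, gradients, n_bins):
    """Find the best split "bin index <= b" from per-bin histograms.

    Assumes unit hessians (squared-error loss) and no regularization.
    """
    # One pass over the samples builds the histograms.
    grad_hist = np.bincount(binned, weights=gradients, minlength=n_bins)
    count_hist = np.bincount(binned, minlength=n_bins)

    # Cumulative sums give the left-child statistics for every candidate.
    grad_left = np.cumsum(grad_hist)[:-1]
    count_left = np.cumsum(count_hist)[:-1]
    grad_total = grad_hist.sum()
    count_total = count_hist.sum()
    grad_right = grad_total - grad_left
    count_right = count_total - count_left

    # Only candidates that leave samples on both sides are valid.
    valid = (count_left > 0) & (count_right > 0)
    gain = np.full(grad_left.shape, -np.inf)
    gain[valid] = (
        grad_left[valid] ** 2 / count_left[valid]
        + grad_right[valid] ** 2 / count_right[valid]
        - grad_total ** 2 / count_total
    )
    best_bin = int(np.argmax(gain))
    return best_bin, float(gain[best_bin])


# Toy data: the target is a step function of x with its jump at 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)
y = (x > 0.5).astype(float)

# Gradients of the squared-error loss at the constant prediction mean(y).
gradients = y.mean() - y

binned, thresholds = bin_feature(x)
n_bins = len(thresholds) + 1
best_bin, best_gain = best_split_bin(binned, gradients, n_bins)
split_threshold = thresholds[best_bin]
print(f"best split: x <= {split_threshold:.3f} (gain {best_gain:.2f})")
```

In this sketch, once the feature is binned, each node needs one pass over its samples to fill the histograms and then a scan over at most `max_bins - 1` candidates, instead of a scan over every distinct feature value.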

We will also introduce pygbm, a Numba-based implementation of gradient boosted trees that was used as a prototype for the scikit-learn implementation, and compare the Numba and Cython developer experiences.

Project Homepage / Git:

https://scikit-learn.org

Abstract as a tweet:

Histogram-based Gradient Boosted Trees in scikit-learn 0.21

Python Skill Level:

basic

Domain Expertise:

some

Domains:

Big Data, Machine Learning, Parallel computing / HPC, Statistics