Histogram-based Gradient Boosting in scikit-learn 0.21
2019-09-05, 11:00–11:30, Track 2 (Baroja)

This presentation covers some recently introduced features of the scikit-learn machine learning library, with particular emphasis on the new implementation of Gradient Boosted Trees.

scikit-learn 0.21 was recently released. This presentation will give an overview of its main new features and present the new implementation of Gradient Boosted Trees.

Gradient Boosted Trees (also known as Gradient Boosting Machines) are very competitive supervised machine learning models, especially on tabular data.

Scikit-learn has offered a traditional implementation of this family of methods for many years. However, its computational performance was no longer competitive, and it was substantially outperformed by specialized state-of-the-art libraries such as XGBoost and LightGBM. The new implementation in version 0.21 uses histograms of binned features to evaluate the tree node split candidates. This implementation can efficiently leverage multi-core CPUs and is competitive with XGBoost and LightGBM.
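To make the histogram idea concrete, here is a minimal pure-Python sketch of split finding on one binned feature. It is an illustration only, not scikit-learn's code: the function names (bin_feature, build_histogram, best_split) are hypothetical, and it assumes a squared-error loss with unit hessians and no regularization. Feature values are first mapped to a few integer bins, gradients are summed per bin, and split candidates are then scored by scanning the bins rather than every sorted sample.

```python
from bisect import bisect_right


def bin_feature(values, n_bins=4):
    """Map raw values to integer bin indices using rank-based thresholds."""
    ordered = sorted(values)
    n = len(ordered)
    # Pick thresholds at evenly spaced ranks (a rough quantile binning).
    thresholds = sorted({ordered[(i * n) // n_bins] for i in range(1, n_bins)})
    # Bin b holds values v with thresholds[b-1] <= v < thresholds[b].
    bins = [bisect_right(thresholds, v) for v in values]
    return bins, thresholds


def build_histogram(bins, gradients, n_bins):
    """Accumulate the gradient sum and sample count of each bin."""
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for b, g in zip(bins, gradients):
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count


def best_split(grad_sum, count):
    """Scan bin boundaries and return (bin index, gain) of the best split.

    Samples with bin <= index go left. Gain assumes unit hessians:
    G_left^2 / n_left + G_right^2 / n_right - G_total^2 / n_total.
    """
    total_g, total_n = sum(grad_sum), sum(count)
    best_bin, best_gain = None, 0.0
    left_g, left_n = 0.0, 0
    for b in range(len(count) - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue  # a split must leave samples on both sides
        gain = (left_g ** 2 / left_n + right_g ** 2 / right_n
                - total_g ** 2 / total_n)
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


# Toy data: two well-separated groups with targets 0 and 1.
x = [0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3]
y = [0, 0, 0, 0, 1, 1, 1, 1]
prediction = sum(y) / len(y)            # constant initial model
gradients = [prediction - t for t in y]  # squared-error gradient

bins, thresholds = bin_feature(x, n_bins=4)
grad_sum, count = build_histogram(bins, gradients, len(thresholds) + 1)
split_bin, gain = best_split(grad_sum, count)
print(split_bin, gain, thresholds[split_bin])
```

The scan visits one candidate per bin boundary, so its cost depends on the number of bins rather than the number of samples once the histogram is built. In this toy case the selected boundary separates the two groups of values.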

We will also introduce pygbm, a numba-based implementation of gradient boosted trees that was used as a prototype for the scikit-learn implementation, and compare the developer experience of numba and Cython.

Domain Expertise – some
Domains – Big Data, Machine Learning, Parallel computing / HPC, Statistics
Project Homepage / Git – https://scikit-learn.org
Abstract as a tweet – Histogram-based Gradient Boosted Trees in scikit-learn 0.21
Python Skill Level – basic