2019-09-05, Track 2 (Baroja)
In this presentation we will introduce some recently added features of the scikit-learn machine learning library, with a particular emphasis on the new implementation of Gradient Boosted Trees.
scikit-learn 0.21 was recently released. This presentation will give an overview of its main new features and present the new implementation of Gradient Boosted Trees.
Gradient Boosted Trees (also known as Gradient Boosting Machines) are highly competitive supervised machine learning models, especially on tabular data.
Scikit-learn has offered a traditional implementation of this family of methods for many years. However, its computational performance was no longer competitive: it was dramatically outperformed by specialized state-of-the-art libraries such as XGBoost and LightGBM. The new implementation in version 0.21 uses histograms of binned feature values to evaluate tree node split candidates. It can efficiently leverage multi-core CPUs and is competitive with XGBoost and LightGBM.
We will also introduce pygbm, a Numba-based implementation of gradient boosted trees that served as a prototype for the scikit-learn implementation, and compare the Numba and Cython developer experiences.
Histogram-based Gradient Boosted Trees in scikit-learn 0.21
Python Skill Level: basic
Domain Expertise: some
Domains: Big Data, Machine Learning, Parallel computing / HPC, Statistics
Olivier is a Software Engineer at Inria working on scikit-learn and related projects of the Python Data ecosystem.