
JLBoost.jl: Hackable XGBoost-like Gradient Boosting Tree Package

XGBoost is one of the most popular machine learning (ML) libraries for tabular data. JLBoost.jl is a pure-Julia implementation of XGBoost-like algorithms that is more hackable and plays nicely with the rest of the Julia data ecosystem, e.g. DataFrames.jl and JDF.jl.

What makes JLBoost hackable? Something is hackable if you can customize it to incorporate novel features easily. I will show you why this is unique to JLBoost and not achievable with other XGBoost-like implementations.


JLBoost.jl implements an XGBoost-like algorithm in pure Julia. Unlike other implementations, it is very hackable. I will give two examples.

Firstly, providing a custom loss function is very easy.
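As a rough sketch, a custom loss boils down to supplying its value and its first and second derivatives with respect to the prediction. The function names and the `loss` keyword below are assumptions for illustration, not necessarily JLBoost.jl's exact interface:

```julia
using JLBoost, DataFrames

# Illustrative custom squared-error loss, defined by its value and its
# first and second derivatives w.r.t. the prediction. The function names
# are assumptions; JLBoost.jl's expected interface may differ.
struct SquaredLoss end
loss_value(::SquaredLoss, pred, y) = (pred - y)^2
loss_grad(::SquaredLoss, pred, y)  = 2 * (pred - y)  # first derivative
loss_hess(::SquaredLoss, pred, y)  = 2.0             # second derivative

# Toy data
df = DataFrame(x1 = rand(100), x2 = rand(100))
df.target = df.x1 .+ 0.1 .* randn(100)

# Assumed usage (hypothetical `loss` keyword):
# model = jlboost(df, :target; loss = SquaredLoss())
```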

Secondly, the node-splitting logic is also customizable. In traditional tree-based machine learning (ML) models, at each step of the tree-building algorithm, the best split is chosen by trying every feature one by one. Trying every feature is costly, because the data must be sorted by each candidate feature, and sorting is computationally expensive. An innovation implemented by LightGBM is to try only the features that the node split on in the previous round, which saves on that computation (at the cost of some over-fitting and, on average, slightly lower predictive power). Hacking the XGBoost C++ implementation to do this is out of reach for the average user unless you are one of its developers. With JLBoost.jl, you can reproduce LightGBM's behaviour by providing a feature-selection function that chooses candidate features exactly as LightGBM would, and that function is entirely under your control.
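To illustrate, the sketch below shows what such a feature-selection function could look like. The selector itself is plain Julia; the way it is plugged into the fit call (shown as a hypothetical `feature_selector` keyword) is an assumption for illustration, not JLBoost.jl's documented API:

```julia
# LightGBM-style selector: only try the features that the previous round
# actually split on; fall back to all features when there is no previous
# round yet.
function lightgbm_style_selector(all_features, previously_split_features)
    isempty(previously_split_features) ? all_features : previously_split_features
end

all_features = [:x1, :x2, :x3, :x4]

lightgbm_style_selector(all_features, Symbol[])     # first round: all features
lightgbm_style_selector(all_features, [:x2, :x4])   # later rounds: only :x2 and :x4

# Assumed usage (hypothetical keyword):
# model = jlboost(df, :target; feature_selector = lightgbm_style_selector)
```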

JLBoost.jl lets you insert custom functions at key points in the algorithm, so you can customize many aspects of the tree-building process, which in turn makes JLBoost.jl very hackable!