JuliaCon 2023

Interpretable Machine Learning with SymbolicRegression.jl
2023-07-26 , 32-D463 (Star)

SymbolicRegression.jl is a state-of-the-art symbolic regression library written from scratch in Julia using a custom evolutionary algorithm. The software emphasizes high-performance distributed computing, and can find arbitrary symbolic expressions to optimize a user-defined objective – thus offering a very interpretable type of machine learning. SymbolicRegression.jl and its Python frontend PySR have been used for model discovery in over 30 research papers, from astrophysics to economics.


SymbolicRegression.jl is an open-source library for practical symbolic regression, a type of machine learning that discovers human-interpretable symbolic models. SymbolicRegression.jl was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed backend, a flexible search algorithm, and interfaces with several deep learning packages. The hand-rolled internal search algorithm is a mixed evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown real-valued constants in newly-discovered empirical expressions. The backend is highly optimized, capable of fusing user-defined operators into SIMD kernels at runtime with LoopVectorization.jl, performing automatic differentiation with Zygote.jl, and distributing populations of expressions to thousands of cores across a cluster using ClusterManagers.jl. In describing this software, I will also share a new benchmark, “EmpiricalBench,” to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.

In this talk, I will describe the nuts and bolts of the search algorithm, its efficient evaluation scheme, DynamicExpressions.jl, and how SymbolicRegression.jl may be used in scientific workflows. I will review existing applications of the software (https://astroautomata.com/PySR/papers/). I will also discuss interfaces with other Julia libraries, including SymbolicUtils.jl, as well as SymbolicRegression.jl's PyJulia-enabled link to the ScikitLearn ecosystem in Python.

Miles Cranmer is an Assistant Professor in Data Intensive Science at the University of Cambridge. He received his PhD in Astrophysics from Princeton University and his BSc in Physics from McGill University.

Miles is interested in the automation of scientific research with machine learning. His current focus is the development of interpretable machine learning methods, especially symbolic regression, as well as applying them to multi-scale dynamical systems.