PyCon DE & PyData 2025

3 Ways to Speed up Your Regression Modeling in Python
2025-04-25, Titanium3

Linear regression is the workhorse of statistics and data science. Some data scientists even go as far as to argue that "linear regression is all you need".

In this talk, we will introduce three ways to run regression models faster by using smarter algorithms, implemented in the scikit-learn & fastreg (sparse solvers), pyfixest (Frisch-Waugh-Lovell), and duckreg (regression compression via duckdb) libraries.


We introduce three different ways to make regressions run faster.

We first introduce sparse solvers and show how to run regressions on sparse matrices via the scikit-learn and fastreg libraries.
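
To make the sparse-solver idea concrete, here is a minimal sketch on simulated data. It does not use the scikit-learn or fastreg APIs; instead it calls scipy.sparse and scipy.sparse.linalg.lsqr directly, which is an assumption made for illustration. The variable names and the data-generating process are hypothetical. The point is that a one-hot encoded categorical has one nonzero per row, so storing and solving it in sparse form avoids materializing a large dense dummy matrix.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
n_obs, n_groups = 10_000, 200

# Hypothetical data: one continuous regressor plus a high-cardinality categorical.
x = rng.normal(size=n_obs)
group = rng.integers(0, n_groups, size=n_obs)
group_effects = rng.normal(size=n_groups)
y = 2.0 * x + group_effects[group] + rng.normal(scale=0.1, size=n_obs)

# One-hot encode the categorical as a sparse matrix: exactly one nonzero per row.
rows = np.arange(n_obs)
dummies = sparse.csr_matrix(
    (np.ones(n_obs), (rows, group)), shape=(n_obs, n_groups)
)

# Stack the continuous regressor and the dummies into one sparse design matrix.
X = sparse.hstack([sparse.csr_matrix(x[:, None]), dummies], format="csr")

# Solve the least squares problem iteratively without ever densifying X.
beta_hat = lsqr(X, y, atol=1e-10, btol=1e-10)[0]
slope = beta_hat[0]

print(f"stored nonzeros: {X.nnz}, dense entries: {X.shape[0] * X.shape[1]}")
print(f"estimated slope: {slope:.3f}")
```

The dense version of this design matrix would hold roughly two million entries, while the sparse one stores about two per row.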

We then lay out the Frisch-Waugh-Lovell theorem and the alternating projections algorithm and show how to speed it up on the CPU (via numba) and on the GPU (via JAX) as implemented in the pyfixest library.
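
The following is a minimal sketch of the Frisch-Waugh-Lovell idea with alternating projections, written in plain NumPy. It is not the pyfixest implementation and does not include the numba or JAX acceleration; the function name `demean`, the tolerance, and the simulated data are all assumptions made for illustration. It residualizes the outcome and the regressor on two sets of fixed effects by repeatedly subtracting group means, then compares the resulting slope with a regression on the full dummy matrix.

```python
import numpy as np


def demean(v, group_ids, tol=1e-10, max_iter=1_000):
    """Residualize v on several sets of group dummies by alternating projections."""
    v = np.asarray(v, dtype=float).copy()
    for _ in range(max_iter):
        previous = v.copy()
        for ids in group_ids:
            # Project out one set of fixed effects by subtracting group means.
            counts = np.bincount(ids)
            sums = np.bincount(ids, weights=v)
            means = sums / np.maximum(counts, 1)
            v -= means[ids]
        # Stop once a full sweep over all fixed effects no longer changes v.
        if np.max(np.abs(v - previous)) < tol:
            break
    return v


rng = np.random.default_rng(1)
n_obs, n_firms, n_years = 5_000, 50, 10

# Hypothetical panel: x is correlated with the firm effects.
firm = rng.integers(0, n_firms, size=n_obs)
year = rng.integers(0, n_years, size=n_obs)
firm_fe = rng.normal(size=n_firms)
year_fe = rng.normal(size=n_years)
x = rng.normal(size=n_obs) + firm_fe[firm]
y = 1.5 * x + firm_fe[firm] + year_fe[year] + rng.normal(scale=0.5, size=n_obs)

# FWL: regress the demeaned outcome on the demeaned regressor.
x_tilde = demean(x, [firm, year])
y_tilde = demean(y, [firm, year])
beta_fwl = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)

# Reference: least squares on the full design with explicit dummy columns.
X_full = np.column_stack([x, np.eye(n_firms)[firm], np.eye(n_years)[year]])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0][0]

print(f"FWL slope: {beta_fwl:.6f}, full dummy regression slope: {beta_full:.6f}")
```

Both approaches should agree on the slope up to numerical tolerance, but the demeaning route only ever touches vectors of length `n_obs` rather than a matrix with one column per fixed effect level.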

Finally, we demonstrate how to drastically speed up regression estimation by first preprocessing the data in duckdb and then fitting a regression via weighted least squares in memory.
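
Regression compression can be sketched as follows. This is not the duckreg API: the grouping step that the talk runs in duckdb (conceptually a GROUP BY over the regressors with a count and a mean of the outcome) is emulated here with NumPy, and the variable names and simulated data are assumptions. With discrete regressors, weighted least squares on the cell means, using cell counts as weights, reproduces the OLS coefficients of the uncompressed regression. Standard errors need additional per-cell statistics and are not covered in this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_buckets = 100_000, 5

# Hypothetical data with discrete regressors only.
treat = rng.integers(0, 2, size=n_obs)
age_bucket = rng.integers(0, n_buckets, size=n_obs)
y = 1.0 + 0.5 * treat + 0.2 * age_bucket + rng.normal(size=n_obs)

# Uncompressed OLS on all rows.
X = np.column_stack([np.ones(n_obs), treat, age_bucket]).astype(float)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Compression step (the GROUP BY): one row per unique regressor combination,
# keeping the cell count and the cell mean of y.
n_cells = 2 * n_buckets
cell = treat * n_buckets + age_bucket
counts = np.bincount(cell, minlength=n_cells)
sums = np.bincount(cell, weights=y, minlength=n_cells)
occupied = counts > 0
cell_ids = np.arange(n_cells)[occupied]
n_g = counts[occupied].astype(float)
y_bar = sums[occupied] / n_g

# Design matrix of the compressed data set.
X_g = np.column_stack(
    [np.ones(cell_ids.size), cell_ids // n_buckets, cell_ids % n_buckets]
).astype(float)

# Weighted least squares: scale each row by the square root of its weight.
w = np.sqrt(n_g)
beta_compressed = np.linalg.lstsq(X_g * w[:, None], y_bar * w, rcond=None)[0]

print(f"rows before: {n_obs}, rows after compression: {cell_ids.size}")
print("full:      ", np.round(beta_full, 6))
print("compressed:", np.round(beta_compressed, 6))
```

The in-memory fit runs on at most ten rows instead of 100,000, which is why the heavy lifting can be pushed into the database.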

References:
- fastreg: https://github.com/iamlemec/fastreg
- scikit-learn: https://github.com/scikit-learn/scikit-learn
- pyfixest: https://github.com/py-econometrics/pyfixest
- duckreg: https://github.com/py-econometrics/duckreg


Expected audience expertise: Domain: Intermediate

Expected audience expertise: Python: None

Economist and Data Scientist. I spend most of my week working on online auctions at Trivago. In the evenings and on weekends, I work on open source packages for regression modeling and inference in R and Python.