PyConDE & PyData Berlin 2024

Mostly Harmless Fixed Effects Regression in Python with PyFixest
2024-04-24, A1

This session introduces PyFixest, an open-source Python library inspired by the "fixest" R package. PyFixest implements fast routines for estimating regression models with high-dimensional fixed effects, including OLS, IV, and Poisson regression. The library also provides tools for robust inference, including heteroscedasticity-robust and cluster-robust standard errors, as well as the wild cluster bootstrap. Additionally, PyFixest implements several routines for difference-in-differences estimation with staggered treatment adoption.

PyFixest aims to faithfully replicate the core design principles of "fixest", offering post-estimation inference adjustments, user-friendly syntax for multiple estimations, and efficient post-processing capabilities. By making efficient use of JIT compilation, it is also one of the fastest solutions for regressions with high-dimensional fixed effects.

The presentation will cover PyFixest's functionality, design philosophy, and future development prospects.


When regression models contain very high-dimensional categorical features, estimation can become cumbersome: encoding every category as its own dummy variable yields a design matrix with thousands or even millions of columns, and solving the resulting least-squares problem directly is no simple task! Fortunately, the problem of estimating models with high-dimensional fixed effects has been effectively solved since at least the 1930s. A range of software packages now build on what is known as the Frisch-Waugh-Lovell (FWL) theorem to estimate such models efficiently: instead of including the dummies, the outcome and the regressors are first purged of the fixed effects, and the regression is then run on the residuals. These packages are available for various languages and platforms, including Stata, R, Julia, and Python.
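To make the FWL idea concrete, here is a minimal NumPy sketch for the simplest case of a single fixed effect and a single regressor. It is an illustration under stated assumptions, not PyFixest's implementation: the simulated data, the variable names, and the helper `demean_by_group` are all hypothetical. It compares the slope from a regression with one dummy column per group to the slope obtained after subtracting group means from both the outcome and the regressor.

```python
import numpy as np

# Simulated data (hypothetical): 1,000 observations in 50 balanced groups.
rng = np.random.default_rng(0)
n, n_groups = 1_000, 50
group = rng.permutation(np.arange(n) % n_groups)

alpha = rng.normal(size=n_groups)          # group-level fixed effects
x = rng.normal(size=n) + alpha[group]      # regressor correlated with the effects
y = 2.0 * x + alpha[group] + rng.normal(size=n)

# (a) Brute force: regress y on x plus one dummy column per group.
dummies = np.zeros((n, n_groups))
dummies[np.arange(n), group] = 1.0
X_full = np.column_stack([x, dummies])
beta_dummies = np.linalg.lstsq(X_full, y, rcond=None)[0][0]


def demean_by_group(values, groups, k):
    """Subtract each group's mean from its observations."""
    sums = np.bincount(groups, weights=values, minlength=k)
    counts = np.bincount(groups, minlength=k)
    return values - (sums / counts)[groups]


# (b) FWL: partial out the fixed effect, then regress residual on residual.
x_tilde = demean_by_group(x, group, n_groups)
y_tilde = demean_by_group(y, group, n_groups)
beta_fwl = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)

print(f"dummy regression: {beta_dummies:.6f}")
print(f"FWL (demeaned):   {beta_fwl:.6f}")
```

Both approaches return the same slope up to floating-point error, but the second never builds the dummy matrix. With several fixed effects, a single round of group demeaning is generally no longer enough, and the partialling-out step is typically done iteratively.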

Among these, the R package fixest particularly stands out. It is not only blazing fast but also offers innovative and user-friendly syntax and post-estimation functionality.

When I started my journey with Python, fixest was the R package I missed the most. In fact, I missed it so much that I began working on PyFixest, a software package that aims to faithfully replicate all of fixest's innovations in Python.

In this talk, I will introduce the audience to fixest, PyFixest, and the FWL theorem that underpins both packages. We will explore how PyFixest can be used for analyzing A/B tests and for conducting event studies with staggered rollouts.


Expected audience expertise: Domain:

Novice

Expected audience expertise: Python:

None

Abstract as a tweet (X) or toot (Mastodon):

"Discover PyFixest, a Python library inspired by R's 'fixest'! 🐍📊 It speeds up regression model estimation with high-dimensional fixed effects, offering tools for robust inference and efficient post-processing. Perfect for AB Tests and event studies! #Python #DataScience #PyDat

Public link to supporting material, e.g. videos, Github, etc.:

https://github.com/s3alfisc/pyfixest

Economist and Data Scientist. I spend most of my week working on online auctions at Trivago and open source packages for regression modeling and inference in R and Python.