JuliaCon 2020 (times are in UTC)

StatsModels.jl: Mistakes were made/A `@formula` for success
07-30, 18:00–18:30 (UTC), Green Track

What happens when you re-implement a critical piece of the data science
ecosystem from what was essentially an R clone to take full advantage of the
Julia language? You learn a lot about flexibility, composability, and
performance.


Transforming tabular, heterogeneous data into numerical arrays is a critical
first step in many data analysis pipelines. StatsModels.jl provides
functionality for this through the @formula macro.

The earliest implementations of @formula in Julia were based on R, which has a
very different model for metaprogramming and composition across packages. Over
the last two years, we re-implemented the @formula from the ground up in a
more Julian fashion, trying to strike a balance between maintaining a
continuous, familiar experience for front-end users while also taking advantage
of Julia's many features to create a hackable, flexible, modular, and extensible
platform that other packages can build on.

In this talk, I'll show you how the current implementation achieves these goals,
but more importantly what we learned in the process of rewriting this critical
piece of data science infrastructure. I'll pay special attention to mistakes we
made in initial development, lessons we learned from those mistakes about how to
make a flexible, composable package, and issues for current and future
development. You'll learn how StatsModels.jl takes advantage of multiple
dispatch to allow other packages to hook into and extend the @formula system
while still playing nicely together.

Like many before and after him, Dave started hacking on Julia to procrastinate finishing his dissertation. Despite his best efforts he finished his PhD in Brain and Cognitive Sciences in 2016. In his day job as an Assistant Professor of Psychology at Rutgers New Brunswick, he works on understanding how people understand spoken language with such apparent ease, combining behavioral, computational, and neural approaches. Otherwise he's committed to promoting open, reproducible science, and designing tools that empower researchers and lower the barrier to entry for data analysis and statistics.