Standardize your predictors with StandardizedPredictors.jl

Before fitting statistical or machine learning models, it's common to
standardize predictor variables to eliminate fixed offsets and differences in
scale. While these procedures are often easy to implement manually, they can
introduce subtle bugs and add bookkeeping overhead. StandardizedPredictors.jl
builds on StatsModels.jl to provide tools to conveniently and reproducibly
standardize numeric predictors.
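Here is one concrete example of the kind of subtle bug mentioned above. Z-scoring transforms a predictor x into (x - mean) / sd, and the mean and standard deviation must be estimated once, on the training data, and then reused for any new data. The minimal Python sketch below is for illustration only and is not the Julia API of StandardizedPredictors.jl. The helpers zscore_params and zscore are hypothetical. The sketch contrasts the correct approach with the tempting mistake of re-estimating the parameters on the new data.

```python
import statistics


def zscore_params(xs):
    """Hypothetical helper: estimate center (mean) and scale (sample SD)."""
    return statistics.mean(xs), statistics.stdev(xs)


def zscore(xs, center, scale):
    """Hypothetical helper: standardize xs with fixed, given parameters."""
    return [(x - center) / scale for x in xs]


train = [2.0, 4.0, 6.0, 8.0]
new = [6.0, 8.0]

# Correct: estimate parameters once, on the training data, and reuse them.
center, scale = zscore_params(train)
train_z = zscore(train, center, scale)
new_z = zscore(new, center, scale)

# Subtle bug: re-estimating parameters on the new data changes the scale,
# so the same raw value maps to a different standardized value.
new_z_buggy = zscore(new, *zscore_params(new))
```

Both versions run without error, which is why the second is easy to miss. In new_z the raw value 6.0 standardizes exactly as it did in train_z. In new_z_buggy it lands on the opposite side of zero, because it is centered on the new data's own mean.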


Data comes in many forms, and it often needs to be transformed before
modeling. Even when your data consists of continuous quantities
represented in numerical form, differences in scale, offset, and other
properties can present challenges to model construction, fitting, and
interpretation. Standardization (by centering, scaling, z-scoring, etc.)
is a common preprocessing step that prepares data for modeling. While
these procedures are often simple to implement manually, they add
bookkeeping overhead and can introduce errors. This is especially true
when making predictions from new data, which must be transformed with
the same parameters that were estimated from the original data.
StandardizedPredictors.jl builds on the StatsModels.jl ecosystem to
provide safe and convenient representations of various standardization
schemes as subtypes of StatsModels' AbstractTerm. These terms
automatically compute standardization parameters before fitting, allow
those parameters to be overridden manually, lazily apply the
standardization to the data, do the bookkeeping needed to reproducibly
generate correct predictors from held-out data, and leave a “paper
trail” in the coefficient tables of fitted models. This makes it easier
to standardize data reproducibly and removes a major footgun from a
common data preprocessing step.
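The behaviors listed above can be sketched in plain Python. This is a hypothetical analogue written to illustrate the ideas, not the package's Julia implementation or API. The ZScoreTerm class, its fit, apply, and coef_name methods, and the label format are all invented for illustration. The sketch behaves as follows:

- Parameters left as None are estimated from the training data.
- Manually supplied values take precedence over estimates.
- Values are transformed lazily through a generator.
- The coefficient label records which parameters were actually used.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class ZScoreTerm:
    """Hypothetical stand-in for a z-scoring model term (not the Julia API)."""

    name: str
    center: Optional[float] = None  # None means "estimate from training data"
    scale: Optional[float] = None   # None means "estimate from training data"

    def fit(self, xs: List[float]) -> "ZScoreTerm":
        """Return a fitted copy; manually supplied parameters take precedence."""
        center = self.center if self.center is not None else statistics.mean(xs)
        scale = self.scale if self.scale is not None else statistics.stdev(xs)
        return ZScoreTerm(self.name, center, scale)

    def apply(self, xs: Iterable[float]) -> Iterator[float]:
        """Lazily standardize xs using only the stored parameters."""
        if self.center is None or self.scale is None:
            raise ValueError("call fit() before apply()")
        center, scale = self.center, self.scale
        return ((x - center) / scale for x in xs)

    def coef_name(self) -> str:
        """Hypothetical label recording the parameters, for a coefficient table."""
        return f"{self.name}(center: {self.center:g}, scale: {self.scale:g})"


# Usage: estimate parameters on training data, then reuse them for new data.
train_x = [2.0, 4.0, 6.0, 8.0]
term = ZScoreTerm("x").fit(train_x)
held_out = list(term.apply([6.0, 8.0]))

# Manual override: fix the center at 0 but still estimate the scale.
manual = ZScoreTerm("x", center=0.0).fit(train_x)
print(term.coef_name())
print(manual.coef_name())
```

The sketch's fit returns a new, fitted term rather than mutating the unfitted one. This keeps the user's specification, including any manual overrides, separate from the parameters that were actually estimated. The fitted term can then be applied to any number of held-out datasets without recomputing anything.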