Automated Feature Engineering and Selection in Python

Careful feature engineering and selection can be just as important as choosing the right ML model and hyperparameters. I will present several options for automating the feature engineering and selection process, with a focus on the autofeat library.


While several libraries already exist for automatically selecting the best ML model and its hyperparameters for a prediction task, feature engineering is still mostly a manual task. I will present different options for automating the feature engineering and selection process in Python. The focus is on the open source autofeat library, which provides a scikit-learn style linear regression model with automated feature engineering and selection capabilities.

Complex non-linear machine learning models such as neural networks are often difficult to train in practice and even harder to explain to non-statisticians, who need transparent analysis results as a basis for important business decisions. Linear models are efficient and intuitive, but they generally achieve lower prediction accuracy. The autofeat library addresses this with a multi-step feature engineering and selection process: it first generates a large pool of non-linear features and then selects a small, robust set of meaningful features from that pool. These features improve the prediction accuracy of a linear model while retaining its interpretability.
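
The two-step idea of generating a large pool of non-linear features and then selecting a few of them can be illustrated with a minimal NumPy sketch. This is not autofeat's implementation: the helper names generate_features and select_features, the particular transformations, and the greedy correlation-based selection are all assumptions chosen to keep the example short and self-contained.

```python
import numpy as np


def generate_features(X, names):
    """Step 1 (illustrative): expand raw columns into a pool of non-linear candidates."""
    pool = {}
    for j, name in enumerate(names):
        x = X[:, j]
        pool[name] = x
        pool[f"{name}^2"] = x ** 2
        pool[f"{name}^3"] = x ** 3
        pool[f"log(1+|{name}|)"] = np.log1p(np.abs(x))
        pool[f"sqrt(|{name}|)"] = np.sqrt(np.abs(x))
    # Pairwise products of the raw columns
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            pool[f"{names[i]}*{names[j]}"] = X[:, i] * X[:, j]
    return pool


def select_features(pool, y, max_features=5, min_gain=0.01):
    """Step 2 (illustrative): greedy forward selection for a linear model.

    Repeatedly adds the candidate most correlated with the current residual
    and stops once the gain in R^2 drops below min_gain.
    """
    selected, r2 = [], 0.0
    resid = y - y.mean()
    while len(selected) < max_features:
        scores = {
            name: abs(np.corrcoef(col, resid)[0, 1])
            for name, col in pool.items()
            if name not in selected
        }
        if not scores:
            break
        best = max(scores, key=scores.get)
        # Refit ordinary least squares (with intercept) on the enlarged feature set
        cols = [pool[n] for n in selected + [best]]
        A = np.column_stack([np.ones(len(y))] + cols)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        new_resid = y - A @ coef
        new_r2 = 1.0 - new_resid.var() / y.var()
        if new_r2 - r2 < min_gain:
            break
        selected.append(best)
        resid, r2 = new_resid, new_r2
    return selected, r2


# Synthetic data: the target depends non-linearly on the two raw inputs
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = 2 * X[:, 0] ** 2 - 3 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, size=500)

pool = generate_features(X, ["x1", "x2"])
selected, r2 = select_features(pool, y)
print(selected, round(r2, 3))
```

A linear model fitted on the selected features stays easy to read, because each coefficient belongs to one named feature such as x1^2 or x1*x2.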


Domain Expertise:

some

Public link to supporting material:

https://github.com/cod3licious/autofeat

Abstract as a tweet:

Automated feature engineering and selection in Python with the autofeat library.

Domains:

Data Science, Machine Learning, Science, Data Engineering, Statistics

Python Skill Level:

basic