Careful feature engineering and selection can be just as important as choosing the right ML model and its hyperparameters. I will present several options for automating the feature engineering and selection process, with a focus on the autofeat library.
While several libraries already exist for automatically selecting the best ML model and its hyperparameters for a prediction task, feature engineering is still mostly a manual task. I will present different options for automating the feature engineering and selection process in Python, with a focus on the open source autofeat library, which provides a scikit-learn-style linear regression model with automated feature engineering and selection capabilities.
Complex non-linear machine learning models such as neural networks are in practice often difficult to train and even harder to explain to non-statisticians, who require transparent analysis results as a basis for important business decisions. Linear models are efficient and intuitive, but they generally provide lower prediction accuracy. The autofeat library provides a multi-step feature engineering and selection process: first, a large pool of non-linear features is generated; then, a small and robust set of meaningful features is selected from this pool. The selected features improve the prediction accuracy of a linear model while retaining its interpretability.
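The generate-then-select idea can be illustrated with a minimal NumPy sketch. This is not autofeat's implementation or API: the helper names `generate_features` and `select_features`, the choice of transformations, the greedy forward selection, and the synthetic data are all assumptions made for illustration only.

```python
import itertools

import numpy as np


def generate_features(X, names):
    """Expand raw columns into a pool of non-linear candidate features."""
    feats, feat_names = [], []
    for j, n in enumerate(names):
        col = X[:, j]
        feats += [col, col**2, np.log(np.abs(col) + 1.0), np.sqrt(np.abs(col))]
        feat_names += [n, f"{n}**2", f"log(|{n}|+1)", f"sqrt(|{n}|)"]
    # Pairwise products of the raw columns
    for (i, a), (j, b) in itertools.combinations(enumerate(names), 2):
        feats.append(X[:, i] * X[:, j])
        feat_names.append(f"{a}*{b}")
    return np.column_stack(feats), feat_names


def select_features(F, y, max_features=3, min_gain=0.01):
    """Greedy forward selection (an illustrative stand-in for a real selector).

    Repeatedly adds the candidate that most improves the R^2 of an
    ordinary least-squares fit, and stops once the gain is below min_gain.
    """
    selected, best_r2 = [], 0.0
    total = np.sum((y - y.mean()) ** 2)
    while len(selected) < max_features:
        scores = {}
        for j in range(F.shape[1]):
            if j in selected:
                continue
            A = np.column_stack([np.ones(len(y)), F[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            scores[j] = 1.0 - (resid @ resid) / total
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break
        selected.append(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2


# Synthetic data (hypothetical): the target depends non-linearly on x1 and x2.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = 2 * X[:, 0] ** 2 + 3 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, size=200)

F, feat_names = generate_features(X, ["x1", "x2"])
selected, r2 = select_features(F, y)
selected_names = [feat_names[j] for j in selected]
print(selected_names, round(r2, 4))
```

The final model is still linear in the selected features, so each coefficient can be read off directly, which is the interpretability argument made above.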
Public link to supporting material:
Abstract as a tweet: Automated feature engineering and selection in Python with the autofeat library.
Domains: Data Science, Machine Learning, Science, Data Engineering, Statistics
Python Skill Level: basic