Careful feature engineering and selection can be just as important as choosing the right ML model & hyperparameters. I will present several options for automating the feature engineering and selection process with a focus on the autofeat library.
While there already exist several libraries for automatically selecting the best ML model and its hyperparameters for a prediction task, feature engineering is still mostly a manual task. I will present different options for automating the feature engineering and selection process in Python with a focus on the open source autofeat library, which provides a scikit-learn style linear regression model with automated feature engineering and selection capabilities.
Complex non-linear machine learning models such as neural networks are often difficult to train in practice and even harder to explain to non-statisticians, who require transparent analysis results as a basis for important business decisions. Linear models, by contrast, are efficient and intuitive, but they usually achieve lower prediction accuracy. The autofeat library provides a multi-step feature engineering and selection process: first, a large pool of non-linear features is generated; then, a small and robust set of meaningful features is selected from this pool. These selected features improve the prediction accuracy of a linear model while retaining its interpretability.
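The generate, select, then fit idea can be illustrated with a minimal NumPy sketch on synthetic data. This is not the autofeat implementation: the transformations, the helper names (`generate_features`, `select_features`, `fit_linear`) and the greedy residual-correlation selection are hypothetical simplifications chosen for illustration. The target is constructed as y = 2·x0² + 3·x0·x1 plus small noise, so the selection step should recover exactly these two terms from a pool of 18 candidate features.

```python
import numpy as np


def generate_features(X, names):
    """Step 1 (hypothetical): expand each input column into non-linear candidates."""
    cols, col_names = [], []
    for j, name in enumerate(names):
        x = X[:, j]
        for values, label in [
            (x, name),
            (x**2, f"{name}^2"),
            (x**3, f"{name}^3"),
            (np.sqrt(np.abs(x)), f"sqrt(|{name}|)"),
            (np.log1p(np.abs(x)), f"log(1+|{name}|)"),
        ]:
            cols.append(values)
            col_names.append(label)
    # Pairwise products of the raw inputs capture simple interactions.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            cols.append(X[:, i] * X[:, j])
            col_names.append(f"{names[i]}*{names[j]}")
    return np.column_stack(cols), col_names


def fit_linear(F, y):
    """Ordinary least squares with an intercept; returns coefficients, R^2, residual."""
    design = np.column_stack([np.ones(len(y)), F])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    r2 = 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)
    return coef, r2, residual


def select_features(F, y, max_features=5, tol=1e-3):
    """Step 2 (hypothetical): greedy forward selection on residual correlation.

    Stops when the next feature would raise R^2 by less than `tol`.
    """
    Z = (F - F.mean(axis=0)) / F.std(axis=0)  # standardize so scores compare like correlations
    selected, r2, residual = [], 0.0, y - y.mean()
    while len(selected) < max_features:
        scores = np.abs(Z.T @ residual)
        for k in selected:
            scores[k] = -np.inf  # never pick the same feature twice
        best = int(np.argmax(scores))
        _, new_r2, new_residual = fit_linear(F[:, selected + [best]], y)
        if new_r2 - r2 < tol:
            break
        selected.append(best)
        r2, residual = new_r2, new_residual
    return selected


# Synthetic data: x2 is irrelevant, the true model uses x0^2 and x0*x1.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 3))
y = 2 * X[:, 0] ** 2 + 3 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, size=500)

F, names = generate_features(X, ["x0", "x1", "x2"])
selected = select_features(F, y)

# Step 3: refit a plain linear model on the few selected features.
coef, r2, _ = fit_linear(F[:, selected], y)
for idx, c in zip(selected, coef[1:]):
    print(f"{c:+.2f} * {names[idx]}")
print(f"R^2 = {r2:.4f}")
```

The final model is still an ordinary linear regression on a handful of named features, so each coefficient can be read directly; the tolerance on the R² gain is what keeps the selected set small instead of absorbing noise features.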
Automated feature engineering and selection in Python with the autofeat library.
Domains: Data Science, Machine Learning, Science, Data Engineering, Statistics
Python Skill Level: basic

Franzi has several years of experience tackling machine learning problems in both research and application contexts. She has specialised in natural language processing, representation learning, and data visualisation. She holds a BSc in cognitive science and an MSc in computer science, and is currently completing her PhD in machine learning while also working as a freelance data science consultant.