Automated Feature Engineering and Selection in Python PyConDE & PyData Berlin 2019

Careful feature engineering and selection can be just as important as choosing the right ML model & hyperparameters. I will present several options for automating the feature engineering and selection process with a focus on the autofeat library.

While there already exist several libraries for automatically selecting the best ML model and its hyperparameters for a prediction task, feature engineering is still mostly a manual task. I will present different options for automating the feature engineering and selection process in Python with a focus on the open source autofeat library, which provides a scikit-learn style linear regression model with automated feature engineering and selection capabilities.

Complex non-linear machine learning models such as neural networks are often difficult to train in practice and even harder to explain to non-statisticians, who need transparent analysis results as a basis for important business decisions. Linear models are efficient and intuitive, but they generally achieve lower prediction accuracy. The autofeat library addresses this with a multi-step feature engineering and selection process: it first generates a large pool of non-linear features and then selects from that pool a small, robust set of meaningful features. These features improve the prediction accuracy of a linear model while retaining its interpretability.
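The generate-then-select idea can be illustrated with a minimal NumPy-only sketch. This is not autofeat's actual algorithm or API: the helper names `generate_features` and `select_features`, the choice of transformations, the greedy forward selection, and the synthetic data are all assumptions made for illustration.

```python
import itertools
import numpy as np


def generate_features(X, names):
    """Step 1 (hypothetical helper): build a pool of non-linear candidates.

    Assumes all inputs are strictly positive so that log and sqrt are defined.
    """
    cols, labels = [], []
    for j, name in enumerate(names):
        x = X[:, j]
        cols += [x, x ** 2, np.log(x), np.sqrt(x)]
        labels += [name, f"{name}**2", f"log({name})", f"sqrt({name})"]
    # Pairwise products of the original inputs
    for (i, a), (j, b) in itertools.combinations(enumerate(names), 2):
        cols.append(X[:, i] * X[:, j])
        labels.append(f"{a}*{b}")
    return np.column_stack(cols), labels


def select_features(F, y, max_features=3, min_gain=0.01):
    """Step 2 (hypothetical helper): greedy forward selection.

    Repeatedly adds the candidate that most improves the R^2 of an ordinary
    least-squares fit, and stops once the improvement drops below min_gain.
    """
    n = len(y)
    ss_tot = np.sum((y - y.mean()) ** 2)
    selected, best_r2 = [], 0.0
    while len(selected) < max_features:
        scores = []
        for k in range(F.shape[1]):
            if k in selected:
                continue
            A = np.column_stack([np.ones(n), F[:, selected + [k]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            r2 = 1.0 - np.sum((y - A @ coef) ** 2) / ss_tot
            scores.append((r2, k))
        r2, k = max(scores)
        if r2 - best_r2 < min_gain:
            break
        selected.append(k)
        best_r2 = r2
    return selected, best_r2


# Synthetic data (an assumption): the target depends non-linearly on x1 and x2.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(200, 2))
y = 2.0 * np.log(X[:, 0]) + 0.5 * X[:, 0] * X[:, 1] + rng.normal(0.0, 0.01, 200)

F, labels = generate_features(X, ["x1", "x2"])
selected, r2 = select_features(F, y)
chosen = [labels[k] for k in selected]
print(chosen, round(float(r2), 4))
```

The final model in this sketch is still a plain linear regression over the chosen columns, so each coefficient can be read directly against a named feature, which is the interpretability argument made above.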

Domain Expertise: some

Public link to supporting material:

https://github.com/cod3licious/autofeat

Abstract as a tweet:

Automated feature engineering and selection in Python with the autofeat library.

Domains: Data Science, Machine Learning, Science, Data Engineering, Statistics

Python Skill Level: basic

Franziska Horn

Franzi has several years of experience tackling machine learning problems in both research and application contexts. She has specialised in natural language processing, representation learning, and data visualisation. She holds a BSc in cognitive science and an MSc in computer science, and is currently completing her PhD in machine learning while also working as a freelance data science consultant.