Finding an Effective Strategy for AutoML Pipeline Optimization
2021-07-29, Green

One of the main problems in AutoML implementation is finding the best strategy to search for the optimal pipeline in prediction or classification tasks. This problem is commonly known as CASH (Combined Algorithm Selection and Hyperparameter Optimization). This talk will show that focusing the search on model selection and pipeline structure, without the need for hyperparameter optimization, yields competitive results with significantly shorter computation time.


The CASH problem can be decomposed into three major components:
- searching for the optimal model m over a search space of size n(m)
- searching for the optimal order of the preprocessing elements p over a search space of size n(p)
- searching for the optimal hyperparameters h over a search space of size n(h)

The most popular approaches search all three components simultaneously, over a combined search space of size n(p) × n(m) × n(h). An alternative is to search them sequentially: first search for m using surrogate values for p and h; then search for p using the optimal m and a surrogate h; finally, search for h using the optimal m and p found in the previous steps. This sequential technique involves a search space of only n(p) + n(m) + n(h), which is significantly smaller than that of the simultaneous search. In our experiments using the AutoMLPipeline package, we find that in many cases it is sufficient to search only for m and p to achieve performance competitive with that of other algorithms that search all three components simultaneously.
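The difference between the two search-space sizes can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in: the toy search spaces, the `score` function, and all names are assumptions made for illustration, not the AutoMLPipeline.jl API and not the paper's benchmarks. A real setup would replace `score` with a cross-validated evaluation of the full pipeline. The sketch simply counts how many pipeline evaluations each strategy needs.

```python
from itertools import permutations, product

# Hypothetical toy search spaces (illustrative only, not from the talk or AutoMLPipeline).
MODELS = ["rf", "svm", "knn", "ada"]                     # n(m) = 4
PREPROC = list(permutations(["scale", "pca", "ica"]))    # n(p) = 3! = 6 orderings
HYPERPARAMS = [1, 2, 4, 8, 16]                           # n(h) = 5


def score(m, p, h):
    """Hypothetical stand-in for a cross-validated pipeline score (higher is better)."""
    model_bonus = {"rf": 0.80, "svm": 0.75, "knn": 0.70, "ada": 0.78}[m]
    order_bonus = 0.05 if p[0] == "scale" else 0.0  # assumed: scaling first helps
    hp_bonus = 0.01 if h == 4 else 0.0              # assumed: small hyperparameter effect
    return model_bonus + order_bonus + hp_bonus


def argmax(candidates, f):
    """Return the best candidate under f and the number of evaluations used."""
    return max(candidates, key=f), len(candidates)


def joint_search():
    """Search m, p, h simultaneously: n(m) * n(p) * n(h) evaluations."""
    best, evals = None, 0
    for m, p, h in product(MODELS, PREPROC, HYPERPARAMS):
        evals += 1
        s = score(m, p, h)
        if best is None or s > best[0]:
            best = (s, m, p, h)
    return best, evals


def sequential_search(search_h=True):
    """Search m, then p, then optionally h: n(m) + n(p) [+ n(h)] evaluations."""
    p_sur, h_sur = PREPROC[0], HYPERPARAMS[0]  # surrogate (default) values
    m, n1 = argmax(MODELS, lambda m: score(m, p_sur, h_sur))
    p, n2 = argmax(PREPROC, lambda p: score(m, p, h_sur))
    h, n3 = h_sur, 0
    if search_h:
        h, n3 = argmax(HYPERPARAMS, lambda h: score(m, p, h))
    return (score(m, p, h), m, p, h), n1 + n2 + n3


if __name__ == "__main__":
    best_joint, n_joint = joint_search()
    best_seq, n_seq = sequential_search()
    best_mp, n_mp = sequential_search(search_h=False)
    print("joint:", best_joint, "evaluations:", n_joint)          # 120 evaluations
    print("sequential m->p->h:", best_seq, "evaluations:", n_seq)  # 15 evaluations
    print("sequential m->p only:", best_mp, "evaluations:", n_mp)  # 10 evaluations
```

Because this toy score is additive in m, p, and h, the sequential search recovers the same optimum as the joint search here. When the components interact, that is not guaranteed, so the sequential result should be read as a heuristic whose main advantage is the much smaller number of evaluations.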

Relevant paper: https://arxiv.org/abs/2107.01253

Relevant Julia packages used in the talk:
- AutoMLPipeline.jl
- AMLPipelineBase.jl
- Lale.jl
- Hyperopt.jl

I am a research scientist at IBM Research, working in the following areas: AutoML, AutoAI, RL/ML Optimization, and Decision Optimization.
