JuliaCon 2020

Automatic gradient and scale for high dimensional optimization

Optimization and machine-learning methods tune model parameters to fit data. Large-scale problems are typically solved with a variant of gradient descent in which the gradient is computed automatically, but such limited information about the function's behavior slows progress. I will describe the mathematics and code for extracting both the gradient and the "scale" (an upper bound on the diagonal quadratic remainder term in a Taylor series expansion), and show how having both quantities available enhances optimization.
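
To illustrate how a per-coordinate scale can drive an optimizer, here is a minimal sketch, not the method from the talk: the talk's "scale" is an upper bound on the quadratic remainder over the step, whereas this sketch substitutes the local Hessian diagonal obtained with ForwardDiff as a stand-in. The objective f and all names below are invented for the example.

    using ForwardDiff, LinearAlgebra

    # Example objective: an ill-conditioned quadratic plus a mild nonlinearity.
    f(x) = 0.5 * sum(i * x[i]^2 for i in eachindex(x)) + 0.1 * sum(sin, x)

    # One descent step using the gradient and a per-coordinate scale.
    # The local Hessian diagonal stands in for the scale here; the talk's
    # "scale" would instead be an upper bound on this curvature.
    function scaled_descent_step(f, x)
        g = ForwardDiff.gradient(f, x)             # automatic gradient
        s = abs.(diag(ForwardDiff.hessian(f, x)))  # stand-in per-coordinate scale
        s = max.(s, 1e-8)                          # guard against zero curvature
        return x .- g ./ s                         # diagonally scaled step
    end

    x = randn(10)
    for _ in 1:50
        global x = scaled_descent_step(f, x)
    end
    println("f(x) = ", f(x))

The appeal of a genuine upper bound, as described in the abstract, is that each scale sᵢ then dominates the curvature along coordinate i, so the step −gᵢ/sᵢ cannot overshoot the majorizing quadratic model and no hand-tuned learning rate is needed.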