How to compare apples with oranges: Proper evaluation of article-level demand forecasts
Stefan Birr, Mones Raslan
How do you evaluate performance when you predict more than 10 million time series each day? For a single time series, a good plot can be worth more than a thousand metrics, but for large-scale machine learning models implemented with LightGBM and PyTorch we have to resort to meaningful aggregations. We will share insights and lessons from the past two years of deploying and operating our article-level demand forecasting models in Zalando's pricing department.
This talk moves beyond basic metrics to showcase the pitfalls of aggregated error measures and the best practices we’ve developed to keep our stakeholders informed and our models accurate.
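One such pitfall can be illustrated with a minimal sketch. It is not taken from the talk: the data, the article names, and the choice of WAPE (weighted absolute percentage error) as the metric are all assumptions made for illustration. The same forecasts give very different headline numbers depending on whether errors are averaged per article or pooled across all articles.

```python
import numpy as np


def wape(actual, forecast):
    """Weighted absolute percentage error: sum(|error|) / sum(|actual|)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.abs(actual - forecast).sum() / np.abs(actual).sum()


# Hypothetical daily demand for two articles (illustrative numbers only).
actual = {
    "bestseller": [100, 120, 110, 90],
    "slow_mover": [1, 2, 1, 1],
}
forecast = {
    "bestseller": [95, 125, 105, 95],
    "slow_mover": [3, 0, 3, 3],
}

# Aggregation 1: compute the metric per article, then take the plain mean.
# Every article counts equally, so the slow mover dominates the result.
per_series = {name: wape(actual[name], forecast[name]) for name in actual}
mean_of_series = float(np.mean(list(per_series.values())))

# Aggregation 2: pool all observations and compute the metric once.
# High-volume articles dominate, so the slow mover's errors nearly vanish.
pooled = wape(
    np.concatenate(list(actual.values())),
    np.concatenate(list(forecast.values())),
)

print(f"per-article WAPE: {per_series}")
print(f"mean of per-article WAPE: {mean_of_series:.3f}")
print(f"pooled WAPE: {pooled:.3f}")
```

In this toy example the mean of the per-article values is more than ten times the pooled value, although both summarize the same forecasts. Which aggregation is appropriate depends on the question stakeholders are asking.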
PyData: Machine Learning & Deep Learning & Statistics
Titanium [2nd Floor]