Machine Learning Lifecycle for NLP Classification in E-Commerce
04-19, 14:00–14:30 (Europe/Berlin), A1

Running machine learning models in a production environment brings its own challenges. In this talk we present the machine learning lifecycle behind the text-based cataloging classification system at idealo.de. We share lessons learned and talk about our experiences migrating this lifecycle from a hosted cluster to a cloud solution over the last 3 years. In addition, we outline how we embedded our ML components in the overall idealo.de processing architecture.


idealo.de offers a price comparison service for millions of products from a wide variety of categories. Offers are classified automatically using both traditional and deep learning-based approaches. Our machine learning components are part of a fully automated lifecycle and process up to 500 million offers daily at peak times.

Beyond the enormous amount of data we process, our main challenges are staying online 24/7 and adapting to an ever-changing catalog structure. This requires a highly reliable inference service as well as continuous, automated retraining and model deployment.

In this talk we would like to present our view on MLOps:
- How we integrate our CI/CD and continuous training pipelines with GitHub and AWS SageMaker
- How we migrated the lifecycle from a hosted cluster (running Kubernetes, Argo Workflows and ArgoCD) to the cloud (running AWS SageMaker and Datalake)
- How we keep the monitoring of our models, data, and performance indicators up to date and alert in case of disruptions
- How we embed the classifiers in an event-driven, heterogeneous software architecture (based on Kotlin and Python)

We will also share lessons learned on:
- How we keep reliability high while deploying, updating, and scaling our classification inference services
- How we strike a workable balance between performance and cost requirements


Expected audience expertise: Domain

Intermediate

Expected audience expertise: Python

Intermediate

Abstract as a tweet

idealo.de presents its MLOps solution and ML lifecycle for product classification

See also: Presentation slides (6.7 MB)

Gunar Maiwald has a background in Computer Science. For the last 3 years he has worked as an ML engineer at idealo.de. His professional programming path led him from Perl via TypeScript to Python.

Tobias Senst is a Senior Machine Learning Engineer at idealo internet GmbH. He received his PhD in 2019 from the Technische Universität Berlin under the supervision of Prof. Thomas Sikora. He has more than 10 years of experience in Computer Vision and Video Analytics research.

At idealo, he switched from the world of images and videos to Natural Language Processing and is responsible for operating and developing machine learning models in a production environment.