PyCon DE & PyData 2026

Holistic Optimization: Implementing "Pipeline-as-a-Trial" HPO with Ray and Cloud Infra
Dynamicum [Ground Floor]

Most hyperparameter optimization (HPO) stops at the model boundary. But what happens when your system relies on a complex chain of steps: a short-horizon model, a long-horizon model, ensembles, post-processing, and so on? Tuning one piece in isolation often leads to sub-optimal global results.

In this talk, we explore how we used Ray to move beyond simple model tuning. We’ll dive into a "Pipeline-as-a-Trial" architecture where Ray acts as the brain, triggering an independent, scalable cloud workflow (SageMaker Pipelines or Databricks Workflows) for every hyperparameter set.

We will discuss:
* The architectural shift from tuning models to tuning pipelines
* How to build the DAG/pipeline on SageMaker/Databricks using declarative configs
* How to use Ray to orchestrate heavyweight remote jobs without bottlenecks

Attendees will learn how to optimize entire pipelines, scalably and in the cloud, to minimize global metrics like WAPE (weighted absolute percentage error) rather than just local model loss.
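
The Pipeline-as-a-Trial idea can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the implementation from the talk: a standard-library thread pool stands in for Ray, `run_pipeline` is a local stub standing in for one remote SageMaker Pipelines or Databricks Workflows run, and the parameter names `blend_weight` and `scale` are invented for the example. The point it shows is that each trial scores the output of the whole chain with one global metric (WAPE) instead of a per-model loss.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def wape(actuals, forecasts):
    """Weighted absolute percentage error: sum(|a - f|) / sum(|a|)."""
    total_error = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    total_actual = sum(abs(a) for a in actuals)
    return total_error / total_actual


def run_pipeline(params):
    """Hypothetical stand-in for one full pipeline run (one trial).

    In a real setup this would trigger a remote workflow and wait for
    its final metric. Here the whole chain is simulated locally with toy data.
    """
    actuals = [100.0, 120.0, 80.0, 150.0]
    # Toy "models": one under-forecasts, the other over-forecasts.
    short_horizon = [a * 0.9 for a in actuals]
    long_horizon = [a * 1.2 for a in actuals]
    # Toy "ensemble + post-processing" steps driven by the trial's parameters.
    w = params["blend_weight"]
    forecasts = [
        params["scale"] * (w * s + (1.0 - w) * l)
        for s, l in zip(short_horizon, long_horizon)
    ]
    # The trial is scored on the end-to-end output, not on any single step.
    return wape(actuals, forecasts)


def sample_params(rng):
    """Draw one hyperparameter set for the whole pipeline (random search)."""
    return {
        "blend_weight": rng.uniform(0.0, 1.0),
        "scale": rng.uniform(0.8, 1.2),
    }


def tune_pipeline(num_trials=20, max_concurrent=4, seed=0):
    """Run one pipeline per hyperparameter set and keep the lowest WAPE."""
    rng = random.Random(seed)
    trials = [sample_params(rng) for _ in range(num_trials)]
    # The pool only waits on trials; the heavy work would live in the remote job.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        scores = list(pool.map(run_pipeline, trials))
    best_score, best_params = min(zip(scores, trials), key=lambda pair: pair[0])
    return best_params, best_score


if __name__ == "__main__":
    params, score = tune_pipeline()
    print(f"best global WAPE={score:.4f} with {params}")
```

Because the orchestrating process only submits trials and waits for a single number per trial, it stays lightweight regardless of how heavy each pipeline run is; swapping the thread pool for a distributed scheduler and the stub for a real workflow trigger would not change the shape of the loop.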


Have you ever tuned a model to perfection, only to have it fail once integrated into your production pipeline? This is the "local optimization" trap: fixing a component while unintentionally breaking the complex system around it.
At Zalando, where we manage hundreds of forecasting models across 25 countries, local wins often lead to global failures. In this talk, we move beyond single-model tuning to explore Holistic Optimization.
We will detail how our team implemented a "Pipeline-as-a-Trial" architecture.

What We’ll Cover:
* What the "local optimization" problem is, and how it appears everywhere from tech products to day-to-day life.
* How we leveraged Ray’s distributed capabilities to manage high-concurrency machine learning workloads.
* Infrastructure Comparison: A candid, battle-tested breakdown of running HPO across AWS SageMaker, Databricks, and Internal EC2/Metaflow clusters.
* Operational Trade-offs: Real-world insights into the performance, cost, and traceability of different cloud implementations.
* Configuration-Driven Development: How an abstract library layer allows us to scale experimentation across hundreds of production models.

Stop chasing local solutions. Join me to learn how to build a distributed HPO framework that optimizes for your global business objectives.

PS: If you are a "Rick and Morty" fan, definitely join to see how Rick fell into the local optimization trap!


Expected audience expertise in your talk's domain: Intermediate
Expected audience expertise in Python: Intermediate
See also: Slides (3.7 MB)

Data/MLOps Engineer at Zalando. Throughout my career I have worked alongside data scientists to build robust ML pipelines. I am very enthusiastic about designing and implementing scalable and robust systems.