Routing Strategies for Heterogeneous GenAI Systems: Lessons from Real-World Practice EuroSciPy 2025

2025-08-21 11:05–11:35, Room 1.38 (Ground Floor)

Techniker Krankenkasse employs multiple specialized generative AI (GenAI) systems tailored to specific tasks, domains, costs, and latency needs. This multi-system strategy boosts robustness and efficiency but poses the operational challenge of routing queries to the most suitable GenAI model.

The talk describes practical experiences with developing dynamic routing pipelines using techniques such as regular-expression filters, Named Entity Recognition (NER), few-shot intent classifiers, lightweight generative models for economical context-aware routing, and selective escalation to advanced models only when necessary. Insights and best practices from real-world implementation are shared.

Modern Generative AI (GenAI) systems combine prompts, language models, inference servers, and specialized hardware into sophisticated stacks. As no single large GenAI system excels at all tasks, we at Techniker Krankenkasse are increasingly adopting a multi-system approach, employing different models tailored to specific tasks, domains, cost, or latency requirements. While this approach enhances robustness and efficiency, it introduces a critical operational challenge: effectively routing each incoming query to the most suitable GenAI system.

In this talk, we present our real-world experiences developing dynamic routing pipelines for selecting the optimal GenAI system based on input content and task specificity. We detail the evolution and refinement of our routing strategies, including:

Regular-expression filters to quickly capture clear-cut topics and enforce guardrails;
Off-the-shelf Named Entity Recognition (NER) modules to integrate domain-specific contextual signals;
Few-shot fine-tuned intent classifiers capable of generalizing beyond simple keyword matching;
Lightweight generative LLMs that enable cost-effective, context-aware decision-making;
Selective escalation strategies employing state-of-the-art LLMs exclusively when more economical routes provide insufficient confidence.
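As a rough illustration of how such stages could be chained, the following minimal sketch tries cheap stages first and escalates only when none of them is confident enough. It is not the pipeline presented in the talk: the target names (`account_bot`, `billing_bot`, `medical_bot`, `frontier_llm`), the patterns, the keyword lists, and the 0.75 threshold are all hypothetical, and a simple keyword scorer stands in for a real few-shot classifier or lightweight LLM.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Decision:
    target: str        # name of the GenAI system to call (hypothetical labels)
    confidence: float  # the stage's confidence in [0, 1]
    stage: str         # which stage produced the decision


# A stage returns a Decision, or None if it has no opinion on the query.
Stage = Callable[[str], Optional[Decision]]

# Hypothetical clear-cut patterns, for illustration only.
REGEX_RULES = [
    (re.compile(r"\b(password|login)\b", re.IGNORECASE), "account_bot"),
    (re.compile(r"\b(invoice|reimbursement)\b", re.IGNORECASE), "billing_bot"),
]

# Hypothetical keyword lists used by the stand-in classifier below.
INTENT_HINTS = {
    "medical_bot": {"doctor", "therapy", "prescription"},
    "billing_bot": {"cost", "payment", "refund"},
}


def regex_stage(query: str) -> Optional[Decision]:
    """Cheapest stage: a pattern match is treated as fully confident."""
    for pattern, target in REGEX_RULES:
        if pattern.search(query):
            return Decision(target, 1.0, "regex")
    return None


def classifier_stage(query: str) -> Optional[Decision]:
    """Stand-in for an intent classifier or small LLM (assumption):
    scores each target by keyword overlap; two hits count as confidence 1.0."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    best_target, best_hits = None, 0
    for target, vocab in INTENT_HINTS.items():
        hits = len(words & vocab)
        if hits > best_hits:
            best_target, best_hits = target, hits
    if best_target is None:
        return None
    return Decision(best_target, min(1.0, best_hits / 2), "classifier")


def route(query: str, stages: Sequence[Stage], threshold: float = 0.75,
          fallback: str = "frontier_llm") -> Decision:
    """Run stages from cheapest to most expensive; accept the first decision
    that reaches the threshold, otherwise escalate to the fallback system."""
    for stage in stages:
        decision = stage(query)
        if decision is not None and decision.confidence >= threshold:
            return decision
    # Confidence 0.0 marks that no cheap stage was confident enough.
    return Decision(fallback, 0.0, "escalation")


STAGES = [regex_stage, classifier_stage]

if __name__ == "__main__":
    for q in ["I forgot my password",
              "Can I get a refund for the payment?",
              "What does my doctor think?"]:
        print(q, "->", route(q, STAGES))
```

Ordering the stages by cost means the expensive fallback is reached only for queries the cheaper stages cannot settle; in this sketch the third query gets a single keyword hit (confidence 0.5), falls below the threshold, and is escalated.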

We share insights and best practices from our real-world implementation experience.

Expected audience expertise: Domain: none
Expected audience expertise: Python: some
Your relationship with the presented work/project: Original author or co-author

Oliver Zeigermann

Oliver Zeigermann has been developing software for 40 years, progressing from assembly language to C, then Python, and ultimately to machine learning. He currently works as a machine learning engineer at Techniker Krankenkasse.
