PyData London 2026

Production-Ready AI Agents: From LLMs to Small Language Models
2026-06-06, Grand Hall 2

Building a demo agent on a model with a hundred billion parameters or more can be easy. Deploying reliable, cost-effective agents in production is hard. This talk provides a comprehensive roadmap for taking AI agents from prototype to production, with a focus on migrating from expensive frontier LLMs to efficient small language models (SLMs).

We'll explore the entire lifecycle of production agent development: test-driven development practices adapted for non-deterministic AI systems, agent architectures and migration strategies from large to small models, CI/CD considerations for agents, and observability frameworks that capture what matters and help remediate failures.

Whether you're running agents at scale or planning your first deployment, you'll leave with actionable strategies and concrete tools to build reliable, maintainable agent systems with small language models.


In this talk we will cover the complete agent development lifecycle, from prototype to a scalable and robust production agent built on cost-effective Small Language Models. The talk will present the following topics, gathered from real engagements with product teams:

  1. The Production Agent Problem (3 min)
    The prototype-to-production gap, why closed frontier LLMs don't scale, and the agent development lifecycle.

  2. Small Models, Big Impact (2 min)
    The case for small open language models, the current model landscape, and an iterative migration pattern.

  3. Test-Driven Agent Development (5 min)
    Starting with clear use cases and adapting testing practices for non-deterministic systems. Evaluation patterns and practical examples of testing agent behavior for different types of agents.

  4. Techniques for migrating to Small Language Models (7 min)
    Introducing task decomposition patterns, use of multi-model approaches and agent architectures better suited to Small Language Model utilisation.

  5. CI/CD for Agents (7 min)
    Treating models and prompts as config rather than code. Building deployment pipelines that handle model and prompt versioning, integration and end-to-end testing for agents with MCP and A2A considerations, and agent packaging for production rollout.

  6. Observability and Monitoring (4 min)
    Instrumenting agents with structured logging, tracking key metrics beyond traditional monitoring, and building dashboards and alerts that surface quality issues. Monitoring non-functional metrics such as cost, latency and concurrency.

  7. Continuous Improvement Loops (4 min)
    Creating feedback pipelines from production data, triaging failures and automating analysis. Strategies for iterative improvement, and methods for measuring progress through A/B testing.
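One way to picture the testing idea in topic 3: because an agent's output can vary from run to run, a test can assert on a pass rate over repeated runs rather than on a single exact answer. The sketch below is illustrative only and is not taken from the talk materials. The names `pass_rate`, `make_fake_agent` and `meets_threshold` are hypothetical, and the fake agent stands in for a real model call so that the example runs without any model.

```python
import random
from typing import Callable


def pass_rate(agent: Callable[[str], str],
              prompt: str,
              check: Callable[[str], bool],
              runs: int = 20) -> float:
    """Call the agent `runs` times and return the fraction of outputs that pass `check`."""
    passed = sum(1 for _ in range(runs) if check(agent(prompt)))
    return passed / runs


def make_fake_agent(seed: int, error_rate: float) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call; gives a wrong answer at roughly `error_rate`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible

    def agent(prompt: str) -> str:
        if rng.random() < error_rate:
            return "I am not sure"
        return "The capital is Paris"

    return agent


def meets_threshold(agent: Callable[[str], str], threshold: float = 0.8) -> bool:
    """Behavioural check: the answer must mention 'Paris' in at least `threshold` of runs."""
    rate = pass_rate(agent,
                     "What is the capital of France?",
                     lambda out: "Paris" in out)
    return rate >= threshold
```

In a real suite the check would usually be something richer than a substring match (a schema validation, a tool-call assertion, or a judged score), and the threshold would be chosen per use case.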

As part of this talk, we will reference some Jupyter Notebooks and reusable code snippets with the PyData stack to enable attendees to begin their own Agentic journeys to production with Small Language Models.
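As a rough illustration of the "models and prompts as config" idea from the CI/CD topic, the sketch below treats the model identifier and prompt template as versioned data and derives a short fingerprint that a pipeline could attach to test results and logs. It is an assumption-laden example, not one of the snippets referenced in the talk: `AgentConfig`, its fields and the model name `example-small-model` are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical agent configuration kept as data rather than hard-coded in the agent."""
    model_id: str
    prompt_template: str
    temperature: float = 0.0

    def fingerprint(self) -> str:
        """Return a short, stable hash of the config for tagging runs and releases."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


# Two configs that differ only in the prompt get different fingerprints,
# so a prompt change is visible as a version change.
baseline = AgentConfig(model_id="example-small-model",
                       prompt_template="Answer briefly: {question}")
candidate = AgentConfig(model_id="example-small-model",
                        prompt_template="Answer in one sentence: {question}")
```

Recording the fingerprint alongside evaluation results makes it possible to tell which model and prompt combination produced a given outcome.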

See also: Useful Code Snippets and Blogs on working with SLMs for Agentic Applications

Prattyush is a Research Software Engineer on the Granite Feedback Team at IBM Research, based in the UK (Winchester) and the US (New York).

IBM Granite is the family of AI models from IBM, and Prattyush leads product and client engagements to increase adoption of the models across various use cases. He is a technical leader for Agentic and GenAI applications, leads efforts on education content, and acts as one of the release managers, contributing to testing and release efforts.

Prattyush is part of the wider AI Foundations organisation and as such regularly contributes to the development of the latest IBM Research technologies, both internally and through open source.