2026-06-06, Hardwick Hub
Multi-agent GenAI systems don’t fail because models lack intelligence; they fail because they lack memory.
As LLM applications move from demos to production, semantic memory becomes the defining systems challenge. Agents must remember user preferences, share context across roles, preserve conversational state across sessions, and evolve over time, all without exploding token costs or losing observability.
In this talk, I’ll explore semantic memory as a data engineering problem rather than a prompt engineering trick. Drawing on real-world experience from the Azure Cosmos DB engineering team, we’ll examine how to design layered memory for multi-agent systems in Python: short-term conversational state, episodic event logs, declarative and procedural memory, and retrieval-driven personalization.
Using a practical multi-agent travel planner built with LangGraph, we’ll implement patterns such as session-level versus per-turn persistence, hybrid retrieval design (structured filters plus semantic signals), memory lifecycle management (write, retrieve, summarize, supersede, expire), and checkpointed workflows for reproducibility and debugging.
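The lifecycle and hybrid-retrieval patterns above can be illustrated with a small in-memory sketch. This is not the talk’s implementation, and everything in it is a hypothetical name chosen for illustration: `MemoryRecord`, `MemoryStore`, the field names, and the toy three-dimensional embeddings. The sketch covers the write, supersede, expire, and retrieve steps and leaves summarization out. Retrieval first applies structured filters (user, memory kind, not superseded, not expired) and only then ranks the remaining records by cosine similarity. A production store such as Azure Cosmos DB would typically push both the filtering and the vector ranking into the database query.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical record shape; field names are illustrative, not a real schema.
@dataclass
class MemoryRecord:
    id: str
    user_id: str
    kind: str                             # e.g. "episodic", "declarative", "procedural"
    text: str
    embedding: list[float]
    created_at: float = field(default_factory=time.time)
    ttl_seconds: Optional[float] = None   # None means the record never expires
    superseded_by: Optional[str] = None   # set when a newer fact replaces this one


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Toy in-memory store illustrating write / supersede / expire / retrieve."""

    def __init__(self) -> None:
        self._records: dict[str, MemoryRecord] = {}

    def write(self, record: MemoryRecord) -> None:
        self._records[record.id] = record

    def supersede(self, old_id: str, new_record: MemoryRecord) -> None:
        # Keep the old record for auditability, but flag it so retrieval skips it.
        self.write(new_record)
        self._records[old_id].superseded_by = new_record.id

    def _is_live(self, r: MemoryRecord, now: float) -> bool:
        # Expiry is checked at read time here; a store with native TTL
        # would delete expired items on its own.
        expired = r.ttl_seconds is not None and now - r.created_at > r.ttl_seconds
        return r.superseded_by is None and not expired

    def retrieve(self, user_id: str, query_embedding: list[float], kinds: set[str],
                 k: int = 3, now: Optional[float] = None) -> list[MemoryRecord]:
        now = time.time() if now is None else now
        # 1. Structured filters: user, memory kind, lifecycle flags.
        candidates = [
            r for r in self._records.values()
            if r.user_id == user_id and r.kind in kinds and self._is_live(r, now)
        ]
        # 2. Semantic signal: rank only the filtered candidates by similarity.
        candidates.sort(key=lambda r: cosine(r.embedding, query_embedding), reverse=True)
        return candidates[:k]


# Usage: a seat preference is superseded, so only the newer fact is retrieved.
store = MemoryStore()
store.write(MemoryRecord("m1", "u1", "declarative", "Prefers aisle seats", [1.0, 0.0, 0.0]))
store.write(MemoryRecord("m2", "u1", "declarative", "Vegetarian", [0.0, 1.0, 0.0]))
store.supersede("m1", MemoryRecord("m3", "u1", "declarative", "Prefers window seats", [0.9, 0.1, 0.0]))
hits = store.retrieve("u1", [1.0, 0.0, 0.0], kinds={"declarative"}, k=1)
print([h.text for h in hits])  # ['Prefers window seats']
```

Filtering before ranking means that superseded or expired memories can never outrank live ones, however similar their embeddings are to the query.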
You’ll leave with practical design heuristics for building agent systems that become more reliable, more efficient, and more explainable over time.
All demonstrations will be in Python and applicable to production-scale systems.
This session focuses specifically on semantic memory architecture as the critical systems layer in production-grade multi-agent AI applications.
From my role on the Azure Cosmos DB engineering team, I’ve worked with teams building large-scale agentic systems that must support multi-tenancy, personalization, long-lived conversational state, and operational observability. A consistent lesson is that orchestration frameworks coordinate agents, but memory design determines whether the system behaves coherently over time.
The talk will cover:
- A practical taxonomy of agent memory: short-term state, episodic logs, declarative knowledge, and procedural memory
- Modeling conversations as append-only event streams versus mutable session documents
- Designing retrieval-aware memory stores that combine structured filtering with semantic signals
- Memory lifecycle management: summarization spans, supersession flags, retention windows, and TTL-based compaction
- Checkpointed agent workflows for traceability and debugging
- Multi-tenant memory partitioning strategies
- Cost tradeoffs between growing context windows and durable storage
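The contrast between append-only event streams and mutable session documents, combined with tenant-scoped partitioning, can be sketched as follows. The sketch rests on several assumptions: `Event`, `EventLog`, `partition_key`, `fold_session`, and the event type names are hypothetical and do not correspond to a Cosmos DB or LangGraph API. Each turn is appended as an immutable event under a tenant-and-session key. The session document is then a derived view that can be rebuilt from the log at any time or replayed to an earlier sequence number for debugging.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Event:
    tenant_id: str
    session_id: str
    seq: int                 # monotonically increasing within a session
    type: str                # hypothetical types: "user_msg", "agent_msg", "preference_set"
    payload: dict[str, Any]


def partition_key(tenant_id: str, session_id: str) -> str:
    # Hypothetical composite key: keeps a session's events together and
    # scopes every read to a single tenant.
    return f"{tenant_id}/{session_id}"


class EventLog:
    """Append-only log, grouped by partition key."""

    def __init__(self) -> None:
        self._partitions: dict[str, list[Event]] = {}

    def append(self, event: Event) -> None:
        events = self._partitions.setdefault(partition_key(event.tenant_id, event.session_id), [])
        if events and event.seq <= events[-1].seq:
            raise ValueError("events must be appended in increasing seq order")
        events.append(event)

    def read(self, tenant_id: str, session_id: str, up_to_seq: Optional[int] = None) -> list[Event]:
        events = self._partitions.get(partition_key(tenant_id, session_id), [])
        return [e for e in events if up_to_seq is None or e.seq <= up_to_seq]


def fold_session(events: list[Event]) -> dict[str, Any]:
    """Derive a session-document view from the immutable event stream."""
    doc: dict[str, Any] = {"messages": [], "preferences": {}, "last_seq": 0}
    for e in events:
        if e.type in ("user_msg", "agent_msg"):
            doc["messages"].append((e.type, e.payload["text"]))
        elif e.type == "preference_set":
            doc["preferences"].update(e.payload)  # later events overwrite earlier values
        doc["last_seq"] = e.seq
    return doc


# Usage: replaying to seq 2 reconstructs the state before the preference changed.
log = EventLog()
log.append(Event("acme", "s1", 1, "user_msg", {"text": "Plan a trip to Lisbon"}))
log.append(Event("acme", "s1", 2, "preference_set", {"seat": "aisle"}))
log.append(Event("acme", "s1", 3, "preference_set", {"seat": "window"}))
print(fold_session(log.read("acme", "s1"))["preferences"])               # {'seat': 'window'}
print(fold_session(log.read("acme", "s1", up_to_seq=2))["preferences"])  # {'seat': 'aisle'}
```

Updating a single session document in place is cheaper to read, but it discards history. The event log keeps that history, which makes replay and auditing straightforward, at the cost of a fold (or a cached, periodically summarized view) on every read.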
A live Python-based multi-agent travel planner (built with LangGraph and backed by Azure Cosmos DB) will demonstrate these patterns in practice, including memory tools based on the Model Context Protocol (MCP) that separate reasoning from storage concerns.
The goal is to provide PyData attendees with a concrete systems framework for thinking about semantic memory, not as an afterthought to prompting, but as a first-class data architecture problem at the intersection of distributed systems and applied AI.
Theo is passionate about NoSQL and distributed computing. He joined Microsoft in 2017 and has been a Program Manager on the Azure Cosmos DB engineering team since 2019. He currently focuses on AI, programmability, and developer experience for Azure Cosmos DB. He has a master’s degree in Data Science from Dundee University, and lives in the UK with his wife, two boys, and ragcoon cat.