Observing Agentic AI in Production: MCP Server Tracing with OpenTelemetry and Animal Crossing PyData London 2026

2026-06-05 10:50–12:20, Doddington Forum

AI agents are moving into production in 2026, but when something goes wrong (a tool call fails silently, an LLM takes 13 seconds to respond, token costs spike overnight) teams struggle to diagnose issues across multi-step agentic workflows. In this hands-on tutorial you will solve a real problem on the island in Animal Crossing with a FastMCP Model Context Protocol (MCP) server in Python, instrumenting it with OpenTelemetry following the emerging GenAI and MCP semantic conventions and visualising end-to-end traces in a local Jaeger instance. Did I mention that events on the island occur in real time and are collected and processed using Apache Kafka?

You will learn how distributed tracing captures the hierarchical relationship between agent conversations, tool executions and MCP protocol messages, and how to use that visibility for debugging, cost analysis and performance optimisation (including picking the right model and checking if you’re drowning in serialisation overhead). You will leave with a fully instrumented MCP server, a Docker Compose real-time observability stack and the knowledge to bring production-grade observability to your own agentic AI systems.
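To make the hierarchy concrete before the session, here is a minimal sketch in plain Python. It is a toy stand-in, not the OpenTelemetry API: the `start_span` helper, the span names and the `check_turnip_prices` tool are all hypothetical, chosen only to show how each span remembers its parent so that a conversation, its tool executions and the MCP messages underneath them form a tree.

```python
import contextvars
import itertools
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional

# Toy tracer: NOT the OpenTelemetry API, only the parent/child idea behind it.
_current_span = contextvars.ContextVar("current_span", default=None)
_ids = itertools.count(1)
finished_spans = []


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]
    name: str


@contextmanager
def start_span(name):
    """Open a span whose parent is whichever span is currently active."""
    parent = _current_span.get()
    span = Span(next(_ids), parent.span_id if parent else None, name)
    token = _current_span.set(span)
    try:
        yield span
    finally:
        _current_span.reset(token)  # restore the parent as the active span
        finished_spans.append(span)


def depth(span, by_id):
    """Count how many ancestors a span has."""
    levels = 0
    while span.parent_id is not None:
        span = by_id[span.parent_id]
        levels += 1
    return levels


# Hypothetical span names for a single agent turn.
with start_span("agent conversation"):
    with start_span("LLM call"):
        pass
    with start_span("tool execution: check_turnip_prices"):
        with start_span("MCP message: tools/call"):
            pass

by_id = {s.span_id: s for s in finished_spans}
tree_lines = [
    "  " * depth(s, by_id) + s.name
    for s in sorted(finished_spans, key=lambda s: s.span_id)
]
print("\n".join(tree_lines))
```

Printing `tree_lines` gives an indented outline of the turn, which is roughly the shape a trace viewer draws as a waterfall.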

Why this matters

OpenTelemetry is rapidly becoming the standard telemetry backbone for AI agents, just as it is already for microservices. It is one of the most active CNCF projects after Kubernetes, with native support from 30+ observability vendors. Its GenAI Special Interest Group declared 2025 the "year of AI agents" and has since published purpose-built semantic conventions for LLM calls, agent orchestration, and MCP tool calls. The industry has followed: Amazon launched Bedrock AgentCore Observability built entirely on OTel and GenAI semantic conventions; Grafana Labs demonstrated production tracing of the OpenAI Agents SDK and AWS Bedrock AgentCore.

However, most teams building agents today have none of this. The reason is a “developer experience gap”: many agent builders come from data science and ML research backgrounds, not distributed systems, and have never configured a tracing pipeline. Traditional monitoring tools don't capture the signals that matter for agents: token usage, cost per invocation, tool selection and multi-agent handoffs. Because agentic architecture is interaction-centric (98% of wall-clock time is spent in LLM API calls and tool executions, not your code), distributed tracing, not traditional metrics, is the primary observability signal. Without it, failures are invisible: one fintech company's agent ran in a loop for 11 hours accumulating $47,000 in costs before anyone noticed.

What we will do

We will explore a FastMCP server that exposes tools for a fun real-time data engineering scenario, instrument it with OpenTelemetry and visualise the resulting traces.

Check out a FastMCP server (understand the MCP request/response lifecycle).
OpenTelemetry for agentic AI (traces, metrics and logs, and why traces are the primary signal for agents).
Instrument the MCP server (OpenTelemetry instrumentation, see how errors are automatically recorded with stack traces).
From traces to dashboards (build a dashboard that answers which tools are slowest, showing error rates and token costs).
Production patterns and case studies (patterns for sensitive data handling, sampling strategies for high-throughput agent workflows).
Connecting auth and observability (auth attributes appearing in traces when OAuth is enabled, giving per-user visibility).
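As a taste of the instrumentation step, the sketch below wraps a tool function so that every call records its duration, its status and, on failure, the stack trace. It uses only the standard library and is an illustration of the idea, not the tutorial's code and not how OpenTelemetry or FastMCP implement it: the `traced_tool` decorator, the `recorded` list and the `catch_fish` tool are hypothetical names.

```python
import functools
import time
import traceback

# Toy in-memory "exporter"; a real setup would ship spans to a collector or Jaeger.
recorded = []


def traced_tool(func):
    """Record duration, status and any stack trace for each tool call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {"tool": func.__name__, "status": "ok", "stacktrace": None}
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            record["status"] = "error"
            record["stacktrace"] = traceback.format_exc()
            raise  # re-raise so the caller still sees the failure
        finally:
            record["duration_s"] = time.perf_counter() - start
            recorded.append(record)

    return wrapper


@traced_tool
def catch_fish(location):  # hypothetical tool, for illustration only
    if location != "river":
        raise ValueError(f"no fish at {location}")
    return "sea bass"


catch_fish("river")
try:
    catch_fish("museum")
except ValueError:
    pass  # the failure is still captured in `recorded`

for entry in recorded:
    print(entry["tool"], entry["status"], f"{entry['duration_s']:.6f}s")
```

The design point is that the wrapper never swallows the exception: the failure is recorded and then re-raised, so the tool's behaviour is unchanged while the error stops being silent.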

Target audience

Data engineers, data scientists, ML/AI engineers and SRE/platform engineers who are building or operating AI agents and need production visibility into agentic workflows. This is relevant to anyone deploying LLM-powered tools, multi-agent orchestration or MCP servers. Or you’re just a fan of Animal Crossing and social simulation gaming.

Prerequisites

Basic to intermediate Python (comfortable with decorators, async/await basics and uv).
No prior knowledge of MCP, OpenTelemetry or FastMCP is required.

Tutorial requirements

macOS/Linux laptop or Windows with PowerShell.
Docker, Colima or OrbStack (to run Docker Compose for the local observability stack).
uv for package management.
A code editor (VS Code, Cursor, Kiro or similar).
LLM access, either via a vendor (Anthropic, OpenAI, etc.) or local Ollama. We will be serving a local 1B model, so you’ll need enough RAM and disk space (about 4 GB).
Visit the GitHub repo https://tinyurl.com/anteaters26 and follow the SETUP.md to install all the tools prior to arrival.

Key takeaways

Understand why distributed tracing (rather than traditional metrics) is the primary observability signal for agentic AI systems.
Be able to build an MCP server with custom tools using FastMCP and instrument it with OpenTelemetry.
Know the OpenTelemetry GenAI and MCP semantic conventions and how they standardise telemetry across agent frameworks.
Be able to visualise, query and dashboard agent traces using Jaeger.
Understand the production observability landscape: auto-instrumentation libraries, sensitive data handling and compliance considerations.
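To give a feel for the questions a trace-backed dashboard answers, here is a small sketch that summarises a handful of finished spans by name. Everything in it is made up for illustration: the span data, the tool names and the flat `PRICE_PER_1K_TOKENS_USD` are hypothetical. The `gen_ai.usage.*` attribute keys and the `execute_tool ...` and `chat ...` span names follow the OpenTelemetry GenAI semantic conventions as we understand them at the time of writing, so check the current specification before relying on them.

```python
from collections import defaultdict

PRICE_PER_1K_TOKENS_USD = 0.002  # hypothetical flat price, not a real tariff

# Hypothetical finished spans, flattened to dicts for the example.
spans = [
    {"name": "execute_tool check_turnip_prices", "duration_ms": 120, "error": False},
    {"name": "execute_tool check_turnip_prices", "duration_ms": 180, "error": True},
    {"name": "execute_tool catch_fish", "duration_ms": 40, "error": False},
    {"name": "chat small-model", "duration_ms": 900, "error": False,
     "gen_ai.usage.input_tokens": 1500, "gen_ai.usage.output_tokens": 500},
]

# Group spans by name, then compute latency, error rate and token cost per group.
by_name = defaultdict(list)
for span in spans:
    by_name[span["name"]].append(span)

summary = {}
for name, group in by_name.items():
    tokens = sum(
        s.get("gen_ai.usage.input_tokens", 0) + s.get("gen_ai.usage.output_tokens", 0)
        for s in group
    )
    summary[name] = {
        "calls": len(group),
        "mean_ms": sum(s["duration_ms"] for s in group) / len(group),
        "error_rate": sum(s["error"] for s in group) / len(group),
        "cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS_USD,
    }

# Which tool is slowest on average?
slowest_tool = max(
    (n for n in summary if n.startswith("execute_tool")),
    key=lambda n: summary[n]["mean_ms"],
)

for name, stats in summary.items():
    print(name, stats)
print("slowest tool:", slowest_tool)
```

In practice a tracing backend runs this kind of aggregation for you; the sketch only shows that slowest tools, error rates and token costs all fall out of the same span data once the attributes are named consistently.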

Tun Shwe

Tun leads AI Engineering at Lenses, where he is focused on helping companies imagine and implement their strategic vision with agentic AI systems fuelled by real-time context. He was previously a Head of Data and Data/ML Engineer at high-growth startups and has spent 20 years building data-intensive applications and leading T-shaped teams.

Tun is a co-organiser of the annual PyData London conference and co-founder of PyData Cornwall. He is a strong advocate in the Python AI engineering community and a contributor to open source AI engineering and Apache Kafka tools.

In his spare time, Tun goes surfing, plays guitar and shoots 35mm film.

Fei Phoon

Data Engineer in AI Platform at The Economist, PyData Cornwall co-founder, and committed diversity and inclusion ally.
