PyCon DE & PyData 2025

Langfuse, OpenLIT, and Phoenix: Observability for the GenAI Era
2025-04-25, Palladium

Large Language Models (LLMs) are transforming digital products, but their non-deterministic behaviour challenges predictability and testing, making observability essential for quality and scalability.

This talk introduces observability for LLM-based applications, spotlighting three tools: Langfuse, OpenLIT, and Phoenix. We'll share best practices on what to monitor in LLM features and how to monitor it, and we'll explore each tool's strengths and limitations.

Langfuse excels at tracing and quality monitoring but lacks OpenTelemetry support and is hard to customize. OpenLIT, while less mature, builds on OpenTelemetry and integrates well with existing observability stacks. Phoenix stands out for debugging and experimentation but struggles with real-time tracing.

The comparison will be illustrated with live coding examples.

Attendees will walk away with a better understanding of observability for GenAI applications and a clear sense of which tool fits their use case.


Large Language Models (LLMs) are becoming core components of modern digital products. However, their non-deterministic nature means that their behaviour cannot be fully predicted or tested before deployment. This makes observability an essential practice for building and maintaining applications with generative AI features.

This session focuses on observability in LLM-based systems.

We start by motivating why monitoring and understanding your application is key to ensuring quality, reliability, and scalability. We’ll analyze three leading tools for observability in this domain: Langfuse, OpenLIT, and Phoenix. Each has unique strengths and challenges that make them suitable for different use cases.

Through examples and real-world scenarios, we’ll explore:

  • How Langfuse provides detailed tracing and quality monitoring through developer-friendly APIs (see the tracing sketch after this list). While it supports multi-step workflows effectively, it lacks support for the OpenTelemetry protocol and can be difficult to customize for non-standard use cases.
  • Why OpenLIT, built on OpenTelemetry, offers strong observability for distributed systems (see the setup sketch after this list). Although it is the least mature of the three tools, it integrates well with established observability stacks and has promising potential for future growth.
  • Where Phoenix fits into the process by combining experimentation and debugging capabilities with evaluation pipelines (sketched after this list). Its strength lies in development-focused observability, but it has limitations in handling real-time tracing once systems are in production.
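
As a first taste of the live coding, here is a minimal Langfuse tracing sketch. It assumes the Langfuse Python SDK's @observe decorator (imported from langfuse.decorators in SDK v2) and API keys supplied via environment variables; the functions themselves are illustrative placeholders, not part of a real application.

    from langfuse.decorators import observe

    @observe()  # records this call as a span inside the enclosing trace
    def retrieve_context(question: str) -> str:
        # stand-in for a retrieval step (vector search, database lookup, ...)
        return "retrieved documents for: " + question

    @observe()  # the outermost decorated call becomes the trace root
    def answer_question(question: str) -> str:
        context = retrieve_context(question)
        # stand-in for the LLM call whose latency, cost, and output you trace
        return f"answer based on [{context}]"

    if __name__ == "__main__":
        print(answer_question("What does observability mean for LLM apps?"))

Nested decorated calls are grouped into a single trace, which is what makes multi-step workflows easy to follow in the Langfuse UI.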
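
For OpenLIT, the sketch below shows the kind of one-line setup we will demonstrate. It assumes the OpenLIT Python SDK's openlit.init() entry point and an OpenTelemetry collector reachable at a local OTLP endpoint; the endpoint value is a placeholder to adapt to your own stack.

    import openlit

    # One-time initialisation: after this call, supported LLM client libraries
    # are auto-instrumented and their traces and metrics are exported over
    # OpenTelemetry to the configured collector.
    openlit.init(
        otlp_endpoint="http://127.0.0.1:4318",  # placeholder: your OTel collector
    )

Because the data flows through OpenTelemetry, it can land in whatever backend your existing observability stack already uses.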
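
For Phoenix, the sketch below illustrates the development-time workflow, assuming the arize-phoenix package and its px.launch_app() helper for spinning up a local UI; it is a starting point for inspecting traces and evaluations while iterating, not a production setup.

    import phoenix as px

    # Launch the local Phoenix app; it prints a URL to a UI where collected
    # traces, evaluation results, and experiments can be inspected while
    # developing the application.
    px.launch_app()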

This talk will provide a clear, straightforward comparison of these tools, helping you understand which option best fits your LLM applications.

You’ll leave with practical insights into how observability can enhance the reliability and performance of your generative AI systems.


Expected audience expertise (Domain): Intermediate

Expected audience expertise (Python): Advanced

Emanuele is an engineer, researcher, and entrepreneur with a passion for artificial intelligence.

He earned his PhD by exploring time series forecasting in the energy sector and spent time as a guest researcher at EPFL in Lausanne. Today, he is co-founder and Head of AI at xtream, a boutique company that applies cutting-edge technology to solve complex business challenges.

Emanuele is also a contract professor in AI at the Catholic University of Milan. He has published eight papers in international journals and contributed to more than 30 conferences worldwide. His engagements include AMLD Lausanne, ODSC London, WeAreDevelopers Berlin, PyData Berlin, PyData Paris, PyCon Florence, the Swiss Python Summit in Zurich, and Codemotion Milan.

Emanuele has been a guest lecturer at Italian, Swiss, and Polish universities.