BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pydata-london-2026//talk//8Y9GRD
BEGIN:VTIMEZONE
TZID:GMT
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:GMT
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:BST
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pydata-london-2026-8Y9GRD@pretalx.com
DTSTART;TZID=GMT:20260606T115000
DTEND;TZID=GMT:20260606T123500
DESCRIPTION:As AI agents become more popular\, one question becomes increas
 ingly important: how do you actually know if your agent is performing well
 ? Multi-turn conversations are hard to evaluate because there is r
 arely one right answer and at any given turn multiple responses can be cor
 rect. In this talk\, we'll walk through a structured approach to evaluatin
 g complex conversations. We'll cover what makes a good conversation\, tech
 niques for evaluating multi-turn conversations where multiple outcomes are
  simultaneously valid\, and how to scale evaluation pipelines. Finally\, w
 e'll discuss practical frameworks for continuous improvement and building 
 confidence in your agent's real-world behaviour.
DTSTAMP:20260602T223343Z
LOCATION:Grand Hall 2
SUMMARY:Evaluating multi-turn conversations: A practical guide to AI Agent 
 evals - Lena Shakurova
URL:https://pretalx.com/pydata-london-2026/talk/8Y9GRD/
END:VEVENT
END:VCALENDAR
