BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pyconde-pydata-2026//talk//3UHPZB
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pyconde-pydata-2026-3UHPZB@pretalx.com
DTSTART;TZID=CET:20260414T171000
DTEND;TZID=CET:20260414T174000
DESCRIPTION:LLM applications frequently pass tests but fail users in produc
 tion. This talk examines the gap between evaluation metrics and user exper
 ience through three lenses: **Expectations** (what "working" means to user
 s)\, **Functional** (system-level vs. component-level success)\, and **Ope
 rational** (real-world reliability).\n\nDrawing from production experience
 \, we'll share scenarios of expectation mismatches\, silent failures\, and
  undetected drift—plus practical strategies for bridging the gap. The co
 re message: evaluation should answer whether your system serves users\, no
 t whether it passes tests.
DTSTAMP:20260523T181833Z
LOCATION:Palladium [2nd Floor]
SUMMARY:It Works on My Machine: Why LLM Apps Fail Users (Not Tests) - Thoma
 s Prexl\, Frank Rust
URL:https://pretalx.com/pyconde-pydata-2026/talk/3UHPZB/
END:VEVENT
END:VCALENDAR
