BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//euroscipy-2026//speaker//XYRCUM
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-euroscipy-2026-FT7JQV@pretalx.com
DTSTART;TZID=CET:20260721T121000
DTEND;TZID=CET:20260721T123000
DESCRIPTION:Large Language Models are increasingly integrated into scientif
 ic and production workflows\, yet evaluation practices often remain inform
 al and notebook-driven. This talk explores how to build reproducible\, mea
 surable\, and regression-safe LLM evaluation pipelines using Python. We wi
 ll examine dataset design\, metric selection\, deterministic evaluation ha
 rnesses\, and CI integration strategies that transform LLM experimentation
  into disciplined\, testable engineering workflows.
DTSTAMP:20260603T190645Z
LOCATION:Room 1.19 (Ground Floor\, Shannon)
SUMMARY:Making LLM Evaluation Reproducible in Python - Jigyasa Grover\, Ris
 habh Misra
URL:https://pretalx.com/euroscipy-2026/talk/FT7JQV/
END:VEVENT
END:VCALENDAR