BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pydata-london-2026//talk//QQWDVQ
BEGIN:VTIMEZONE
TZID:GMT
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:GMT
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:BST
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pydata-london-2026-QQWDVQ@pretalx.com
DTSTART;TZID=GMT:20260607T153000
DTEND;TZID=GMT:20260607T161500
DESCRIPTION:Most RAG demos stop at retrieval and summarisation. In practice
 \, we also need to measure the understanding of users\, models\, and the s
 ource material. This talk introduces a reusable evaluation pattern that tu
 rns any document into a live-graded “exam engine” using Python tools i
 ncluding Docling\, DeepEval\, and Marimo.\n\nWe will build a stateful appl
 ication that generates multiple-choice and free-text questions from comple
 x documents\, creates realistic distractors\, and scores answers in real t
 ime using an LLM-as-judge pipeline. The demo is intentionally playful\, bu
 t each component maps to a production concern: layout-aware ingestion (tab
 les and figures)\, synthetic QA dataset creation\, semantic grading\, and 
 interactive evaluation loops.\n\nAttendees will learn how to move beyond p
 assive RAG towards systems that benchmark knowledge\, support training wor
 kflows\, and enable human-in-the-loop evaluation.
DTSTAMP:20260602T194455Z
LOCATION:Doddington Forum
SUMMARY:From Chat-with-PDF to Quiz-Master: Live-Grading RAG with LLM-as-Jud
 ge in Python - Adam Hill
URL:https://pretalx.com/pydata-london-2026/talk/QQWDVQ/
END:VEVENT
END:VCALENDAR