BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//sips2025-budapest//speaker//BYPWZG
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-sips2025-budapest-LRDSAX@pretalx.com
DTSTART;TZID=CET:20250627T140000
DTEND;TZID=CET:20250627T153000
DESCRIPTION:In this hackathon we will work on a benchmark for AI
  evaluations of preregistrations. When it comes to assessing
  preregistrations\, human-labeled data scales poorly because it requires
  a lot of expert labor. Instead\, we will focus on synthetically
  generated preregistrations for which we know the ground truth:
  what is described adequately and what is missing or inadequately
  described.\n\nPossible project tasks:\nGather or create flawless
  preregistrations.\nFind ways in which important components of a
  preregistration can be broken.\nFormulate prompts and AI workflows
  for generating versions of the flawless preregistrations that are
  wrong in specific ways.\nCreate a database of synthetic
  preregistrations and the ground truth about which components are
  adequate and which are not.\nWrite code that allows quick benchmarking
  by AI or human coders.\nValidate the synthetic preregistrations by
  evaluating their adequacy.
DTSTAMP:20260514T165839Z
LOCATION:Second floor 213
SUMMARY:UC21: Developing a benchmark for AI reviewers of preregistrations -
  Zoltan Kekecs\, Bence Palfi
URL:https://pretalx.com/sips2025-budapest/talk/LRDSAX/
END:VEVENT
END:VCALENDAR
