BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//sips2026-online//talk//G9FTDY
BEGIN:VTIMEZONE
TZID:EST
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10;UNTIL=20061029T070000Z
TZNAME:EST
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
END:STANDARD
BEGIN:STANDARD
DTSTART:20071104T030000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000402T030000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=4;UNTIL=20060402T080000Z
TZNAME:EDT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20070311T030000
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-sips2026-online-G9FTDY@pretalx.com
DTSTART;TZID=EST:20260507T164000
DTEND;TZID=EST:20260507T164500
DESCRIPTION:As formal peer review is being strained\, AI researchers and de
 velopers have proposed using AI systems to evaluate submissions. For this 
 reason\, I began synthesizing the evidence and asked: How accurately and e
 fficiently can AI models and human-AI teams evaluate study reports relativ
 e to human experts in benchmarking experiments? In a living\, AI-assisted 
 systematic review\, I screened in all high-quality experiments wherein fou
 ndation models and fine-tuned LLMs were evaluated against humans on closed
 -ended and open-ended research evaluation tasks. In Phase 1\, I identi
 fied 38 study reports. To keep up with the swift current of publications\,
  I welcome contributors to join me in writing a protocol and completing Ph
 ase 2 of this systematic review. Our continued efforts will show w
 hat AI models can truly do and offer those who debate their value a shared
  evidence base to reason from and contribute to.
DTSTAMP:20260619T091805Z
LOCATION:Track 1
SUMMARY:oLT20: Review Arena: a Living Synthesis of Experiments Benchmarking
  AI-Assisted Research Evaluation - Jay Patel
URL:https://pretalx.com/sips2026-online/talk/G9FTDY/
END:VEVENT
END:VCALENDAR
