BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//scipy-2026//speaker//JCQXWS
BEGIN:VTIMEZONE
TZID:CST
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10;UNTIL=20061029T080000Z
TZNAME:CST
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
END:STANDARD
BEGIN:STANDARD
DTSTART:20071104T030000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:CST
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000402T030000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=4;UNTIL=20060402T090000Z
TZNAME:CDT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20070311T030000
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:CDT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-scipy-2026-VLD7LX@pretalx.com
DTSTART;TZID=CST:20260716T131500
DTEND;TZID=CST:20260716T134500
DESCRIPTION:Imbalanced datasets are common across science and industry: mos
 t screened molecules are inactive and most batted balls in baseball result
  in outs. One standard practice is to downsample the majority class or avo
 id collecting more of it. But majority-class examples are not interchangea
 ble. Some are closely related to other examples\, while others are distinc
 t from any other example in the dataset. Others define the boundary betwee
 n success and failure.\n\nThis talk asks two practical questions:\n1.	How 
 much majority-class data is actually necessary for a performant machine 
 learning model?\n2.	If we cannot collect all of it\, which majority-class 
 examples should we collect?\n\nUsing three wildly different datasets—ant
 ibacterial molecular screening\, sandwich taste ratings\, and Major League
  Baseball at-bat outcomes—I compare random downsampling to strategies th
 at retain harder or more diverse majority-class examples\, and evaluate th
 e impact on generalization and performance for real-world machine learning
  models.
DTSTAMP:20260622T093936Z
LOCATION:Memorial Hall
SUMMARY:Just throw it away? Class imbalance lessons from molecular machine 
 learning to meatballs - Jackie Valeri
URL:https://pretalx.com/scipy-2026/talk/VLD7LX/
END:VEVENT
END:VCALENDAR
