BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pydata-london-2026//talk//CHQLFU
BEGIN:VTIMEZONE
TZID:GMT
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:GMT
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:BST
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pydata-london-2026-CHQLFU@pretalx.com
DTSTART;TZID=GMT:20260607T161500
DTEND;TZID=GMT:20260607T170000
DESCRIPTION:Learn practical techniques for using LLMs to solve the data sca
 rcity problem that plagues real-world ML projects. This talk demonstrates 
 three production-ready approaches: synthetic generation\, LoRA fine-tuning
 \, and LLM-powered annotation to augment training datasets when you have a
 bundant data for common cases but almost nothing for edge cases or emergin
 g categories. Using a food review classification scenario\, you'll see how
  to generate high-quality training data\, when each technique works best\,
  and\, critically\, how to validate synthetic data to avoid amplifying
  errors. Perfect for practitioners facing the "we have 10k examples of X
  but zer
 o for Y" problem.\n\nTarget Audience: Data scientists and ML engineers wor
 king on classification\, NLP\, or content moderation tasks who struggle wi
 th imbalanced or incomplete training datasets.\n\nTakeaway: A decision fra
 mework for choosing between synthetic generation\, fine-tuning\, and LLM a
 nnotation\, plus validation strategies to ensure data quality before retra
 ining models.
DTSTAMP:20260602T223155Z
LOCATION:Doddington Forum
SUMMARY:When Your Dataset Has Blind Spots: Practical LLM-Based Data Augment
 ation - Ophelie Bleu
URL:https://pretalx.com/pydata-london-2026/talk/CHQLFU/
END:VEVENT
END:VCALENDAR
