BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pyconde-pydata-2026//talk//M33SNJ
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pyconde-pydata-2026-M33SNJ@pretalx.com
DTSTART;TZID=CET:20260415T150000
DTEND;TZID=CET:20260415T154500
DESCRIPTION:Synthetic data is often presented as an easy fix for missing or
  sensitive datasets\, but in practice\, it can silently introduce bias\, l
 eakage\, and misleading evaluation results. This talk presents a practical
 \, end-to-end pipeline for creating synthetic datasets that are reproducib
 le\, task-aligned\, and bias-aware. We will walk through design decisions 
 that matter: template-based generation vs. free-form generation\, entity b
 alancing\, controlling distributional skew\, filtering failure cases\, and
  validating dataset quality before training any model. The session emphasi
 zes what actually works in real pipelines\, common failure modes that look
  fine at first glance\, and concrete best practices for Python developers 
 to apply when building synthetic datasets for machine learning\, NLP\, or 
 evaluation.
DTSTAMP:20260523T180013Z
LOCATION:Helium [3rd Floor]
SUMMARY:Building Non-Biased Synthetic Datasets: What Actually Works (and Wh
 at Fails) - Shiva Banasaz Nouri
URL:https://pretalx.com/pyconde-pydata-2026/talk/M33SNJ/
END:VEVENT
END:VCALENDAR
