BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pydata-london-2026//talk//HC3SLQ
BEGIN:VTIMEZONE
TZID:GMT
BEGIN:STANDARD
DTSTART:20001029T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:GMT
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T010000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:BST
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pydata-london-2026-HC3SLQ@pretalx.com
DTSTART;TZID=GMT:20260605T160000
DTEND;TZID=GMT:20260605T173000
DESCRIPTION:Production AI systems improve through a data flywheel: teams cr
 eate training examples from curated source material\, those examples shape
  model behavior\, production usage reveals what the model still needs\, an
 d those usage signals become the next round of improvement. This hands-on 
 tutorial focuses on the data pipelines behind that flywheel: how to genera
 te\, validate\, and anonymize training data without relying on one-off pro
 mpt scripts.\n\nParticipants will build a reproducible training-data pipel
 ine using NVIDIA NeMo Data Designer and NeMo Anonymizer. We'll start by wo
 rking through text-based examples that introduce the basics of Data Design
 er: defining the shape of a dataset\, connecting generation to source reco
 rds\, creating structured outputs\, and filtering generated rows with judg
 e-based quality checks. Then we'll extend the same pattern to multimodal d
 ocument understanding with image-based invoice data and VLM-generated visu
 al QA examples.\n\nFinally\, we'll shift from workshop-generated data to p
 roduction usage data from a fine-tuned model. Using Anonymizer\, participa
 nts will detect and transform sensitive fields so usage logs can safely be
 come source material for the next training iteration.\n\nBy the end\, part
 icipants will understand a practical pattern for multimodal training data 
 with privacy-safe feedback: **source data -> generate -> validate -> anony
 mize feedback -> improve**.
DTSTAMP:20260602T223156Z
LOCATION:Doddington Forum
SUMMARY:From Synthetic Examples to Production Signals: Multimodal Training 
 Data Pipelines with Privacy-Safe Feedback - Nabin Mulepati\, Lipika Ramasw
 amy
URL:https://pretalx.com/pydata-london-2026/talk/HC3SLQ/
END:VEVENT
END:VCALENDAR
