BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pyconde-pydata-2026//speaker//UX7XPD
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pyconde-pydata-2026-RNT9FV@pretalx.com
DTSTART;TZID=CET:20260415T142000
DTEND;TZID=CET:20260415T145000
DESCRIPTION:CPU–GPU synchronizations are a subtle performance killer in P
 yTorch: they block the host\, prevent the CPU from running ahead\, and cre
 ate GPU idle gaps. This talk explains what host-device synchronization is\
 , how it’s triggered by innocuous-looking code patterns (dynamic sha
 pes)\, and how to diagnose it with NVIDIA Nsight Systems by correlating u
 tilization gaps with long CUDA API calls. We’ll end with practical mit
 igation patterns\, including unit testing for syncs vi
 a `torch.cuda.set_sync_debug_mode()` and a look at when a small Trito
 n kernel can help avoid syncs and fuse ops.
DTSTAMP:20260412T141726Z
LOCATION:Palladium [2nd Floor]
SUMMARY:PyTorch and CPU-GPU Synchronizations - Tomas Ruiz
URL:https://pretalx.com/pyconde-pydata-2026/talk/RNT9FV/
END:VEVENT
END:VCALENDAR