Tomas Ruiz
I am a research assistant at the Ludwig Maximilian University of Munich in Prof. Schwemmer’s Computational Social Science Lab. My research sits at the intersection of machine learning and social media, with a focus on multimodal understanding. Previously, I worked as a software engineer at several corporations (Amazon, Allianz, BMW) and startups, on projects ranging from optimization algorithms to backend engineering.
Session
CPU–GPU synchronizations are a subtle performance killer in PyTorch: they block the host, prevent the CPU from running ahead, and create GPU idle gaps. This talk explains what host-device synchronization is, how innocuous-looking code patterns such as dynamic shapes trigger it, and how to diagnose it with NVIDIA Nsight Systems by correlating utilization gaps with long CUDA API calls. We’ll end with practical mitigation patterns, including unit testing for syncs via torch.cuda.set_sync_debug_mode() and when a small Triton kernel can help avoid syncs and fuse ops.
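
To give a flavor of the unit-testing idea, here is a minimal sketch, not material from the talk itself. The function names and the raises_on_sync helper are hypothetical, chosen for illustration. It assumes that boolean-mask indexing is representative of the dynamic-shape pattern, and that the sync debug mode (which the PyTorch docs describe as experimental and not guaranteed to catch every synchronizing operation) flags it on your PyTorch version.

```python
import torch


def masked_sum_dynamic(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Boolean indexing produces a data-dependent output shape, so on CUDA the
    # host must wait for the GPU to learn how many elements were selected.
    return x[mask].sum()


def masked_sum_static(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # torch.where keeps the output shape fixed, so no size is read back.
    return torch.where(mask, x, torch.zeros_like(x)).sum()


def raises_on_sync(fn, *args) -> bool:
    """Hypothetical test helper: True if fn(*args) raises under 'error' mode.

    Only call this when CUDA is available. Any RuntimeError is treated as a
    detected sync, which is a simplification.
    """
    previous = torch.cuda.get_sync_debug_mode()
    torch.cuda.set_sync_debug_mode("error")
    try:
        fn(*args)
        return False
    except RuntimeError:
        return True
    finally:
        # Restore the previous mode so other tests are unaffected.
        torch.cuda.set_sync_debug_mode(previous)


if torch.cuda.is_available():
    x = torch.randn(1024, device="cuda")
    mask = x > 0
    print("dynamic version flagged:", raises_on_sync(masked_sum_dynamic, x, mask))
    print("static version flagged:", raises_on_sync(masked_sum_static, x, mask))
```

In a test suite, a helper like this could be wrapped in an assertion so that a newly introduced sync in a hot path fails the build. Both functions return the same value, so the static variant can stand in wherever the extra work on masked-out elements is acceptable.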