Bradley Dice
Bradley Dice is a Senior Software Engineer in GPU-Accelerated Data Analytics at NVIDIA, where he designs high-performance open-source data analytics libraries (cuDF) with modern CUDA, C++, and Python.
Session
07-15
14:35
30min
Profiling Python GPU Code
Bryce Adelstein Lelbach, Bradley Dice
Your GPU is fast, so why does your Python code still feel slow? This talk shows a practical, Python-first profiling workflow with Nsight Systems, Nsight Compute, and NVTX for CuPy, Numba, PyTorch, JAX, and CUDA extensions. We will use timelines to find launch overhead, hidden synchronizations, and host-device copies, then drill into kernel bottlenecks like memory throughput and occupancy. You will leave with a repeatable loop for turning profiles into measurable speedups.
General
Johnson Great Room