PyData London 2026

GPU Algorithm Authoring with CUDA Tile
2026-06-05, Grand Hall 1

Want to write your own GPU algorithms, but not sure how to get started or how to keep them portable? This hands-on session teaches tile programming with CUDA Tile and cuTile Python. You will build an accurate mental model of tiles and thread groups, write and debug real GPU kernels in a browser-based JupyterLab (no installation), and profile and tune performance with NVIDIA Nsight. You will also see how the same tile code applies across DL and HPC examples such as LLM inference and conjugate gradient, including when to use tiles versus SIMT and how to mix both.


CUDA Tile is NVIDIA's new programming model for writing GPU kernels in an array-centric style that is portable across NVIDIA GPU architectures. Instead of orchestrating thousands of threads directly, you express computation over small local arrays (tiles) and let the system manage the parallel execution details: synchronization, data movement, and coordination across the GPU.
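To make the "computation over small local arrays" idea concrete, here is a minimal CPU-only sketch in plain NumPy. It is not cuTile code and uses no cuTile API: the function name `tiled_matmul` and the `tile` parameter are hypothetical, chosen for illustration. It only shows the shape of the idea, where each output tile is computed independently from input tiles, which is the kind of work a tile programming model would schedule in parallel on the GPU.

```python
import numpy as np

def tiled_matmul(a, b, tile=4):
    """Compute a @ b one output tile at a time (plain NumPy, not cuTile)."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=np.result_type(a, b))
    # Each (i, j) pair is one independent unit of work: one output tile.
    # On a GPU these units could run in parallel; here they run in a loop.
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            # Accumulator sized to the (possibly partial) edge tile.
            acc = np.zeros((min(tile, m - i), min(tile, n - j)), dtype=c.dtype)
            for p in range(0, k, tile):
                a_tile = a[i:i + tile, p:p + tile]  # "load" a tile of a
                b_tile = b[p:p + tile, j:j + tile]  # "load" a tile of b
                acc += a_tile @ b_tile              # tile-level multiply-accumulate
            c[i:i + tile, j:j + tile] = acc         # "store" the output tile
    return c

rng = np.random.default_rng(0)
a = rng.standard_normal((10, 7))
b = rng.standard_normal((7, 9))
c = tiled_matmul(a, b)
```

Note that the code never mentions individual threads. It only describes loads, array operations, and stores at tile granularity, which is the level of abstraction the paragraph above describes.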

This interactive session introduces the core mental model behind tile programming and how it is realized in cuTile Python on top of the Tile IR compiler stack. You will write tile code, see how it maps onto real GPU execution, and learn how to evaluate and tune performance with NVIDIA's Nsight profilers. We'll explore examples from both DL and HPC, such as large language model inference and conjugate gradient solvers.
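As a reference point for the HPC example, the conjugate gradient method can be written entirely as whole-array operations (matrix-vector products, dot products, and scaled vector updates). The sketch below is plain NumPy on the CPU, not the session's material and not cuTile code. The function name, tolerance, and test matrix are assumptions made for illustration.

```python
import numpy as np

def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a @ x = b for a symmetric positive definite matrix a."""
    x = np.zeros_like(b)
    r = b - a @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break          # residual norm is small enough
        ap = a @ p
        alpha = rs_old / (p @ ap)   # step length along p
        x += alpha * p
        r -= alpha * ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next conjugate direction
        rs_old = rs_new
    return x

# Hypothetical test problem: a small, well-conditioned SPD matrix.
rng = np.random.default_rng(0)
m = rng.standard_normal((20, 20))
a = m @ m.T + 20 * np.eye(20)
b = rng.standard_normal(20)
x = conjugate_gradient(a, b)
```

Every step in the loop is an operation on arrays rather than on individual elements, which is why solvers of this kind are a natural fit for an array-centric programming style.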

This session is hands-on and requires no installation, only a web browser. We'll use Brev, NVIDIA's developer cloud, to get access to GPUs, and all work will be done in a JupyterLab environment.

By the end of this session, you will:
- Build an accurate mental model of tiles, thread groups, and how tile code executes on GPUs.
- Write and debug tile-based GPU kernels in Python for real workloads.
- Use profiling traces to identify bottlenecks and guide optimizations inside a notebook workflow.
- Decide when tile programming is the right tool versus SIMT, and how to mix the two when needed.

Links:
- Accelerated Computing Hub: https://github.com/NVIDIA/accelerated-computing-hub
- cuTile Python: https://github.com/NVIDIA/cutile-python
- Tile IR: https://github.com/NVIDIA/cuda-tile
- TileGym examples: https://github.com/NVIDIA/TileGym

Dr. Katrina Riehl is a Principal Technical Product Manager at NVIDIA leading the CUDA Education program. For over two decades, Katrina has worked extensively in the fields of scientific computing, machine learning, data science, and visualization. Most notably, she has helped lead data initiatives at the University of Texas Austin Applied Research Laboratory, Anaconda, Apple, Expedia Group, Cloudflare, and Snowflake. She is an active volunteer in the Python open-source scientific software community and currently serves on the Advisory Council for NumFOCUS.