EuroSciPy 2026

Bryce Adelstein Lelbach

Bryce Adelstein Lelbach has spent over a decade developing programming languages, compilers, and libraries. He is passionate about parallel programming and strives to make it more accessible for everyone.

Bryce is a Principal Architect at NVIDIA, where he founded the Core C++ Compute Libraries team and now leads the Vanguard Programming group that drives NVIDIA's roadmap for programming languages, compilers, and core libraries.

He is a leader of the systems programming language community, having served as chair of the C++ Library Evolution group and of the US programming language standards committee. He has been an organizer and program chair for many conferences over the years. On the C++ committee, he has worked on concurrency primitives, parallel algorithms, senders, and multidimensional arrays.

He previously worked at Lawrence Berkeley National Laboratory and Louisiana State University. He is one of the founding developers of the HPX parallel runtime system.

Outside of work, Bryce is passionate about airplanes and watches. He lives in Midtown Manhattan with his girlfriend and dog.

Pronouns:

he/him

Affiliation:

NVIDIA

Position / Job:

Principal Engineer

X handle:

blelbach


Sessions

07-20
12:10
20min
Profiling Python GPU Code
Bryce Adelstein Lelbach

Your GPU is fast, so why does your Python code still feel slow? This talk shows a practical, Python-first profiling workflow with Nsight Systems, Nsight Compute, and NVTX for CuPy, Numba, PyTorch, JAX, and CUDA extensions. We will use timelines to find launch overhead, hidden synchronizations, and host-device copies, then drill into kernel bottlenecks like memory throughput and occupancy. You will leave with a repeatable loop for turning profiles into measurable speedups.

Computational Tools and Scientific Python Infrastructure
Room 2.41 (First Floor, Turing)
07-20
15:20
30min
Python Tile Programming for GPUs
Bryce Adelstein Lelbach

Parallel programming can be intimidating, but it doesn’t need to be! Tile-based programming models make GPU parallelism newcomer-friendly and highly productive while staying fast: you write sequential, array-centric code, and the framework handles parallelization, synchronization, and data movement.

In this example-driven talk, we’ll introduce tile-based programming in Python using NVIDIA’s new stack: cuTile and its compiler foundation, Tile IR. You’ll see recently announced CUDA Tile capabilities in action, including multi-GPU communication, interoperability with traditional CUDA SIMT, and support for more diverse kernels such as convolutions and stencils. We’ll compare tile and SIMT approaches, build intuition for performance and execution, and demonstrate practical debugging and reasoning techniques. Along the way, you’ll see real workloads: HPC stencils, an SpMV plus CG solver, and ML models from TileGym. You’ll leave with a clear sense of when tile programming helps and how it enables more portable high-performance Python as hardware trends evolve.

Computational Tools and Scientific Python Infrastructure
Room 1.38 (Ground Floor, Turing)
07-23
11:00
90min
GPU Algorithm Authoring with CUDA Tile
Bryce Adelstein Lelbach, Katrina Riehl

Want to write your own GPU algorithms, but not sure how to get started or keep them portable? Come to this hands-on session to learn tile programming with CUDA Tile and cuTile Python. You will build an accurate mental model of tiles and thread groups, write and debug real GPU kernels in a browser-based JupyterLab (no installation required), and profile and tune performance with NVIDIA Nsight. You will also see how the same tile code applies across DL and HPC examples like LLM inference and conjugate gradient, including when to use tiles vs SIMT and how to mix both.

Computational Tools and Scientific Python Infrastructure
Room 1.38 (Ground Floor, Turing)