BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//euroscipy-2026//speaker//VKG8RE
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-euroscipy-2026-VPYLDF@pretalx.com
DTSTART;TZID=CET:20260720T121000
DTEND;TZID=CET:20260720T123000
DESCRIPTION:Your GPU is fast\, so why does your Python code still feel slow
 ? This talk shows a practical\, Python-first profiling workflow with Nsigh
 t Systems\, Nsight Compute\, and NVTX for CuPy\, Numba\, PyTorch\, JAX\, a
 nd CUDA extensions. We will use timelines to find launch overhead\, hidden
  synchronizations\, and host-device copies\, then drill into kernel bottle
 necks like memory throughput and occupancy. You will leave with a repeatab
 le loop for turning profiles into measurable speedups.
DTSTAMP:20260603T191438Z
LOCATION:Room 2.41 (First Floor\, Turing)
SUMMARY:Profiling Python GPU Code - Bryce Adelstein Lelbach
URL:https://pretalx.com/euroscipy-2026/talk/VPYLDF/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-euroscipy-2026-G3SRPL@pretalx.com
DTSTART;TZID=CET:20260720T152000
DTEND;TZID=CET:20260720T155000
DESCRIPTION:Parallel programming can be intimidating\, but doesn’t need t
 o be! Tile-based programming models make GPU parallelism more newcomer-fri
 endly\, highly productive\, and still fast by letting you write sequential
 \, array-centric code while the framework handles parallelization\, synchr
 onization\, and data movement.\n\nIn this example-driven talk\, we’ll in
 troduce tile-based programming in Python using NVIDIA’s new stack: [cuTi
 le](https://github.com/NVIDIA/cutile-python) and its compiler foundation\,
  [Tile IR](https://github.com/NVIDIA/cuda-tile). You’ll see recently ann
 ounced CUDA Tile capabilities in action\, including multi-GPU communicatio
 n\, interoperability with traditional CUDA SIMT\, and support for more div
 erse kernels such as convolutions and stencils. We’ll compare tile and S
 IMT approaches\, build intuition for performance and execution\, and demon
 strate practical debugging and reasoning techniques. Along the way\, you
 ’ll see real workloads: HPC stencils\, an SpMV plus CG solver\, and ML m
 odels from [TileGym](https://github.com/NVIDIA/TileGym). You’ll leave wi
 th a clear sense of when tile programming helps\, and how it enables more 
 portable high-performance Python as hardware trends evolve.
DTSTAMP:20260603T191438Z
LOCATION:Room 1.38 (Ground Floor\, Turing)
SUMMARY:Python Tile Programming for GPUs - Bryce Adelstein Lelbach
URL:https://pretalx.com/euroscipy-2026/talk/G3SRPL/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-euroscipy-2026-GPHRSK@pretalx.com
DTSTART;TZID=CET:20260723T110000
DTEND;TZID=CET:20260723T123000
DESCRIPTION:Want to write your own GPU algorithms\, but not sure how to get
  started or keep them portable? Come to this hands-on session to learn til
 e programming with CUDA Tile and cuTile Python: you will build an accurate
  mental model of tiles and thread groups\, write and debug real GPU kernel
 s in a browser-based JupyterLab (no installation)\, profile and tune perfo
 rmance with NVIDIA Nsight\, and see how the same tile code applies across 
 DL and HPC examples like LLM inference and conjugate gradient\, including 
 when to use tiles vs SIMT and how to mix both.
DTSTAMP:20260603T191438Z
LOCATION:Room 1.38 (Ground Floor\, Turing)
SUMMARY:GPU Algorithm Authoring with CUDA Tile - Bryce Adelstein Lelbach\, 
 Katrina Riehl
URL:https://pretalx.com/euroscipy-2026/talk/GPHRSK/
END:VEVENT
END:VCALENDAR
