Samuel Omlin
Computational Scientist, responsible for Julia computing at the Swiss National Supercomputing Centre (CSCS), ETH Zurich
Session
We present a successful approach to building a Reactant backend for ParallelStencil, a Julia package for high-performance stencil computations. The approach generates kernel code and data structures that are pre-optimized as input for Reactant, so that Reactant can produce efficient and correct GPU, TPU, and CPU code. We report the performance of representative stencil mini-apps on latest-generation hardware, including NVIDIA H100 GPUs, AMD MI300A GPUs, and Google TPUs. We evaluate this performance in absolute terms and compare it with that of established backends that directly generate hardware-specific low-level code.