JuliaCon 2026

The GPU acceleration of SpeedyWeather.jl, the friendly and flexible climate model
2026-08-13, Room 3

Fortran climate models are being adapted to GPUs by automatically translating them loop by loop into kernels. In Julia, we have more flexibility to develop the climate model SpeedyWeather.jl for the GPU. Many parts are easy to accelerate: we leverage multiple dispatch on the GPU and a high level of kernel fusion for modularity and performance, with optional hardware-specific optimizations. The spherical harmonic transforms remain a complex bottleneck, which we address with a multi-algorithm approach: Fourier and Legendre transforms, or custom linear algebra kernels using Reactant.


Climate modelling continues to rely widely on CPUs because large code bases are not adapted to run on GPUs. Yet climate models require high-performance computing to reach societally relevant resolutions and increased accuracy in weather and climate prediction. Here, we present SpeedyWeather.jl, an atmospheric model with dynamic representations of ocean, land and sea ice that allows for global climate simulations. We rely on the Julia stack for GPU-accelerated computing to support Nvidia, AMD and Apple GPUs and report our experience: what works, what does not, what is easy, and what is difficult.

Our spherical harmonics transform library SpeedyTransforms.jl implements a multi-algorithm approach. One option leverages Fourier and Legendre transforms of varying lengths, which remain difficult to scale at both low and high resolution. Alternatively, we implemented custom LinearAlgebra kernels for complex-real matrix-matrix multiplies, which are easier to optimize using Reactant.jl. Many custom kernels are written for other parts of SpeedyWeather.

The so-called parameterizations, the representations of unresolved physical processes such as radiation, precipitation or surface fluxes, required further attention. They contain many different components (one for each physical process) that a SpeedyWeather user would want to compose in many ways, flexibly switching or modifying them. We leverage multiple dispatch on the GPU and a high level of kernel fusion to achieve both flexibility and performance. We also employ hardware-specific optimizations with little additional code, for example changing loop orders between CPU and GPU. New developers can easily write extensions, as much of the GPU specifics is hidden from them by default. Most parts of SpeedyWeather are easily accelerated by 100x or more on a single GPU compared to a single CPU, but some bottlenecks remain in the algorithmically complex transforms.
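
As a rough illustration of how multiple dispatch and kernel fusion can combine for composable parameterizations, here is a minimal plain-Julia sketch. It runs on the CPU, and all names (`AbstractParameterization`, `Cooling`, `SurfaceHeating`, `tendency`, `total_tendency`, `apply_parameterizations!`) and numbers are hypothetical, chosen for illustration rather than taken from SpeedyWeather's API. The assumption is that each physical process is a small type with its own `tendency` method, and that all selected processes are evaluated in a single pass over the grid.

```julia
# Hypothetical names throughout: an illustration, not SpeedyWeather's actual API.
abstract type AbstractParameterization end

struct Cooling <: AbstractParameterization
    rate::Float64    # made-up relaxation rate per time step
end

struct SurfaceHeating <: AbstractParameterization
    flux::Float64    # made-up constant tendency added at every grid point
end

# One method per physical process; multiple dispatch selects the right one.
tendency(p::Cooling, T) = -p.rate * T
tendency(p::SurfaceHeating, T) = p.flux

# Sum the tendencies of all components by recursing over a tuple.
# A tuple carries its element types, so the compiler can unroll and inline this.
total_tendency(::Tuple{}, T) = zero(T)
total_tendency(ps::Tuple, T) =
    tendency(first(ps), T) + total_tendency(Base.tail(ps), T)

# A single pass over the grid evaluates every component: the "fused" loop.
# On a GPU the loop body would become the kernel; here it is a plain CPU loop.
function apply_parameterizations!(dT, T, ps::Tuple)
    for i in eachindex(dT, T)
        dT[i] = total_tendency(ps, T[i])
    end
    return dT
end

T = [280.0, 290.0, 300.0]    # made-up temperatures
dT = similar(T)
apply_parameterizations!(dT, T, (Cooling(0.01), SurfaceHeating(1.0)))
```

Under these assumptions, swapping or adding a process only changes the tuple passed in. The loop itself stays untouched, which is one way to keep extension code free of GPU-specific details.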

Authors:

Milan Klöwer (1), Maximilian Gelbrecht (2, 3), Niklas Viebig (1, 4)

  1. University of Oxford, UK
  2. Potsdam Institute for Climate Impact Research, Germany
  3. Technical University of Munich, Germany
  4. ETH Zürich, Switzerland

Milan Klöwer is a NERC Independent Research Fellow at the University of Oxford. He did his postdoc at the Massachusetts Institute of Technology (MIT), working on climate model development in Julia. He started SpeedyWeather.jl, a global atmospheric model designed as a research playground to prototype ideas on machine-learned representations of climate processes and computationally efficient climate models. He also works on low-precision computing, data compression and information theory, predictability of weather and climate, and software engineering.


I am a Master’s student in Physics at ETH Zürich, currently completing my Master’s thesis in the climate modeling group at AOPP, University of Oxford. I’m researching differentiable programming and systematic parameter calibration for Earth system models, with interests in exoplanet climates, high-performance computing, and scientific software engineering.
