3.6x speedup on A64FX by squeezing ShallowWaters.jl into Float16
07-30, 13:00–13:10 (UTC), Purple

ShallowWaters.jl, a fluid circulation model written with a focus on 16-bit arithmetic, runs 3.6x faster on A64FX in Float16 than in Float64 without significant model degradation. Guided by Sherlogs.jl, calculations were systematically rescaled to fit into the very limited range of Float16. ShallowWaters.jl shows that 16-bit calculations on A64FX are indeed a competitive way to accelerate Earth-system simulations on available hardware.
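
The idea of rescaling into the Float16 range can be illustrated with a minimal sketch. It uses only Base Julia functions. The value x and the scale s are hypothetical and chosen for illustration; they are not taken from ShallowWaters.jl.

```julia
# Bounds of the normal Float16 range
lo = floatmin(Float16)   # smallest positive normal number, about 6.1e-5
hi = floatmax(Float16)   # largest finite number, 65504

# Hypothetical small quantity that falls below the normal range
x = 2.0e-5

# Hypothetical power-of-two scale; multiplying by a power of two
# shifts the exponent without changing the significant bits
s = 1024

x16  = Float16(x)        # stored as a subnormal
xs16 = Float16(s * x)    # stored as a normal number after rescaling

println(issubnormal(x16))    # true: x is below floatmin(Float16)
println(issubnormal(xs16))   # false: the scaled value is in the normal range
```

Any scaled quantity has to be unscaled consistently wherever it enters the equations, which is why the choice of scales benefits from a runtime analysis of the numbers that actually occur.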


Most Earth-system simulations run on conventional CPUs in 64-bit double-precision floating-point numbers (Float64), although the need for high-precision calculations in the presence of large uncertainties has been questioned. The world’s fastest supercomputer, Fugaku, is based on A64FX microprocessors, which also support the 16-bit low-precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, a fluid circulation model written with a focus on 16-bit arithmetic. It implements techniques that address the precision and dynamic range issues of 16-bit formats. The precision-critical time integration is augmented with Kahan’s compensated summation to reduce rounding errors. Such a compensated time integration is as precise as, but faster than, mixing 16-bit and 32-bit precision. Because subnormals are inefficiently supported on A64FX, the usable dynamic range of Float16 is limited to about 6e-5 to 65504. The bit-pattern histogram analysis at runtime with Sherlogs.jl, as well as its ability to record stack traces conditioned on the occurrence of subnormals, was invaluable for keeping the arithmetic within this range. As a result, we benchmark speed-ups on A64FX of 3.8x with Float16 alone and 3.6x with Float16 plus the compensated time integration that minimizes model degradation. Although ShallowWaters.jl is simplified compared to large Earth-system models, it shares essential algorithms with them and therefore shows that 16-bit calculations on A64FX are indeed a competitive way to accelerate Earth-system simulations on available hardware.
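
The effect of Kahan's compensated summation can be sketched as follows. This is a generic illustration in which a constant increment is accumulated over many steps, not the time integration scheme of ShallowWaters.jl. The function names integrate_naive and integrate_kahan and all values are hypothetical.

```julia
# Naive accumulation: every addition is rounded to the precision of T
function integrate_naive(u0::T, du::T, nsteps::Int) where {T<:AbstractFloat}
    u = u0
    for _ in 1:nsteps
        u += du
    end
    return u
end

# Kahan-compensated accumulation: c carries the rounding error of the
# previous addition and feeds it back into the next increment
function integrate_kahan(u0::T, du::T, nsteps::Int) where {T<:AbstractFloat}
    u = u0
    c = zero(T)
    for _ in 1:nsteps
        y = du - c        # increment corrected by the previous rounding error
        t = u + y         # low-order bits of y are lost in this addition
        c = (t - u) - y   # rounding error of the addition (zero in exact arithmetic)
        u = t
    end
    return u
end

# Hypothetical setup: many small increments added to a larger value
u0, du, n = Float16(1), Float16(1e-3), 1000

# Float64 reference using the increment exactly as stored in Float16
exact = 1.0 + n * Float64(du)

naive = integrate_naive(u0, du, n)
kahan = integrate_kahan(u0, du, n)

println("naive error: ", abs(Float64(naive) - exact))
println("kahan error: ", abs(Float64(kahan) - exact))
```

In the naive loop, the part of each increment that is smaller than the spacing between neighbouring Float16 numbers is discarded at every step. The compensated loop stores that part in c and reapplies it, at the cost of a few extra 16-bit operations per step.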

This work used the Isambard UK National Tier-2 HPC Service operated by GW4 and the UK Met Office, and funded by EPSRC.

Co-authors
- Sam Hatfield, European Centre for Medium-Range Weather Forecasts, Reading, UK
- Matteo Croci, Mathematical Institute, University of Oxford, UK
- Peter Düben, European Centre for Medium-Range Weather Forecasts, Reading, UK
- Tim Palmer, University of Oxford, UK

PhD student in Climate Computing
Atmospheric, Oceanic and Planetary Physics
University of Oxford

milan.kloewer@physics.ox.ac.uk
www.milank.de
twitter @milankloewer
github @milankl