JuliaCon 2026

Implementing Lattice QCD on Multi-GPU Systems with JuliaQCD
2026-08-12, Room 5

Lattice QCD simulations are constrained by both massive computational cost and high memory requirements. To simulate large physical volumes, distributing the lattice across multiple GPUs is essential. We present JuliaQCD, a native Julia ecosystem for lattice computations on HPC systems. By combining JACC.jl for vendor-neutral GPU kernels with MPI.jl for lattice decomposition, we enable simulations on lattices that exceed the memory capacity of a single device while maintaining high performance.


Background:

Lattice Quantum Chromodynamics (LQCD), a foundational framework for first-principles, non-perturbative study of the strong force, is a grand-challenge computational problem in high-energy physics. Because the spacetime grid is four-dimensional, memory requirements grow as the fourth power of the lattice extent. A single high-end GPU often lacks the VRAM to store the gauge and fermion fields needed for physically relevant volumes. Multi-GPU parallelism in JuliaQCD is therefore not merely a speed optimization but a fundamental requirement for large-volume physics simulations.
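
To make the scaling concrete, here is a back-of-the-envelope estimate for the gauge field alone (a sketch, not JuliaQCD's actual memory layout; the precision and per-site field counts are assumptions):

```julia
# Rough memory estimate for an SU(3) gauge field on an L^4 lattice.
# Assumptions (not taken from JuliaQCD itself): double precision,
# 4 link matrices per site, each a 3x3 complex matrix.
gauge_field_bytes(L) = L^4 * 4 * 3 * 3 * 2 * 8  # sites * links * 3x3 * (re,im) * Float64

for L in (32, 64, 96)
    gib = gauge_field_bytes(L) / 2^30
    println("L = $L: ", round(gib, digits = 1), " GiB for the gauge links alone")
end
```

Already at L = 96 the links alone approach 46 GiB, before counting fermion fields, solver workspaces, or conjugate momenta, which is why pooling the memory of many GPUs becomes unavoidable.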

Technical Content:

  • Overcoming the Memory Wall: We discuss how JuliaQCD decomposes the 4D lattice across multiple GPU nodes, allowing larger volumes to be simulated by pooling the memory of an entire cluster.

  • Performance Portability with JACC.jl: We highlight the use of JACC.jl to write kernels that are portable across NVIDIA, AMD, and other hardware. This ensures that JuliaQCD remains flexible as HPC centers transition to diverse GPU architectures.

  • Communication Overheads: We detail our implementation of halo (ghost zone) exchanges using MPI.jl. We explain how we minimize the latency penalties associated with moving data between GPUs to maintain efficiency as the number of nodes increases.
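
The decomposition and halo arithmetic behind the first and third points can be illustrated as follows (a simplified sketch in plain Julia; JuliaQCD's actual partitioning logic may differ, and in practice the halo faces are exchanged with MPI.jl):

```julia
# Sketch: split a global 4D lattice over a 4D process grid and
# compute each rank's local extent plus one-site-deep halos.
# (Illustrative only; the dimensions below are an example configuration.)

global_dims = (32, 32, 32, 64)   # (x, y, z, t) global lattice
proc_grid   = (2, 2, 2, 4)       # ranks per direction, 32 ranks total

# Local extent per rank in each direction (require even division).
local_dims = map(÷, global_dims, proc_grid)
@assert all(global_dims .== local_dims .* proc_grid)

# Halo (ghost-zone) sites in direction mu: two faces of depth 1,
# each face being the product of the other three local extents.
halo_sites(dims, mu) = 2 * prod(dims) ÷ dims[mu]

total_halo = sum(mu -> halo_sites(local_dims, mu), 1:4)
println("local volume:    ", prod(local_dims), " sites")
println("halo sites/rank: ", total_halo)
```

The ratio of halo sites to local volume sets the communication-to-computation balance: as the process grid grows at fixed global volume, the local extents shrink and this surface-to-volume ratio rises, which is exactly why minimizing per-exchange latency matters for strong scaling.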

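To give a flavor of the portability point, a JACC.jl kernel follows the pattern below (a minimal sketch modeled on the JACC.jl README-style API; the kernel `site_update!` and the field names are hypothetical, not JuliaQCD code, and the array-wrapping call may differ across JACC.jl versions, so check the package docs):

```julia
import JACC  # external package; backend (CUDA, AMDGPU, threads, ...) chosen via preferences

# A site-local update written once and run on any supported backend.
# Hypothetical example; real lattice kernels are more involved.
function site_update!(i, phi, staple, beta)
    @inbounds phi[i] += beta * staple[i]
end

N = 16^4                              # sites of a 16^4 sublattice, flattened
phi    = JACC.array(zeros(Float64, N))
staple = JACC.array(ones(Float64, N))

# The same call dispatches to the selected backend.
JACC.parallel_for(N, site_update!, phi, staple, 0.1)
```

Because the kernel body is ordinary Julia, the same source runs on NVIDIA, AMD, or CPU backends without vendor-specific code paths.
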
The main goal of this talk is to demonstrate that the JuliaQCD suite provides a scalable, memory-efficient framework for modern lattice simulations. We aim to show that Julia is fully capable of handling large-scale, memory-distributed theoretical physics simulations with multiple GPUs across a variety of HPC architectures.

I am a Postdoctoral Researcher at the Center for Computational Sciences (CCS), University of Tsukuba, Japan. My research focuses on Lattice Field Theory, aiming to understand the strong force, one of the fundamental interactions in nature. To achieve these goals, I develop numerical tools that leverage HPC across diverse architectures, including both CPUs and GPUs. Recently, I have also become interested in applying Machine Learning to accelerate lattice calculations.