JuliaCon 2023

Accelerating the Migration of Large-Scale Simulation to Julia
07-28, 14:30–15:00 (US/Eastern), 32-124

Julia Accelerator Interfaces (JAI, github.com/grnydawn/AccelInterfaces.jl) aims to solve the problems of migrating Fortran code to GPU-enabled Julia by using shared libraries. JAI consists of three parts: 1) a Julia GPU programming interface based on Julia macros whose syntax is similar to OpenACC; 2) automated generation of shared libraries that implement kernels and vendor API interfaces using vendor-provided compilers; and 3) automated calls to the functions implemented in those shared libraries through Julia's ccall interface.

The emergence of various microarchitectures gives us hope that application programmers can keep moving to higher-performance hardware even in the era after Dennard scaling. Vendors still advertise new processors, such as GPUs, as several times faster than the previous generation. However, the divergence of microarchitectures also creates problems for application programmers. To benefit from the performance of new hardware, they have to “migrate” their code so that it runs on that hardware, which creates two major costs: the cost of code migration and the cost of maintaining multiple versions.

For GPU programming, Julia offers several packages such as CUDA.jl, AMDGPU.jl, and oneAPI.jl. Using these packages is currently the de facto standard for GPU programming in Julia. However, I argue that this approach has drawbacks from the porting point of view: 1) it does not reduce the cost of code migration or the cost of maintaining multiple versions; 2) it does not support the coexistence of multiple GPU programming paradigms such as CUDA and OpenACC; and 3) updates to the Julia GPU packages necessarily lag behind the vendors' own updates.

First, the user has to port the entire code base to Julia. Even after the port is complete, the user may need to keep maintaining the old application, possibly written in Fortran or C/C++. There can be many reasons to maintain both versions, such as the ecosystem a community has built around the old application. Second, supporting the coexistence of multiple GPU programming frameworks is especially important now, when there is no single winner in GPU programming. Instead of betting on a single GPU programming framework, you may want to transition smoothly from one framework to another when a newer one becomes desirable. Lastly, the maintainers of Julia GPU programming packages must inevitably wait until a vendor publicly releases its latest updates before they can support them.

Julia Accelerator Interfaces (JAI, https://github.com/grnydawn/AccelInterfaces.jl) aims to solve the three issues identified above by using shared libraries that contain GPU kernels and GPU vendor interfaces. JAI consists of three main components. 1) A Julia GPU programming interface based on Julia macros. With these macros, Julia programmers can create and run GPU kernels in a way similar to directive-based GPU programming such as OpenACC or OpenMP target. 2) Automated generation of shared libraries that implement the kernels and the vendor API interfaces. To build these shared libraries, JAI relies on external vendor-provided compilers instead of Julia's internal GPU compilation infrastructure. In this way, JAI can use the newest features of vendor-provided compilers while still gaining the benefits of just-in-time (JIT) compilation. 3) Automated calls to the functions implemented in the shared libraries through Julia's ccall interface. Because this boilerplate work is hidden behind the JAI user interface, users can write JAI GPU code at a high level of abstraction, and JAI can absorb API changes on the vendor side.
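The idea in step 2 of getting JIT-like behavior out of external vendor compilers can be sketched as follows. This is a minimal, language-neutral illustration in Python, not JAI's implementation (which is written in Julia): the compiler command lines in `COMPILE_COMMANDS` and the helpers `cache_key`, `library_path`, `build_command`, and `needs_build` are hypothetical. The sketch keys each generated shared library on a hash of the kernel source and the compiler command, so an unchanged kernel is compiled once and then reused.

```python
import hashlib
import os

# Hypothetical per-framework compiler invocations (assumptions, not JAI's
# actual settings). Each one produces a position-independent shared library.
COMPILE_COMMANDS = {
    "cuda": ["nvcc", "-shared", "-Xcompiler", "-fPIC"],
    "hip": ["hipcc", "-shared", "-fPIC"],
    "fortran": ["gfortran", "-shared", "-fPIC"],
}


def cache_key(framework: str, source: str) -> str:
    """Hash the compiler command together with the generated kernel source."""
    h = hashlib.sha256()
    h.update(" ".join(COMPILE_COMMANDS[framework]).encode("utf-8"))
    h.update(b"\0")  # separator so command and source cannot run together
    h.update(source.encode("utf-8"))
    return h.hexdigest()[:16]


def library_path(cache_dir: str, framework: str, source: str) -> str:
    """Path of the shared library; identical inputs map to the same file."""
    return os.path.join(cache_dir, f"{framework}_{cache_key(framework, source)}.so")


def build_command(framework: str, src_file: str, lib_file: str) -> list:
    """Full compiler command line that turns src_file into lib_file."""
    return COMPILE_COMMANDS[framework] + ["-o", lib_file, src_file]


def needs_build(lib_file: str) -> bool:
    """Compile only if no cached library exists for this source/command pair."""
    return not os.path.exists(lib_file)
```

A caller would write the generated kernel source to a file, run `subprocess.run(build_command(...), check=True)` only when `needs_build` returns true, and then load the resulting library (in Julia, through ccall). Including the command line in the hash means that changing compiler flags also triggers a rebuild.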

To demonstrate the features of JAI, we ported the OpenACC version of Fortran miniWeather (github.com/mrnorman/miniWeather), a mini-app that simulates weather-like flows for training in parallelizing code for accelerated HPC architectures, to jlweather (https://github.com/grnydawn/jlweather), which uses an OpenACC-enabled compiler. For performance comparison, we also ported miniWeather to a pure Julia version and to a manually GPU-ported version. All versions were deployed and run at two US-based supercomputing centers (slides: https://docs.google.com/presentation/d/17pDiiMnTuy8oscQ9-NmxEZ7SltWU4dhUrD2-51ot6ew/edit?usp=sharing). The experiments show that the jlweather OpenACC version is about 25% faster than the Fortran OpenACC version with a medium workload, but slower with a small workload.

In JAI, kernels are not ported to Julia. Instead, the user provides JAI with the kernel bodies in their original language, such as Fortran. JAI then generates kernel source files and compiles them into shared libraries. The kernel bodies are kept in an external text file in a simple INI format, called a KNL file. A KNL file can hold multiple versions of a kernel body, and JAI automatically selects the best one. For example, if a KNL file contains CUDA, HIP, and Fortran versions of a kernel body, JAI can first try the HIP or CUDA version, checking whether the system supports that framework. If neither is supported on the system, JAI may fall back to the Fortran version. Because multiple GPU programming frameworks can coexist in this way, users can migrate code gradually, one kernel at a time, rather than the entire application at once. As of this writing, JAI supports kernel bodies written in CUDA, HIP, Fortran-OpenACC, Fortran-OpenMP-Target, Fortran, and C/C++.
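The select-with-fallback behavior described above can be sketched as below. The sketch is in Python and assumes a hypothetical KNL layout with one INI section per framework and the kernel body stored under a `body` key; the real JAI KNL syntax may differ. Framework support is approximated by checking whether the framework's compiler is on `PATH` (also an assumption, as are the names `COMPILERS`, `detect_available`, and `select_kernel`), and the preference order follows the example in the text: HIP or CUDA first, Fortran as the backup.

```python
import configparser
import shutil

# Hypothetical KNL file: one section per framework, kernel body under "body".
KNL_TEXT = """
[cuda]
body =
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) z[i] = x[i] + y[i];

[hip]
body =
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) z[i] = x[i] + y[i];

[fortran]
body =
    z(:) = x(:) + y(:)
"""

# Assumed mapping from framework to the compiler whose presence signals support.
COMPILERS = {"hip": "hipcc", "cuda": "nvcc", "fortran": "gfortran"}


def detect_available():
    """Return the set of frameworks whose compiler is found on PATH."""
    return {fw for fw, exe in COMPILERS.items() if shutil.which(exe)}


def select_kernel(knl_text, preference, available):
    """Return (framework, body) for the first preferred framework that the
    KNL text provides and the system supports."""
    # interpolation=None keeps '%' (e.g. Fortran derived-type access) verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(knl_text)
    for framework in preference:
        if framework in available and parser.has_section(framework):
            # Multi-line values start with a newline; strip it off.
            return framework, parser.get(framework, "body").strip()
    raise LookupError("no kernel body matches a supported framework")
```

Calling `select_kernel(KNL_TEXT, ["hip", "cuda", "fortran"], detect_available())` returns the HIP body on a system with `hipcc`, the CUDA body where only `nvcc` and `gfortran` are found, and the Fortran body where only `gfortran` is present. One caveat of reusing Python's `ConfigParser` here is that indented lines beginning with `#` or `;` are treated as comments, so C preprocessor lines such as `#include` would be dropped; a real KNL reader has to handle that differently.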

Youngsung Kim is a software performance engineer at Oak Ridge National Laboratory.

  • Julia Accelerator Interfaces (JAI): embrace conventional languages (Fortran, C, C++) as well as GPU programming (CUDA/HIP, OpenACC, OpenMP) within Julia.