2023-07-26 – Online talks and posters
We demonstrate how to use NCI services to run and profile Julia code at scale.
NCI Overview
The National Computational Infrastructure (NCI) is Australia's leading organization for high-performance data, storage, and computing.
As the home of high-performance computational science for Australian research, our highly integrated scientific computing facility provides world-class services to thousands of Australian researchers and their collaborators every year. Together they consume 1.2 billion CPU hours annually and access our curated datasets from more than 100 countries worldwide.
Julia Support at NCI
Our user community has a very broad range of research interests, spanning astronomy to bioinformatics, materials science to particle physics, and climate simulation to geostorage. To meet this diverse demand, NCI maintains a wide selection of software applications, with the number of accumulated software versions exceeding 1000 just three years after the initial launch.
We started supporting Julia at NCI in 2018 on Raijin, the predecessor to Gadi. Today, Singularity modules containing Julia are available to users. These modules ship with over 500 popular packages, helping users quickly start their computations and try different workflows across many cores and nodes without worrying about installation.
Users can run their Julia code in Jupyter notebooks using the prepared modules through the Australian Research Environment (ARE). ARE provides an interface to all compute architectures available on our HPC system, Gadi. For example, users can run Julia code on GPUs with CUDA support across multiple nodes, using either the native SSHManager or an MPIManager, while monitoring GPU performance.
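As a rough illustration, the sketch below launches one worker per node with the native SSHManager and checks GPU visibility on each with CUDA.jl. The hostnames and flags are hypothetical placeholders, not NCI's exact configuration.

```julia
# Minimal sketch, not NCI's exact setup: launch one Julia worker per node via
# the built-in SSHManager, then ask each worker which GPU it sees with CUDA.jl.
using Distributed

hosts = ["gadi-gpu-v100-0001", "gadi-gpu-v100-0002"]  # hypothetical node names
addprocs(hosts; exeflags="--project")   # addprocs on hostnames uses SSHManager

@everywhere using CUDA

for p in workers()
    gpu = remotecall_fetch(() -> CUDA.name(CUDA.device()), p)
    println("worker $p sees GPU: $gpu")
end
```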
We will show three examples in this talk. First, running an Oceananigans ShallowWaterModel on two GPUs in a JupyterLab instance while monitoring GPU usage. Second, training a Connect Four agent with AlphaZero.jl across two GPU nodes and profiling the code with NVIDIA Nsight Systems. Third, running an AEM inversion with HiQGA.jl on 7104 CPU cores and profiling the code with PProf.jl and Intel VTune.
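For a flavor of the first example, here is a minimal single-GPU ShallowWaterModel sketch using the public Oceananigans API. The grid size, domain extent, and time step are illustrative assumptions; the talk's two-GPU run additionally relies on Oceananigans' distributed support.

```julia
# Minimal single-GPU sketch of an Oceananigans ShallowWaterModel; parameters
# are illustrative, not those of the talk's two-GPU demonstration.
using Oceananigans

grid = RectilinearGrid(GPU(); size=(128, 128), x=(0, 2π), y=(0, 2π),
                       topology=(Periodic, Periodic, Flat))

model = ShallowWaterModel(; grid, gravitational_acceleration=9.81)
set!(model, h=1)   # uniform fluid depth as a simple initial condition

simulation = Simulation(model; Δt=1e-3, stop_iteration=100)
run!(simulation)
```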
Staff Scientist at NCI. PhD in Physics from ANU.