JuliaCon 2022 (Times are UTC)

HPC sparse linear algebra in Julia with PartitionedArrays.jl
2022-07-28, Red

PartitionedArrays is a distributed sparse linear algebra engine that allows Julia users to easily prototype and deploy large computations on distributed-memory HPC platforms. The long-term goal is to provide a Julia alternative to the parallel vectors and sparse matrices available in well-known distributed algebra packages such as PETSc. Using PartitionedArrays, application libraries have shown excellent strong and weak scaling results up to tens of thousands of CPU cores.


PartitionedArrays (https://github.com/fverdugo/PartitionedArrays.jl) is a distributed sparse linear algebra engine that allows Julia programmers to easily prototype and deploy large computations on distributed-memory, high-performance computing (HPC) platforms. The package provides a data-oriented parallel implementation of vectors and sparse matrices, ready to use in several applications, including (but not limited to) the discretization of partial differential equations (PDEs) with grid-based algorithms such as finite differences, finite volumes, or finite element methods. The long-term goal of this package is to provide a Julia alternative to the parallel vectors and sparse matrices available in well-known distributed algebra packages such as PETSc or Trilinos. It also aims to provide the basic building blocks for implementing other linear algebra algorithms in Julia, such as distributed sparse linear solvers.

We started this project motivated by the fact that using bindings to PETSc or Trilinos for parallel computations in Julia can be cumbersome in many situations. One is forced to use MPI as the parallel execution model, and drivers need to be executed non-interactively with commands like mpiexec -n 4 julia input.jl, which poses serious difficulties for the development process. Some typos and bugs can be debugged interactively with a single MPI rank in the Julia REPL, but genuine parallel bugs often need to be debugged non-interactively using mpiexec. In this case, one cannot use development tools such as Revise or Debugger, which is a serious limitation, especially for complex codes that take a long time to JIT-compile, since one ends up running code in fresh Julia sessions.

To overcome these limitations, PartitionedArrays provides a data-oriented parallel execution model that allows one to implement parallel algorithms in a generic way, independently of the underlying message-passing software that is eventually used at the production stage. At this moment, the library provides two backends for running the generic parallel algorithms: a sequential backend and an MPI backend. In the former, the parallel data structures are logically parallel from the user perspective, but they are stored in a conventional (sequential) Julia session using standard serial arrays. The sequential backend does not mean that the data is kept in a single part: the data can be split into an arbitrary number of parts, but the parts are processed one after the other in a standard sequential Julia process. This configuration is especially handy for developing new parallel codes. Since the sequential backend runs in a standard Julia session, one can use tools like Revise and Debugger, which dramatically improves the developer experience. Once the code works with the sequential backend, it can be automatically deployed on a supercomputer via the MPI backend. In the latter case, the data layout of the distributed vectors and sparse matrices is compatible with the linear solvers provided by libraries like PETSc or MUMPS. This allows one to use these libraries for solving large systems of linear algebraic equations efficiently until competitive Julia alternatives are available.
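The snippet below is a minimal sketch of this workflow. It assumes the v0.2-era API of the package (SequentialBackend, MPIBackend, get_part_ids, map_parts); the exact names may differ in other releases, so check the documentation of the installed version.

    using PartitionedArrays

    nparts = 4

    function hello(parts)
        # The same generic driver runs unmodified on any backend.
        map_parts(parts) do part
            println("Hello from part $part of $nparts")
        end
    end

    # Development: all parts are processed in one standard Julia session,
    # so Revise and Debugger work as usual.
    parts = get_part_ids(SequentialBackend(), nparts)
    hello(parts)

    # Production: swap the backend and launch non-interactively, e.g.,
    # with mpiexec -n 4 julia driver.jl
    # parts = get_part_ids(MPIBackend(), nparts)
    # hello(parts)

Only the backend object changes between interactive development and the MPI run; the algorithmic code stays the same.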
The API of PartitionedArrays allows the programmer to write efficient parallel algorithms, since it enables fine control over data exchanges. In particular, asynchronous communication directives are provided, making it possible to overlap communication and computation. This is useful, e.g., to efficiently implement the distributed sparse matrix-vector product, where the product on the owned entries can be overlapped with the communication of the off-processor vector components (see the sketch after this description).

Application codes using PartitionedArrays, such as the parallel finite element library GridapDistributed, have shown excellent strong and weak scaling results up to tens of thousands of CPU cores. In the near future, we plan to add hierarchical/multilevel parallel data structures to the library to extend its support to multilevel parallel algorithms such as multigrid, multilevel domain decomposition, and multilevel Monte Carlo methods.

In this talk, we will provide an overview of the main components of the library and show users how to get started by means of simple examples. PartitionedArrays can be easily installed from the official Julia language package registry, and it is distributed under an MIT licence.
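The overlap pattern mentioned above can be sketched as follows. The helper names (begin_exchange!, finish_exchange!, mul_own_own!, mul_own_ghost!) are hypothetical placeholders used for illustration, not the actual PartitionedArrays API:

    # Sketch of a distributed sparse matrix-vector product y = A*x that
    # overlaps communication with computation. All helper names are
    # hypothetical placeholders.
    function spmv!(y, A, x)
        t = begin_exchange!(x)   # start fetching the off-processor (ghost) entries of x
        mul_own_own!(y, A, x)    # meanwhile, multiply the block touching only owned entries
        finish_exchange!(t)      # wait until the ghost entries have arrived
        mul_own_ghost!(y, A, x)  # add the contribution involving ghost entries
        y
    end

The exchange started in the first step proceeds while the owned block is being multiplied, hiding part of the communication latency behind local computation.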

Assistant Professor in the Computer Science Department at VU Amsterdam.