Juliacon 2024

ThreadedDenseSparseMul.jl: Multi-threaded Dense-Sparse Matmul.
2024-07-10 , Method (1.5)

We present ThreadedDenseSparseMul.jl, a library that efficiently computes dense-sparse multiplications and outperform competing packages (and Base.SparseArrays) in about 20 lines of code (for the basic functionality) by leveraging Polyester.jl. We further discuss the effect of Julia's memory layout on the performance and analyze the influence of different threading models.


Several Julia packages have been implemented to deal with multi-threaded multiplication of sparse and dense matrices, including

  • IntelMKL.jl
  • ThreadedSparseCSR.jl
  • ThreadedSparseArrays.jl
  • SparseArrays.jl (no threading)

However, typically these libraries focus on sparse-times-dense multiplications, which are sub-optimal due to Julia's row-major memory layout.
We present theory and benchmarks how we can outperform these libraries, as well as existing C/C++ libraries, by 2 to 16x specifically for the dense-times-sparse case by focusing on optimizing cache-locality and Polyester.jl for a low-overhead threading model.
We will also provide additional comments on the influence of the CPU architecture (cache size etc) on the theoretical and practical performance.

See also: The poster 😎 (439.3 KB)
  • PhD Student at the Stanford Intelligent Systems Lab (SISL), developing certification guidelines for employing data-driven algorithms in safety-critical applications.
This speaker also appears in: