We present ThreadedDenseSparseMul.jl, a library that efficiently computes dense-sparse multiplications and outperforms competing packages (including Base.SparseArrays) in about 20 lines of code (for the basic functionality) by leveraging Polyester.jl. We further discuss the effect of Julia's memory layout on performance and analyze the influence of different threading models.
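To make the intended usage concrete, here is a minimal sketch. The function name `fastdensesparsemul_threaded!` and its five-argument `mul!`-style signature are assumptions for illustration; the package's actual exports may differ.

```julia
using SparseArrays
using ThreadedDenseSparseMul  # the package under discussion

D = rand(1_000, 2_000)        # dense left factor
S = sprand(2_000, 500, 0.01)  # sparse (CSC) right factor
C = zeros(1_000, 500)         # preallocated output buffer

# Assumed API: in-place product C = α*D*S + β*C, mirroring LinearAlgebra.mul!.
fastdensesparsemul_threaded!(C, D, S, 1.0, 0.0)

C ≈ D * S  # should agree with the single-threaded Base.SparseArrays result
```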
Several Julia packages have been developed for multi-threaded multiplication of sparse and dense matrices, including:
- IntelMKL.jl
- ThreadedSparseCSR.jl
- ThreadedSparseArrays.jl
- SparseArrays.jl (no threading)
However, these libraries typically focus on sparse-times-dense multiplications, which are sub-optimal given Julia's column-major memory layout.
We present theory and benchmarks showing how we outperform these libraries, as well as existing C/C++ libraries, by 2 to 16x specifically for the dense-times-sparse case, by optimizing cache locality and leveraging Polyester.jl for a low-overhead threading model.
We will also comment on the influence of the CPU architecture (cache sizes, etc.) on the theoretical and practical performance.
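The core kernel can be illustrated in a few lines. The following is a minimal sketch of the technique, assuming a CSC sparse right factor, and is not the package's actual implementation: every output column is a sum of scaled columns of the dense left factor, so all accesses are column-contiguous, and Polyester.jl's `@batch` spreads the independent output columns across threads with little overhead.

```julia
using SparseArrays, Polyester

# Sketch: accumulate C .+= A * B, with A dense and B a SparseMatrixCSC.
function dense_times_sparse!(C::AbstractMatrix, A::AbstractMatrix, B::SparseMatrixCSC)
    rv, nz = rowvals(B), nonzeros(B)
    @batch for j in axes(B, 2)          # output columns are independent -> thread over them
        for idx in nzrange(B, j)        # nonzero entries B[k, j] of column j
            k, v = rv[idx], nz[idx]
            @inbounds @simd for i in axes(A, 1)
                C[i, j] += v * A[i, k]  # axpy-style update; column-major friendly
            end
        end
    end
    return C
end
```

The inner loop is effectively a dense `axpy!` of a column of `A` into a column of `C`, which is why the dense-times-sparse orientation maps so well onto Julia's column-major arrays.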
- PhD Student at the Stanford Intelligent Systems Lab (SISL), developing certification guidelines for employing data-driven algorithms in safety-critical applications.