SIMD and cache-aware sorting with ChipSort.jl
2019-07-23, 16:55–17:05, Room 349

ChipSort.jl is a sorting package that exploits instruction-level parallelism and cache memory seeking the best performance in any system.

To attain the best performance with a modern computer, programmers are required to exploit thread and instruction level parallelism and make sure memory access follows suitable patterns. ChipSort.jl is a sorting package that implements techniques exploiting parallelism and memory locality. It uses SIMD instructions to implement basic operations such as sorting networks, merging networks, and in-place matrix transpose. These operations can be used to sort large arrays using merge-sort exploiting SIMD and cache memory for improved performance. The implementation is largely based on unique Julia features such as generated functions and parametric methods, allowing Julia to generate optimized custom machine code for different architectures based on the same high-level Julia code. Experiments were made with both Intel (AVX2 and AVX512) and AMD (NEON) processors, achieving speedups from 2 up to 17 times in different benchmarks.

Project documentation: