2020-07-29 –, Green Track
SIMD (Single Instruction, Multiple Data) is a term for when the processor executes the same operation (like addition) on multiple numbers (data) in one instruction. This can give significant speedups. Julia offers several ways to take advantage of SIMD: sometimes it happens automatically as a compiler optimization, but it is also possible to write SIMD code manually.
This talk will give an overview of the different ways you can use SIMD in Julia.
Recent processor architectures can run SIMD instructions on ever larger batches of data, which makes it increasingly important to ensure that SIMD is used wherever possible for best performance.
Fortunately, in many cases, Julia can automatically make code use SIMD. Often this comes from optimizations made by LLVM, the code generation library Julia uses. Some cases of this are in:
- Loops, where the LLVM Loop Vectorizer can identify patterns where the loop can be unrolled so that multiple iterations are done at once. When there is a reduction involved, like when summing the elements of an array, the @simd macro might be needed.
- Different patterns of scalar operations that can be combined into a single SIMD instruction, like when adding two tuples. This is vectorized by the LLVM SLP (Superword-Level Parallelism) vectorizer.
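The two cases above can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the function names `axpy!`, `simd_sum`, and `add_tuples` are hypothetical, and whether the compiler actually emits SIMD instructions depends on the Julia version and the CPU.

```julia
# Element-wise loop: a typical candidate for the LLVM loop vectorizer.
# @inbounds removes bounds checks, which would otherwise get in the way.
function axpy!(y::Vector{Float64}, a::Float64, x::Vector{Float64})
    @inbounds for i in eachindex(x, y)
        y[i] += a * x[i]
    end
    return y
end

# Floating-point reduction: @simd tells the compiler that it may reorder
# the additions, which it must not do on its own.
function simd_sum(x::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Scalar operations on small tuples: a candidate for the SLP vectorizer.
add_tuples(a::NTuple{4,Float64}, b::NTuple{4,Float64}) = map(+, a, b)

# Why reordering needs permission: floating-point addition is not associative.
reordering_is_exact = (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # false
```

One way to check what happened is to inspect the generated code, for example with `@code_llvm simd_sum(rand(100))`, and look for vector types such as `<4 x double>` in the output.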
There are, however, cases where auto-vectorization like the above doesn’t happen. This can be when LLVM does not recognize the opportunity to use SIMD, or when it isn’t valid to do so because it could change the result slightly (floating-point addition, for instance, is not associative, so reordering a sum can alter the rounding). In cases like this, it is possible to:
- Use a “SIMD vector library” like SIMD.jl. This allows one to create a “SIMD Vector” that works similarly to a number but operations on it will work elementwise using SIMD instructions.
- Explicitly call machine instructions specific to a certain CPU architecture. This gives the most control but has the drawback of tying the code to the CPU architecture, making it less generic.
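As a rough illustration of what explicit SIMD can look like, the sketch below hand-writes a four-wide vector addition using only Base Julia. It is not SIMD.jl's API: the names `Vec4`, `vadd`, `tovec`, and `fromvec` are hypothetical, and the LLVM IR used here is still architecture-independent rather than a CPU-specific instruction. A SIMD vector library hides operations of this kind behind ordinary operators such as `+`.

```julia
# Hypothetical alias: a homogeneous tuple of VecElement is treated by Julia
# as an LLVM vector type (here <4 x float>).
const Vec4 = NTuple{4, VecElement{Float32}}

# Hypothetical helper: add two 4-wide vectors with a single LLVM vector fadd.
function vadd(a::Vec4, b::Vec4)
    Base.llvmcall("""
        %res = fadd <4 x float> %0, %1
        ret <4 x float> %res
        """, Vec4, Tuple{Vec4, Vec4}, a, b)
end

# Small conversion helpers for the demonstration.
tovec(t::NTuple{4, Float32}) = map(VecElement, t)
fromvec(v::Vec4) = map(x -> x.value, v)

a = tovec((1f0, 2f0, 3f0, 4f0))
b = tovec((10f0, 20f0, 30f0, 40f0))
c = fromvec(vadd(a, b))   # (11.0f0, 22.0f0, 33.0f0, 44.0f0)
```

Going one step further and calling a CPU-specific intrinsic would replace the generic `fadd` with an instruction that only exists on one architecture, which is the loss of generality the list above warns about.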
This talk will discuss and show examples of the above SIMD cases, giving insight into how to leverage SIMD for greater performance.
- Contributor to Base and many packages (Pkg.jl, OhMyREPL.jl, PGFPlotsX.jl, TimerOutputs.jl, JuAFEM.jl, NearestNeighbors.jl, etc)
- Release manager for Julia.
- Software engineer at Julia Computing.
- PhD student in mechanical engineering at Chalmers University of Technology, Sweden.