JuliaCon 2020 (times are in UTC)

Loop Analysis in Julia
07-29, 18:00–18:30 (UTC), Green Track

This talk will focus on the library LoopVectorization.jl, providing an overview of how the library represents loops, how this representation is combined with cost modeling to pick an efficient vectorization strategy, and how it can be used to define loops for a reverse pass in automatic differentiation.


I will give a brief introduction to loop vectorization in Julia, discussing practical issues such as the benefit of contiguous loads and stores and how they relate to data layout decisions such as arrays of structs versus struct of arrays.
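To make the layout trade-off concrete, here is a minimal plain-Julia sketch (not LoopVectorization itself; the `PointAoS`/`PointsSoA` names are illustrative): summing one field over points stored as an array of structs versus a struct of arrays. The SoA version reads one contiguous vector, which maps directly to contiguous SIMD loads, while the AoS version strides over interleaved fields.

```julia
# Array of structs: fields x and y are interleaved in memory,
# so loading only the x values requires strided access.
struct PointAoS
    x::Float64
    y::Float64
end

# Struct of arrays: all x values are contiguous, all y values are
# contiguous, so a loop over one field is a contiguous stream of loads.
struct PointsSoA
    x::Vector{Float64}
    y::Vector{Float64}
end

sum_x(ps::Vector{PointAoS}) = sum(p.x for p in ps)  # strided loads
sum_x(ps::PointsSoA)        = sum(ps.x)             # contiguous loads

aos = [PointAoS(Float64(i), 2.0i) for i in 1:4]
soa = PointsSoA(collect(1.0:4.0), collect(2.0:2.0:8.0))
sum_x(aos) == sum_x(soa)  # both layouts compute the same value, 10.0
```

The choice between the two layouts changes nothing about the mathematical result, only the memory access pattern the vectorizer has to work with.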

The emphasis of the low level discussion will be the extreme level of parallelism within a single modern CPU core (a single AVX512 core can have up to 128 concurrent double precision floating point operations: 8 Float64 per vector * 2 operations / fma * 2 instructions executed / cycle * 4 cycles latency), emphasizing the need for parallel programming paradigms like SPMD.
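The concurrency arithmetic above can be spelled out term by term; each factor below restates a number from the text:

```julia
lanes       = 8  # Float64 elements per 512-bit vector register
ops_per_fma = 2  # one fused multiply-add counts as a multiply and an add
fmas_per_cycle = 2  # fma instructions issued per cycle on an AVX512 core
latency     = 4  # cycles before an fma's result is available for reuse

# To keep the pipeline full, this many Float64 operations must be
# in flight at once:
lanes * ops_per_fma * fmas_per_cycle * latency  # 8 * 2 * 2 * 4 = 128
```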

LoopVectorization.jl can be thought of as treating loops like a familiar DSL for specifying dependencies between operations (such as arithmetic, loads, and stores) and loops, without regard to any order aside from that inherent in the dependency chains.
The library has infrastructure for modeling the cost of evaluating a loop nest using different orders of the constituent loops, and different unrolling and blocking factors of the loops.
The advantage is demonstrated by writing high-performance code that is generic with respect to the data layout of the underlying arrays: the order of the evaluated loops and the data access pattern shift in response to transposed arrays without any change to the user's code.
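A minimal sketch of the kind of code this enables, assuming LoopVectorization.jl is installed (`@avx` was the library's loop-annotation macro at the time of this talk; the function name `mydot` is illustrative):

```julia
using LoopVectorization

# One generic method covers every layout: the macro analyzes the loop
# nest and chooses loop order and access pattern from the array types,
# so the same source handles plain and transposed matrices.
function mydot(A::AbstractMatrix, B::AbstractMatrix)
    s = zero(promote_type(eltype(A), eltype(B)))
    @avx for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j] * B[i, j]
    end
    return s
end

A = rand(8, 8)
B = rand(8, 8)
mydot(A, B) ≈ mydot(A', B')  # same value whether or not the arrays are transposed
</imports>

```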

Next, the talk demonstrates the advantage that this simple representation of loops, as dependencies between operations and loops, offers for automatic differentiation.

Statistician at Eli Lilly.
