2020-07-29, Green Track
This talk presents how basic operations on vectors, such as summation and dot products, can be made more accurate in floating-point arithmetic by using compensated algorithms. The proposed implementation is available in the AccurateArithmetic.jl package and leverages SIMD instructions to achieve high performance on modern hardware architectures.
Computing the dot product of two vectors and, perhaps to a lesser extent, summing the elements of a vector, are two very common building blocks for more complex linear algebra algorithms. As such, any change in their performance is likely to affect the overall performance of scientific computing codes, and any change in their accuracy is likely to induce a loss of reproducibility in the overall computed results. However, both the performance and the accuracy of these algorithms are affected by the use of floating-point (FP) arithmetic. On the one hand, using smaller FP numbers tends to increase the performance of the computation, since more values fit in a given memory bandwidth and in each SIMD register. On the other hand, decreasing the precision of FP numbers also tends to decrease the accuracy of the results.
The work presented in this talk tries to address this issue by efficiently implementing accurate summation and dot product algorithms in Julia. These implementations are available under an open source license in the AccurateArithmetic.jl package, and aim at high performance by leveraging the SIMD capabilities of modern hardware (especially AVX2 and AVX-512). Besides naive algorithms, compensated algorithms are implemented: the Kahan-Babuška-Neumaier summation algorithm, and the Ogita-Rump-Oishi simply compensated summation and dot product algorithms. These algorithms produce results as accurate as if they had been computed in twice the working precision, while incurring little to no overhead, especially for large input vectors.
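To make the compensated algorithms named above more concrete, here is a minimal scalar sketch in plain Python. It is not the AccurateArithmetic.jl code (which is written in Julia and vectorized); the function names `two_sum`, `two_prod`, `sum_kbn`, `dot_oro` and `naive_sum` are hypothetical and chosen for this illustration only. It assumes IEEE 754 double precision with round-to-nearest, and uses Veltkamp splitting instead of a hardware FMA for the exact product.

```python
def two_sum(a, b):
    """Error-free sum (Knuth): returns (s, e) with s = fl(a + b) and a + b == s + e."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e


def split(a):
    """Veltkamp splitting of a double into two non-overlapping halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Error-free product (Dekker): returns (p, e) with p = fl(a * b) and a * b == p + e,
    barring overflow and underflow."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def naive_sum(xs):
    """Plain left-to-right accumulation, for comparison."""
    s = 0.0
    for x in xs:
        s += x
    return s


def sum_kbn(xs):
    """Kahan-Babuska-Neumaier summation: accumulate rounding errors separately."""
    s, c = 0.0, 0.0
    for x in xs:
        t = s + x
        if abs(s) >= abs(x):
            c += (s - t) + x  # low-order bits of x were lost
        else:
            c += (x - t) + s  # low-order bits of s were lost
        s = t
    return s + c


def dot_oro(xs, ys):
    """Compensated dot product in the style of Ogita, Rump and Oishi."""
    s, c = 0.0, 0.0
    for x, y in zip(xs, ys):
        p, ep = two_prod(x, y)  # exact product as p + ep
        s, es = two_sum(s, p)   # exact accumulation as s + es
        c += ep + es            # errors are summed in working precision
    return s + c


xs = [1.0, 1e100, 1.0, -1e100]
print(naive_sum(xs))  # 0.0: both ones are absorbed by 1e100
print(sum_kbn(xs))    # 2.0: the exact result
print(dot_oro([1e8, 1.0, -1e8], [1e8, 1.0, 1e8]))  # 1.0: the exact result
```

Each compensated loop performs only a handful of extra floating-point operations per element and no extra memory accesses, which is consistent with the low overhead reported above for large, memory-bound inputs.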
This talk also builds upon this example to make a case for a more widespread use of Julia in the HPC community. Although vectorizing compensated algorithms is not a particularly simple task, Julia makes it relatively easy and straightforward, particularly thanks to existing building blocks in the ecosystem such as SIMDPirates.jl. Relying on generic functions and multiple dispatch also allows structuring the code in small, composable building blocks that closely match textbook algorithms yet are efficiently compiled.
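One way to picture what vectorizing a compensated sum involves is to keep several independent (sum, error) accumulator pairs, one per SIMD lane, and to merge them with a compensated reduction at the end. The Python sketch below only emulates that structure with lists; it is an assumption about the general approach, not a description of how AccurateArithmetic.jl or SIMDPirates.jl implement it, and the names `two_sum` and `sum_oro_lanes` as well as the `lanes` parameter are hypothetical.

```python
def two_sum(a, b):
    """Error-free sum (Knuth): returns (s, e) with s = fl(a + b) and a + b == s + e."""
    s = a + b
    bp = s - a
    return s, (a - (s - bp)) + (b - bp)


def sum_oro_lanes(xs, lanes=4):
    """Compensated summation with `lanes` independent accumulators (emulated SIMD lanes)."""
    s = [0.0] * lanes  # per-lane running sums
    c = [0.0] * lanes  # per-lane accumulated rounding errors
    for i, x in enumerate(xs):
        k = i % lanes  # element i goes to lane i mod lanes
        s[k], e = two_sum(s[k], x)
        c[k] += e
    # Final reduction across lanes, itself compensated.
    total, comp = 0.0, 0.0
    for k in range(lanes):
        total, e = two_sum(total, s[k])
        comp += e + c[k]
    return total + comp


xs = [1.0, 1e100, 1.0, -1e100]
for lanes in (1, 2, 3, 4):
    print(lanes, sum_oro_lanes(xs, lanes))  # 2.0 for each lane count
```

Because the lanes never interact inside the main loop, the loop body maps naturally onto element-wise SIMD operations; only the short final reduction is sequential.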
François Févotte is co-founder and Chief Scientist of TriScale innov, a start-up dedicated to technical and scientific computing. Prior to that, he graduated in 2008 with a PhD in applied mathematics from the CEA and spent more than ten years with a team dedicated to numerical analysis and modeling in the R&D division of EDF (France's main electric utility). François' approach aims at achieving high performance in scientific software by focusing the effort where it matters most. This includes using state-of-the-art numerical techniques in combination with FP-aware algorithms and implementations targeting modern hardware architectures.