Julian P Samaroo
Julian is a Research Software Engineer at MIT's JuliaLab, where he focuses on improving Julia's support for HPC and GPU computing. Julian has previously authored and maintained the AMDGPU.jl package (for programming AMD's GPUs from Julia), and now focuses his efforts on maintaining and developing the Dagger.jl package, to improve the state of productive parallel programming.
Sessions
Stencil operations are a cornerstone of many fields, including fluid and gas flow simulations, machine learning/AI, computer graphics, and image processing. Stencil operations (also known as windowed operations) allow a normal elementwise operation to access neighboring elements in addition to the currently selected element. There are a number of stencil computation libraries in Julia, such as ImageFiltering.jl, ParallelStencil.jl, Stencils.jl, and now Dagger.jl (the focus of this talk). Dagger in particular makes it easy to define stencil operations that run across multiple CPUs, multiple GPUs, and multiple nodes, and it supports many kinds of boundary conditions, arbitrary numbers of dimensions, and flexible neighborhood sizing. We will compare the various stencil libraries and see how easy it is to write parallel stencils in each one. We will also look at Dagger’s stencil performance in a variety of microbenchmarks.
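To make the idea of a stencil concrete, here is a minimal, library-neutral sketch in plain Python. It is not the API of Dagger.jl or of any other library named above; the function name `stencil_1d`, its parameters, and the boundary names "wrap" and "clamp" are assumptions chosen for illustration. It applies an operation to each element together with its neighbors in one dimension, under two common boundary conditions.

```python
def stencil_1d(values, op, radius=1, boundary="wrap"):
    """Apply `op` to the window of 2*radius + 1 values around each element.

    Hypothetical helper for illustration only.
    boundary="wrap"  : indices past the ends wrap around (periodic).
    boundary="clamp" : indices past the ends reuse the nearest edge value.
    """
    n = len(values)
    out = []
    for i in range(n):
        window = []
        for offset in range(-radius, radius + 1):
            j = i + offset
            if boundary == "wrap":
                j %= n
            elif boundary == "clamp":
                j = min(max(j, 0), n - 1)
            else:
                raise ValueError(f"unknown boundary: {boundary!r}")
            window.append(values[j])
        # `op` sees the current element and its neighbors, not just values[i].
        out.append(op(window))
    return out


data = [1, 2, 3, 4]
wrapped = stencil_1d(data, sum, boundary="wrap")    # neighbor sums, periodic edges
clamped = stencil_1d(data, sum, boundary="clamp")   # neighbor sums, repeated edges
```

With `radius=0` the window holds only the current element, so the stencil reduces to an ordinary elementwise operation. Larger radii and additional dimensions follow the same pattern with bigger windows.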
Multi-GPU execution is the future: as data sizes grow and more work is pushed to the GPU, a single GPU no longer suffices. Unfortunately, programming an algorithm for multiple GPUs is more complicated than programming for one. You now have to deal with multi-device data movement and multi-stream synchronization, which puts more burden on the algorithm author and takes away from writing the algorithm in the simplest, most readable manner. Thankfully, Dagger.jl makes programming multi-GPU algorithms much easier with its Datadeps framework, which lets you focus on writing the algorithm at a high level while Dagger handles the details of managing multiple GPUs.
This talk will explain the problems around multi-GPU programming, and show how Dagger handles them. We will show how the Datadeps framework makes it much easier to write algorithms which naturally support multi-GPU execution, and show the tools that Dagger and Datadeps provide to make algorithm design a breeze.
Round-table open discussion of everything about Dagger.jl. Success or failure stories, gripes and joys, ideas for new features, discussion of existing bugs or missing documentation, and more!