Easy, Featureful Parallelism with Dagger.jl
2021-07-30 , Blue

Parallelizing codes with Distributed.jl is simple and can provide an appreciable speed-up; but for complicated problems or when scaling to large problem sizes, the APIs are somewhat lacking. Dagger.jl takes parallelism to the next level, with support for GPU execution, fault tolerance, and more. Dagger's scheduler exploits every bit of parallelism it can find, and uses all the resources you can give it. In this talk, I'll build an application with Dagger to highlight what Dagger can do for you!


The Distributed standard library exposes RPC primitives (remotecall) and remote channels for coordinating and executing code on a cluster of Julia processes. When a problem is simple enough, such as a trivial map operation, the provided APIs are enough to get great performance and "pretty good" scaling. However, things change when one wants to use Distributed for something complicated, like a large data pipeline with many inputs and outputs, or a full desktop application. While one could build these programs with Distributed, one would quickly realize that a lot of functionality will need to be built from scratch: application-scale fault tolerance and checkpointing, heterogeneous resource utilization control, and even simple load-balancing. This isn't a fault of Distributed: it just wasn't designed as the be-all-end-all distributed computing library for Julia. If Distributed won't make it easy to build complicated parallel applications, what will?

Dagger.jl takes a different approach: it is a batteries-included distributed computing library, with a variety of useful tools built-in that makes it easy to build complicated applications that can scale to whatever kind and size of resources you have at your disposal. Dagger ships with a built-in heterogeneous scheduler, which can dispatch units of work to CPUs, GPUs, and future accelerators. Dagger has a framework for checkpointing (and restoring) intermediate results, and together with fault tolerance, allows computations to safely fail partway through, and be automatically or manually resumed later. Dagger also has primitives to build dynamic execution graphs across the cluster, so users can easily implement layers on top of Dagger that provide abstractions better matching the problem at hand.

This talk will start with a brief introduction to Dagger: what it is, how it relates to Distributed.jl, and a brief overview of the features available. Then I will take the listeners through the building of a realistic, mildly complicated application with Dagger, showcasing how Dagger makes it easy to make the application scalable, performant, and robust. As each feature of Dagger is used, I will also point out any important caveats or alternative approaches that the listeners should consider when building their own applications with Dagger. I will wrap up the talk by showing the application running at scale, and talk briefly about the future of Dagger and how listeners can help to improve it.

I am an HPC software engineer working at the JuliaLab. I maintain Dagger.jl, AMDGPU.jl, and BPFnative.jl, and generally enjoy the challenge of hacking on compilers and HPC runtimes.

This speaker also appears in: