JuliaCon Local Paris 2025

Algorithmic Differentiation with Mooncake.jl
2025-10-03, Jean-Baptiste Say Amphitheater
Language: English

Mooncake.jl is an algorithmic differentiation (AD) tool written in Julia. It is characterised by support for a wider range of Julia language features than existing tools, and by performance that is typically better than that of comparable tools written in Julia. It has extensive documentation and simple-to-use tools for correctness testing, supported by a precise type system, and is best used via DifferentiationInterface.jl. In this talk I will attempt to both justify and qualify these claims, and will conclude with an opinionated outlook on the future of AD in Julia.
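
As a minimal sketch of what using Mooncake.jl via DifferentiationInterface.jl looks like, the snippet below computes a gradient with Mooncake.jl as the backend. It assumes recent versions of DifferentiationInterface.jl and Mooncake.jl, in which `AutoMooncake` is re-exported from ADTypes.jl and the preparation object is passed before the backend. The function `f` is a hypothetical example.

```julia
using DifferentiationInterface  # generic AD front end; re-exports AutoMooncake from ADTypes.jl
import Mooncake                  # loading Mooncake.jl enables its DifferentiationInterface backend

# Hypothetical example function: f(x) = sum of squares, whose gradient is 2x.
f(x) = sum(abs2, x)

backend = AutoMooncake(; config=nothing)  # default Mooncake.jl configuration
x = [1.0, 2.0, 3.0]

# One-off gradient computation.
g = gradient(f, backend, x)

# For repeated calls, preparation does the expensive setup once and is then reused.
prep = prepare_gradient(f, backend, x)
g_prepared = gradient(f, prep, backend, x)
```

Both calls should return `[2.0, 4.0, 6.0]`. Comparing tools then only requires swapping `AutoMooncake(; config=nothing)` for another ADTypes.jl backend, which is part of the appeal of going through this interface.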


Overview

To justify and qualify the claims made in the abstract, this talk will explain the functionality that Mooncake.jl offers users and give a broad-brush explanation of how Mooncake.jl works: for example, which Julia objects it transforms, what it does to them, and what it returns. Below, I go into more detail on the particular points that I shall address.

Language Feature Support

To substantiate the claim about the breadth of Mooncake.jl's support for the Julia language, I will compare the Julia language features that Mooncake.jl supports with those supported by Zygote.jl, ReverseDiff.jl, and Enzyme.jl.

Performance

Comparing the performance of different AD systems is a tricky business: one can always find counterexamples to any claim that one system is faster than another. I will therefore offer performance comparisons on specific examples, highlighting the notable performance implications of how Mooncake.jl implements AD. The aim is to build intuition for the kinds of situations in which Mooncake.jl is faster or slower than other systems, and by roughly how much.

Rule System and Robustness

Mooncake.jl has a rule system that looks quite similar to that of ChainRules.jl, but differs from it in a couple of crucial ways. From a robustness perspective, the most notable difference is that any tangent / gradient associated with a value of type P must be of type Mooncake.tangent_type(P). That is, each "primal type" is associated with a unique "tangent type". By contrast, ChainRules.jl takes a laissez-faire approach and imposes no such restriction. I will explain this in more depth, along with its implications for users, for library authors who wish to ensure that Mooncake.jl can differentiate their code, and for maintainers of Mooncake.jl itself.
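
To make the "unique tangent type" idea concrete, the sketch below queries `Mooncake.tangent_type` for a few primal types and checks that `Mooncake.zero_tangent` produces a value of exactly that type. The struct `Point` is hypothetical, and the specific tangent types asserted reflect my understanding of current Mooncake.jl behaviour, so they are worth checking against the installed version.

```julia
import Mooncake
using Mooncake: tangent_type, zero_tangent, NoTangent

# Floating-point numbers and arrays of them are their own tangent types.
@assert tangent_type(Float64) == Float64
@assert tangent_type(Vector{Float64}) == Vector{Float64}

# Integers are treated as non-differentiable, so their tangent type is NoTangent.
@assert tangent_type(Int) == NoTangent

# Hypothetical immutable struct: its tangent type is a structured Mooncake.Tangent
# type derived from the types of its fields.
struct Point
    x::Float64
    y::Float64
end
@assert tangent_type(Point) <: Mooncake.Tangent

# The invariant: a tangent for a value of type P always has type tangent_type(P).
for v in (1.0, [1.0, 2.0], 3, Point(1.0, 2.0))
    @assert typeof(zero_tangent(v)) == tangent_type(typeof(v))
end
```

Because `tangent_type` is a function of the primal type alone, a rule that returns a tangent of any other type can be flagged mechanically, which is the property the robustness argument relies on.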

Despite these differences, it is possible to re-use some rules from ChainRules.jl inside Mooncake.jl. I will briefly highlight how and where this is done.

Testing Tooling

AD tools require thorough testing to have any hope of being correct. Correctness errors are the worst kind of error an AD tool can make, so testing is central to any AD tool, and testability has been at the core of every design decision made while developing Mooncake.jl. I will explain how our design decisions make the testing tools we have implemented possible, and how users and library authors can utilise them in their own code.
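
As a hedged illustration of how a library author might use these tools, the sketch below runs Mooncake.jl's rule-testing utility on a hypothetical function `g`. It assumes the `Mooncake.TestUtils.test_rule(rng, f, x...; is_primitive=false)` calling convention from recent Mooncake.jl documentation; keyword names may differ between versions. Here `is_primitive=false` signals that `g` has no hand-written rule, so the derivative Mooncake.jl derives automatically is what gets checked.

```julia
import Mooncake
using Random: Xoshiro
using Test

# Hypothetical function to test; it has no hand-written rule.
g(x::Vector{Float64}) = sum(sin, x) * x[1]

@testset "Mooncake.jl rule for g (sketch)" begin
    # As I understand it, test_rule checks Mooncake.jl's derivatives against
    # finite differences, among other consistency checks; the RNG is seeded
    # so that any random perturbations are reproducible.
    Mooncake.TestUtils.test_rule(Xoshiro(123), g, [1.0, 2.0, 3.0]; is_primitive=false)
end
```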

I am a postdoc at the Alan Turing Institute in London, having previously been a postdoc and PhD student in the Machine Learning Group in Cambridge. I am interested in probabilistic programming, Gaussian processes, algorithmic differentiation, and machine learning for weather forecasting.

I have been a Julia user for a while. I have worked on the algorithmic differentiation ecosystem (Zygote.jl, ChainRules.jl, and Mooncake.jl) and on various packages in the JuliaGaussianProcesses organisation. I have also been working closely with the Turing.jl team.