JuliaCon 2022 (Times are UTC)

Improvements in package precompilation
07-28, 12:30–13:00 (UTC), Blue

Julia code can be precompiled to save time loading and/or compiling it on first execution. Precompilation is nuanced because Julia code comes in many flavors, including source text, lowered code, type-inferred code, and various stages of optimization to reach native machine code. We will summarize what has (and hasn't) previously been precompiled, some of the challenges posed by Julia's dynamism, the nature of some recent changes, and prospects for near-term extensions to precompilation.

Package precompilation occurs when you first use a package or as triggered by changes to your package environment. The goal of precompilation is to re-use work that would otherwise have to be repeated each time you load a raw source file; potentially-saved work includes parsing the source text, type-inference, optimization, generation of LLVM IR, and/or compilation of native code. While there are many cases in computing where a previously-calculated result can be recomputed faster than it can be retrieved from storage, code compilation is not (yet) one such case. Indeed, the time needed for compilation is the dominant contribution to Julia's latency, the delay you experience when you first execute a task in a fresh session. In an effort to reduce this latency, Julia has long supported certain forms of precompilation.
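The stages of code mentioned above can be inspected from a running session with Julia's standard reflection macros. The following is a minimal sketch; the function `addone` is a made-up example and does not come from the talk.

```julia
using InteractiveUtils  # provides the @code_* macros outside the REPL

# Hypothetical example function, used only for illustration.
addone(x) = x + 1

# Lowered code: a direct translation of the source, with no type information.
lowered = @code_lowered addone(1)

# Type-inferred (and optimized) code for the signature addone(::Int),
# returned together with the inferred return type.
typed = @code_typed addone(1)

# LLVM IR and native machine code are printed rather than returned.
@code_llvm addone(1)
@code_native addone(1)
```

Each later stage depends on the concrete argument types, which is why caching anything beyond lowered code requires knowing which specializations to generate.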

Package precompilation occurs in a clean environment with just the package dependencies pre-loaded, and the results are written to disk (serialization). When loaded (deserialization), the results have to be "spliced in" to a running Julia session which may include a great deal of external code. Several of the most-loved features of Julia (its method polymorphism, aggressive type specialization, and support for dynamic code development allowing redefinition and/or changes in dispatch priority) conspire to make precompilation a significant challenge. Some examples include saving type-specialized code (which types should be precompiled?), code that may be valid in one environment but invalid in another (due to redefinition or having been superseded in dispatch priority), and code that needs to be compiled for types defined in external packages. While lowered code is essentially a direct translation of the raw source text, saving any later form of code requires additional information, specifically the types that methods should be specialized for. This information can be provided manually through explicit precompile directives, or indirectly from the state of a session that includes all necessary and/or useful specializations.
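As a minimal sketch of an explicit precompile directive, the snippet below uses Julia's built-in `precompile` function. The function `scale` and the chosen argument types are assumptions made for illustration only.

```julia
# Hypothetical function, used only for illustration.
scale(x, factor) = x * factor

# Request compilation of `scale` for one concrete signature without running it.
# The tuple lists the argument types the method should be specialized for.
ok = precompile(scale, (Vector{Float64}, Int))

# `precompile` returns a Bool indicating whether the request succeeded.
println("precompile succeeded: ", ok)
```

Inside a package, a directive like this would typically sit at the top level of the module so that it runs while the package is being precompiled. The indirect alternative is to execute representative calls during precompilation, so that the specializations they create become part of the session state.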

Julia versions prior to 1.8 provided exhaustive support for precompiling lowered code (allowing re-use of the results of parsing). A subset of the results of type-inference could also be precompiled, but in practice much type-inferred code was excluded: a package could not save the results of type-inference for any method defined in a different package, and therefore could not save new type-specializations of externally-defined methods. Finally, it was not possible to precompile native code except by generating a custom "system image" using a tool like PackageCompiler.

Julia 1.8 introduced the ability to save the results of type-inference for external methods, and thus provides exhaustive support for saving type-inferred code. As a result, packages generally seem to exhibit lower time-to-first-task, with the magnitude of the savings varying considerably depending on the relative contributions of inference and native-code generation to experienced latency.
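To make "type-inference for external methods" concrete, here is a hedged sketch of a situation in which it arises. The module `Meters`, the type `Meter`, and the function `total` are all hypothetical; only `sum`, `+`, `zero`, and `precompile` are real Base functions.

```julia
module Meters  # hypothetical package, used only for illustration

struct Meter
    value::Float64
end

# Methods owned by this package, extending Base functions for the new type.
Base.:+(a::Meter, b::Meter) = Meter(a.value + b.value)
Base.zero(::Type{Meter}) = Meter(0.0)

# `total` is defined here, but it calls `sum`, whose methods are defined in Base.
# Summing a Vector{Meter} therefore needs a new specialization of an
# externally-defined method.
total(xs::Vector{Meter}) = sum(xs)

# Request compilation of `total` (and, via inference, of its callees)
# for this signature.
precompile(total, (Vector{Meter},))

end # module

result = Meters.total([Meters.Meter(1.0), Meters.Meter(2.5)])
println(result.value)
```

Under the description above, the inferred code for `total` itself could be cached before Julia 1.8, whereas the inferred code for the `Vector{Meter}` specialization of Base's `sum` could not, since that method belongs to a different module.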

To go beyond these advances, we have begun to build support for serialization and deserialization of native code at package level. Native code would still be stored package-by-package (supporting Julia's famous composability), and this requires the ability to link this code after loading. Unlike in static languages such as C and C++, this linking must be compatible with Julia's dynamic features like late specialization and code-invalidation. We will describe the progress made so far and the steps needed to bring this vision to fruition.

Tim Holy is the Alan A. and Edith L. Wolff Professor of Neuroscience at Washington University in St. Louis. Valentin Churavy is a Ph.D. student in MIT's Computer Science & Artificial Intelligence Laboratory. Both are long-time contributors to Julia and its package ecosystem.