PackagingCon

Julia's Pkg – Design & Rationale
2021-11-09, Room I

The Julia programming language features a built-in package manager commonly referred to as "Pkg". It's actually the third iteration of the language's package manager, code-named Pkg3 while in development. The previous iterations were quite traditional, inspired by Perl's CPAN and RubyGems. Pkg3 is different. This talk explores how it differs from its predecessors and from other package managers, and what lessons we've learned while developing it and scaling up its usage.


Some salient features of Julia's Pkg that will be covered in this talk:

  • Packages are identified by a globally unique UUID, not just by name. This allows different packages with the same name to co-exist in the dependency graph of a project. Names used in source code are mapped to UUIDs in a project-local Project.toml file, which also contains other project metadata.

  • Code loading works by looking up the cryptographic hash of the source tree of a specific version of a package in a project-local Manifest.toml file. This source hash is used to look up the path where the code should be loaded from. Since each package version is identified and found by tree hash, its content can always be checked for correctness and caches never need to be invalidated. Manifest files can be tracked in version control, providing perfect reproducibility by default.

  • It's completely normal for multiple versions of the same package to be installed at the same time, used by different projects. This is kind of like Python virtual environments but built into the language, with common versions shared, and without requiring any environment variable tricks. Pkg has a gc command that searches through known manifest files and garbage collects (i.e. deletes) any package versions that are no longer in use anywhere.

  • It's not just Julia source packages that are immutable and content-addressed: Pkg also installs libraries and other binary dependencies as immutable, content-addressed tarballs of pre-compiled, system-specific file trees. The right variant for a given operating system / libc version / libc++ version (etc.) is chosen and installed, but that combination is pre-built and simply needs to be downloaded and put in the right place. This makes installing binary dependencies incredibly fast and reliable. It also provides tremendous benefits for reproducibility since all of this is cryptographically hashed, content-addressed, immutable, tracked in project-local version control, and persisted forever by the global network of package servers.

  • Pkg has a federated package registry system. There is a general public registry that Julia clients get their packages from by default, but other registries can be added and used alongside it. It is common for companies and research labs to have their own private and/or public registries of packages. The use of UUIDs to identify packages even makes transitioning a package from private to public extremely smooth. It's even possible for some versions of a package to be public while others, older or newer, remain private.

  • UUIDs provide some protection from dependency confusion attacks, but this depends on UUIDs remaining secret, which they are not designed to be. The same features that facilitate migrating a package from private to public inadvertently allow dependency confusion attacks. The General registry allows submission of lists of private UUIDs to block from registration, but this is a stopgap measure at best. Better solutions to this problem, which is common across packaging ecosystems, are sought.
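
The lookup chain described in the bullets above (name to UUID via Project.toml, UUID to tree hash via Manifest.toml, tree hash to stored code, plus garbage collection of unreferenced versions) can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not Pkg's implementation: plain dictionaries stand in for the TOML files, the `Depot` class and the `tree_hash`, `resolve`, `install`, `load` and `gc` helpers are hypothetical names, the package name and UUID are made up, and a SHA-256 over sorted paths and contents is only a stand-in for the tree hash Pkg actually computes.

```python
import hashlib


def tree_hash(files):
    """Hash a source tree given as {relative_path: bytes}.

    Stand-in scheme (assumption): SHA-256 over sorted paths and file digests.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


class Depot:
    """Hypothetical content-addressed store shared by all projects."""

    def __init__(self):
        self.trees = {}  # tree hash -> {relative_path: bytes}

    def install(self, files):
        key = tree_hash(files)
        # Entries are immutable: an existing tree is never overwritten.
        self.trees.setdefault(key, dict(files))
        return key

    def load(self, key):
        files = self.trees[key]
        # Content addressing lets every load be verified against its key.
        if tree_hash(files) != key:
            raise ValueError("stored tree does not match its hash")
        return files

    def gc(self, manifests):
        """Delete every tree that no known manifest references."""
        live = {h for manifest in manifests for h in manifest.values()}
        dead = [key for key in self.trees if key not in live]
        for key in dead:
            del self.trees[key]
        return dead


def resolve(name, project, manifest):
    """Map a name used in source code to the tree hash to load."""
    uuid = project["deps"][name]  # Project.toml role: name -> UUID
    return manifest[uuid]         # Manifest.toml role: UUID -> tree hash


# Two versions of one (made-up) package installed side by side.
depot = Depot()
v1 = depot.install({"src/Example.jl": b"module Example end"})
v2 = depot.install({"src/Example.jl": b"module Example; f() = 1; end"})

UUID = "00000000-0000-0000-0000-000000000001"  # made-up identifier
project_a = {"deps": {"Example": UUID}}
manifest_a = {UUID: v1}  # project A pins the first version
manifest_b = {UUID: v2}  # project B pins the second version

loaded = depot.load(resolve("Example", project_a, manifest_a))

# If project B's manifest is no longer known, its version becomes garbage.
removed = depot.gc([manifest_a])
```

In this model, both versions coexist in one store until the last manifest referencing a version disappears, and a stored tree can be re-verified on every load because its key is derived from its content.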

Co-creator of Julia & co-founder of Julia Computing (https://juliahub.com).