2021-11-09 –, Room 2
Most package managers need a dependency solver, but dependency solving is an NP-hard problem, and writing a correct solver from scratch is difficult to do correctly, let alone a fast solver. Simply understanding the solution space is a challenge, from simple SAT solvers, to specialized solutions like PubGrub and libsolv, to Satisfiabilty Modulo Theories (SMT) and Answer Set Programming (ASP) solvers. Solvers may need to optimize for multiple objectives -- preferring the most recent versions of dependencies is common, but multi-valued build options, optional dependencies, virtual dependencies, and build options like compilers, architectures, and ABI compatibility can also factor into a solve.
We have recently shipped a new solver in the Spack package manager that relies on the clingo
Answer Set Programming (ASP) framework to accomplish many of these goals. We'll talk about how we handle complex features like optional dependencies, generalized conditions, virtual dependencies (interfaces), compiler selection, ABI options, and multiple optimization criteria in around 500 lines of declarative code. We'll talk about some of the semantics of ASP that lend themselves to very general package solving (vs other models like SMT). Finally, we'll show some performance numbers with large package repositories.
Spack recently gutted its package solver and replcaed it with a very general solver
based on ASP. ASP is a logic programming framework that borrows Prolog's syntax for
first-order logic, but boils it down to SAT underneath. Over the past 25 years or so,
ASP has made great strides in solver performance by borrowing from industrial SAT
solvers and optimization tools, and ASP frameworks are able to solve much larger and
more complex problems than most in the packaging domain.
Spack's dependnecy model is targeted both at configuring from-source builds and at
reusing optimized binary packages, but doing both of these things requires a much more
general solver framework that is offered by most systems. In particular, the type of
decisions handled by systems like PubGrub typically extends only to version selection,
but to enable solvers to deal with build-time parameters requires much deeper package
parameterization. In particular, Spack packages can depend on particular build options,
compilers, compiler flags, and ABI options. The solver will reuse a binary if it meets
particular build criteria, but it will decide instead to build a new version from
scratch if it can't find a suitable binary. Builds can thus be "configured" by the
solver for compatibility with selected binaries. Most package ecosystems and
distributions enforce consistent ABI choices (like compiler and package ABI versions) --
our goal is to avoid these restrictions and allow users to write very general packages
that can more easily build new stacks in exotic environments (like HPC machines).
The talk will give an overview of Spack's dependency model, and we'll show how ASP and
multiple optimization criteria are needed to implement these more general semantics.
We'll compare to other package solvers like PubGrub and show how features of ASP solvers
(like unsatisfiable cores) can be used to construct meaningful error messages even with
complex solves. We'll also show the maintenance benefits of relying on an established
solver framework -- the core logic of the solve can be implemented in around 500 lines
of declarative code, which makes it much more maintainable than a custom,
domain-specific solution.
Todd Gamblin is a Senior Principal MTS in Livermore Computing's Advanced Technology Office at Lawrence Livermore National Laboratory. He created Spack, a popular open source HPC package management tool with a rapidly growing community of contributors. He leads the Packaging Technologies Project in the U.S. Exascale Computing Project, LLNL's DevRAMP project on developer productivity, and an LLNL Strategic Research Initiative on software integration and dependency management. His research interests include dependency management, software engineering, parallel computing, performance measurement, and performance analysis.