Writing a GenericArpack library in Julia. JuliaCon 2022 (Times are UTC)

Writing a GenericArpack library in Julia.
.ical

2022-07-28 17:10–17:40, Red

Arpack is a library for computing eigenvalues and eigenvectors of a linear operator. It has been used in many technical computing packages. The goal of the GenericArpack.jl package is to create a Julia translation of Arpack. Right now, the Julia GenericArpack.jl methods produce bitwise identical results to the Arpack_jll methods for Float64 types in all testcases. The new library has zero dependency on BLAS and supports element types beyond those in Arpack, such as DoubleFloats.jl.

Summary of key points

Arpack is a library for iteratively computing eigenvalues and eigenvectors of a linear operator. It has been widely used, debugged, and implemented in many technical computing packages. The goal of the GenericArpack.jl package is to create a Julia translation of the Arpack methods. (Currently, only the symmetric solver has been translated.)

The new library has zero dependency on system level BLAS, allowing it to support matrix element types beyond those in Arpack, such as those in DoubleFloats.jl and MultiFloats.jl. Other advantages of the GenericArpack.jl package include thread safety and using Julia features to allow one to optionally avoid the reverse communication interface. Despite not using the system BLAS, the goal was to make the Julia output equivalent to an alternative compilation of the Arpack source code for Float64 types.

An anticipated future use is using the GenericArpack.jl package to give WebASM Julia implementations tools for iterative eigensolvers. This would enables a wide variety of in-browser analysis including finite element, pseudospectra, and spectral graph theory.

Talk overview

The talk will discuss some interesting challenges that arose:

representing state information in Julia for the statically located Fortran save variables in Arpack.
sensitivity to the norm operation and implementing an exact replacement for the OpenBLAS dnrm2 on x86 architectures that uses 80-bit floating point features of x86 CPUs (without calling the OpenBLAS function)
getting the same random initialization vectors as Arpack (i.e. porting the Fortran dlarnv function)
designing an interface that allows us to compare results between Arpack_jll and GenericArpack.jl at internal methods in Arpack call chains.
how much code it took to translate the symmetric eigensolver to a Hermitian eigensolver (which is not in Arpack)

It will also discuss some tools created that may be useful elsewhere

a tridiagonal eigensolver that only computes a single row of the eigenvector matrix (the Arpack dstqrb function)
allocation analysis that automatically runs, parses output, and cleans up after a track-allocations run of Julia
Julia implementations of a few LAPACK/BLAS functions and the details needed to match bitwise match OpenBLAS calls (on MacOS)

Initial rough talk slide ideas

teaser: the world's most precise estimate of the largest few singular values of the netflix ratings matrix. (100M non-zeros) ... or something similar.
reveal: the code... using GenericArpack; svds(...)
pitch: A dropin replacement for Arpack.jl (for symmetric problems).
what is Arpack and why is it important?
Arpack and reverse communication.
summary of project goals: why translation, same input/same output and not something else (new algorithms, etc.), also why minimal dependencies.
basic translation approach: an exercise in @view / sub-arrays.
getting bitwise identical output -- the Lanczos/Arnoldi information seems close, but somewhat different from Arpack
- key issue: well, turns out this is very sensitive to the norm function.
- real problem: OpenBlas norm uses 80-bit FP operations. (And why they can get away with sqrt(sum(x.^2)) and you can't!)
- solution 1: use double-double to simulate! (but it's slow)
- solution 2: just use ideas from BitFloats.jl and llvm intrinsics instead
So, I've written everything, it passes tests, etc. Why does it use so many allocations? (When it should use zero, like the Fortran code!)
tools for hunting down allocations. (well, really just parsing track-allocation output)
A curiosity: why does the line while true allocate?
I wish there was a "strict" mode that doesn't allow quite so much flexibility.
because we can: from Arpack ido (really what you the user do!) to Julia idonow to avoid reverse communication.
because we can: going from symmetric real-valued Arpack methods to Hermitian complex-valued methods (which do not exist in Arpack)
because someone will ask: comparing performance. This will show the current state of performance. At the moment, for a problem Arpack solves in 15ms, GenericArpack.jl takes 23ms; although there has been only minor performance tuning.
A list of future work. Portion the non-Hermitian complex valued case; an "AbstractEigenspace.jl" package that multiple people could implement; Handling differences.
The vision: Why this would be super useful. Iterative Eigenvalues in the browser for Pluto.jl running via WebASM... for really cool demos akin to pseudospectra... for finite elements in the browser ... for interactive spectral graph analysis in the browser... for mixed precision Arpack computations (Lanczos/Arnoldi info in high-precision, vectors in low-precision).

David Gleich

David Gleich is an associate professor of computer science at Purdue University.

Writing a GenericArpack library in Julia. .ical 2022-07-28 17:10–17:40, Red

Summary of key points

Talk overview

Initial rough talk slide ideas

Writing a GenericArpack library in Julia.
.ical

2022-07-28 17:10–17:40, Red