InvertibleNetworks.jl - Memory efficient deep learning in Julia
2021-07-29, 16:30–17:00 (UTC), Green

We present InvertibleNetworks.jl, an open-source package for invertible neural networks and normalizing flows using memory-efficient backpropagation. InvertibleNetworks.jl uses manually implemented gradients to take advantage of the invertibility of building blocks, which allows for scaling to large problem sizes. We present the architecture and features of the library and demonstrate its application to a variety of problems ranging from loop unrolling to uncertainty quantification.


Invertible neural networks (INNs) are designed around bijective building blocks that allow the evaluation of (deep) INNs in both directions, which means that inputs into the network (and all internal states) can be uniquely recomputed from the output. INNs were popularized in the context of normalizing flows as an alternative approach to generative adversarial networks (GANs) and variational auto-encoders (VAEs), but their invertibility is also appealing for discriminative models, as INNs allow memory-efficient backpropagation during training. Because the hidden states of an INN can be recomputed from the output, it is in principle not necessary to save them during forward evaluation, which leads to a significantly smaller memory footprint than that of conventional neural networks. However, the backpropagation libraries used in TensorFlow or PyTorch do not support the concept of invertibility and therefore require workarounds to benefit from it. For this reason, current frameworks for INNs such as FrEIA or MemCNN use layer-wise AD, in which backpropagation is performed by first recomputing the hidden state of the current layer and then using PyTorch's AD tool (Autograd) to compute the gradients for the respective layer. This approach is computationally inefficient, as it performs an additional forward pass during backpropagation.
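
The idea that inputs and hidden states can be recomputed from the output can be illustrated with a minimal sketch. The example below is plain Python with scalar inputs and is not the InvertibleNetworks.jl API; the additive coupling layer, the function names and the weights are assumptions chosen purely for illustration.

```python
import math

def coupling_forward(x1, x2, w):
    """Additive coupling layer: y1 = x1, y2 = x2 + tanh(w * x1)."""
    return x1, x2 + math.tanh(w * x1)

def coupling_inverse(y1, y2, w):
    """Exact inverse of the layer above: x1 = y1, x2 = y2 - tanh(w * y1)."""
    return y1, y2 - math.tanh(w * y1)

def network_forward(x1, x2, weights):
    # Apply the layers in order and swap the two halves after each layer
    # so that both halves get updated. No hidden state is stored.
    for w in weights:
        x1, x2 = coupling_forward(x1, x2, w)
        x1, x2 = x2, x1
    return x1, x2

def network_inverse(y1, y2, weights):
    # Undo the layers in reverse order: first undo the swap, then the coupling.
    for w in reversed(weights):
        y1, y2 = y2, y1
        y1, y2 = coupling_inverse(y1, y2, w)
    return y1, y2

weights = [0.5, -1.2, 0.8]              # hypothetical layer weights
x = (0.3, -0.7)                         # hypothetical input
y = network_forward(*x, weights)
x_rec = network_inverse(*y, weights)    # input recovered from the output alone
```

Here `x_rec` agrees with `x` up to floating-point rounding, although `network_forward` keeps none of its intermediate states.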

With InvertibleNetworks.jl, we present an open-source Julia framework (MIT license) with manually implemented gradients, in which we take advantage of the invertibility of building blocks. For each invertible layer, we provide a backpropagation layer that (re-)computes the hidden state and weight updates all at once, thus not requiring an extra (layer-wise) forward evaluation. In addition to gradients, InvertibleNetworks.jl also provides Jacobians for each layer (i.e. forward differentiation), or more precisely, matrix-free implementations of Jacobian-vector products, as well as log-determinants for normalizing flows. While backpropagation and Jacobians are implemented manually, InvertibleNetworks.jl integrates seamlessly with ChainRules.jl, so users do not need to manually define backward passes for implemented networks. Additionally, InvertibleNetworks.jl is compatible with Flux.jl, so that users can create networks that consist of a mix of invertible and non-invertible Flux layers. In this talk, we present the architecture and features of InvertibleNetworks.jl, which includes implementations of common invertible layers from the literature, and show its application to a range of scenarios including loop-unrolled imaging, uncertainty quantification with normalizing flows and large-scale image segmentation.
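
A backward pass that recomputes the hidden state and the gradients together can be sketched as follows. This is a hedged illustration in plain Python for a scalar affine coupling layer, not code from InvertibleNetworks.jl; the layer definition, the parameters `a` and `b`, and the quadratic loss are assumptions. The log-determinant of this layer equals the log-scale `s` because its Jacobian is triangular with diagonal entries 1 and exp(s).

```python
import math

def affine_forward(x1, x2, a, b):
    """Affine coupling: y1 = x1, y2 = x2 * exp(s) + t, with s = tanh(a*x1), t = b*x1."""
    s = math.tanh(a * x1)          # log-scale
    t = b * x1                     # shift
    y2 = x2 * math.exp(s) + t
    logdet = s                     # log|det J| of the triangular Jacobian
    return x1, y2, logdet

def affine_backward(dy1, dy2, y1, y2, a, b):
    """Given the output and its gradient, return input gradients, the
    recomputed input, and parameter gradients in a single pass."""
    x1 = y1
    s = math.tanh(a * x1)
    t = b * x1
    x2 = (y2 - t) * math.exp(-s)   # invert the layer to recover the input
    ds = dy2 * x2 * math.exp(s)    # gradient w.r.t. the log-scale
    dt = dy2                       # gradient w.r.t. the shift
    dpre = ds * (1.0 - s * s)      # chain rule through tanh
    da = dpre * x1
    db = dt * x1
    dx1 = dy1 + dpre * a + dt * b
    dx2 = dy2 * math.exp(s)
    return dx1, dx2, x1, x2, da, db

def loss(x1, x2, a, b):
    # Hypothetical loss 0.5 * ||y||^2, whose gradient w.r.t. y is y itself.
    y1, y2, _ = affine_forward(x1, x2, a, b)
    return 0.5 * (y1 * y1 + y2 * y2)

x1, x2, a, b = 0.4, -1.1, 0.7, 0.3
y1, y2, logdet = affine_forward(x1, x2, a, b)
dx1, dx2, x1_rec, x2_rec, da, db = affine_backward(y1, y2, y1, y2, a, b)

# Central finite differences as an independent check of the manual gradients.
h = 1e-6
da_fd = (loss(x1, x2, a + h, b) - loss(x1, x2, a - h, b)) / (2 * h)
db_fd = (loss(x1, x2, a, b + h) - loss(x1, x2, a, b - h)) / (2 * h)
dx1_fd = (loss(x1 + h, x2, a, b) - loss(x1 - h, x2, a, b)) / (2 * h)
dx2_fd = (loss(x1, x2 + h, a, b) - loss(x1, x2 - h, a, b)) / (2 * h)
```

In this sketch the quantities `s` and `t` are evaluated once in the backward pass and reused for both the inversion and the gradients. A layer-wise AD scheme would instead invert the layer and then run a separate forward evaluation for the AD tool.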

Philipp A. Witte is a researcher at Microsoft Research for Industry (RFI), a new initiative within Microsoft for developing innovative research solutions for industry-related problems ranging from AI/ML to edge- and high-performance computing. Prior to Microsoft, Philipp received his B.Sc. and M.Sc. in Geophysics from the University of Hamburg and his Ph.D. in Computational Science and Engineering from the Georgia Institute of Technology. During his Ph.D., Philipp worked with Professor Felix J. Herrmann at the Seismic Laboratory for Imaging and Modeling (SLIM) on computational aspects of least squares seismic imaging and full-waveform inversion. He has authored and contributed to multiple open-source software packages, including Devito, the Julia Devito Inversion framework (JUDI) and InvertibleNetworks.jl, a Julia framework for deep learning with normalizing flows.

Postdoctoral Fellow at Georgia Institute of Technology.
My main research focuses on high-performance computing for large-scale PDE-constrained optimization (medical imaging, seismic imaging) on standard clusters and in the cloud. In particular, I work intensively on open-source solutions in Julia and Python and on high-level abstractions for high-performance computing such as Devito (finite-difference DSL) or JUDI.jl (linear algebra abstraction for PDE-constrained optimization).
My secondary research project is aimed at computational and algorithmic solutions for large-scale machine learning.

I am pursuing a Ph.D. in Computational Science and Engineering at Georgia Institute of Technology. Currently, my research is mainly focused on applications of deep learning in inverse problems and uncertainty quantification.

Felix J. Herrmann graduated from Delft University of Technology in 1992 and received his Ph.D. in engineering physics from that same institution in 1997. After research positions at Stanford University and the Massachusetts Institute of Technology, he joined the faculty of the University of British Columbia in 2002. In 2017, he joined the Georgia Institute of Technology, where he is now a Georgia Research Alliance Scholar Chair in Energy, cross-appointed between the Schools of Earth & Atmospheric Sciences, Computational Science & Engineering, and Electrical & Computer Engineering. His cross-disciplinary research program spans several areas of computational imaging, including seismic and, more recently, medical imaging. Dr. Herrmann is widely known for tackling challenging problems in the imaging sciences by adapting techniques from randomized linear algebra, PDE-constrained and convex optimization, high-performance computing, machine learning, and uncertainty quantification. Over his career, he has been responsible for several cost-saving innovations in industrial time-lapse seismic data acquisition and wave-equation-based imaging. In 2019, he toured the world presenting the SEG Distinguished Lecture "Sometimes it pays to be cheap – Compressive time-lapse seismic data acquisition". In 2020, he was the recipient of the SEG Reginald Fessenden Award for his contributions to seismic data acquisition with compressive sensing. At Georgia Tech, he leads the Seismic Laboratory for Imaging and Modeling, and he is co-founder/director of the Center for Machine Learning for Seismic (ML4Seismic), designed to foster industrial research partnerships to drive innovations in artificial-intelligence-assisted seismic imaging, interpretation, analysis, and time-lapse monitoring.

Postdoc at Utrecht University

Bas Peters is a visiting assistant professor in the mathematics department at Emory University. Previously, Bas worked for Computational Geosciences Inc as a research scientist, and received his PhD degree from the University of British Columbia in 2019. His main research interests are constrained optimization; design, optimization, and regularization of deep neural networks; geoscientific and geospatial applications; inverse problems; reinforcement learning; image processing; and numerical linear algebra.