Learning to align with differentiable dynamic programming
2021-07-28, Green

The alignment of two or more biological sequences is one of the main workhorses of bioinformatics because it can quantify similarity and reveal conserved patterns. We provide a differentiable version of the two most popular algorithms for sequence alignment: the Needleman–Wunsch and Smith–Waterman algorithms. Using ChainRulesCore.jl, the resulting gradients can be used directly in combination with bioinformatics and machine learning libraries.


Aligning two or more biological sequences is one of the main workhorses of bioinformatics because it can quantify similarity and reveal conserved patterns. Dynamic programming rapidly computes the optimal alignment between two sequences by recursively splitting the problem into smaller, tractable choices, i.e., deciding whether it is best to extend the current alignment with a (mis)match or to introduce a gap in one of the sequences. This process yields the optimal alignment score, and backtracking recovers the optimal alignment itself. Starting from a collection of pairwise alignments, one can heuristically compute a multiple sequence alignment of many sequences. If one is interested in the effect of a small change in the alignment parameters or in the sequences, one has to compute the gradient of the alignment score with respect to these inputs. Regrettably, this gradient does not exist everywhere, because the individual maximisation (minimisation) steps in the dynamic programming recursion are non-differentiable.
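To make the recursion concrete, here is a minimal sketch of the classical (non-differentiable) Needleman–Wunsch recursion in Julia. It is not the package implementation; the toy scoring function `match_score` and the linear gap cost `g` are illustrative assumptions.

```julia
# Minimal Needleman–Wunsch sketch: D[i+1, j+1] holds the best score for
# aligning the first i characters of `a` with the first j characters of `b`.
function needleman_wunsch(s, a::AbstractString, b::AbstractString; g=-1.0)
    n, m = length(a), length(b)  # assumes ASCII sequences
    D = zeros(n + 1, m + 1)
    D[:, 1] .= g .* (0:n)  # boundary: prefix of `a` aligned against gaps
    D[1, :] .= g .* (0:m)  # boundary: prefix of `b` aligned against gaps
    for i in 1:n, j in 1:m
        D[i+1, j+1] = max(D[i, j] + s(a[i], b[j]),  # extend with a (mis)match
                          D[i, j+1] + g,            # gap in `b`
                          D[i+1, j] + g)            # gap in `a`
    end
    return D[n+1, m+1]  # optimal global alignment score
end

match_score(x, y) = x == y ? 1.0 : -0.5  # toy substitution score
needleman_wunsch(match_score, "GATTACA", "GCATGCA")
```

The `max` over the three choices is exactly the non-differentiable step discussed above.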

However, Mensch and Blondel recently showed that by smoothing the maximum operator, for example by regularising it with an entropic term, one can design fully differentiable dynamic programming algorithms. Such smoothed maximum operators have several desirable properties: they are efficient to compute, and they can be sparse or have a probabilistic interpretation. Building on this work, we created differentiable versions of the Needleman–Wunsch and Smith–Waterman algorithms. Using ChainRulesCore.jl, we make their gradients compatible with Julia's automatic differentiation ecosystem.
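To illustrate the idea (a sketch under the entropic-regularisation assumption, not the actual package code): the smoothed maximum becomes a log-sum-exp, its gradient is a softmax distribution over the competing choices, and a custom `rrule` exposes that gradient to ChainRulesCore.jl-compatible autodiff packages.

```julia
using ChainRulesCore

# Entropy-regularised smooth maximum: λ → 0 recovers the hard maximum.
function smoothmax(x::AbstractVector, λ::Real=1.0)
    m = maximum(x)  # subtract the maximum for numerical stability
    return λ * log(sum(exp.((x .- m) ./ λ))) + m
end

function ChainRulesCore.rrule(::typeof(smoothmax), x::AbstractVector, λ::Real)
    y = smoothmax(x, λ)
    p = exp.((x .- y) ./ λ)  # gradient: a softmax (Gibbs) distribution, sums to 1
    smoothmax_pullback(ȳ) = (NoTangent(), ȳ .* p, NoTangent())
    return y, smoothmax_pullback
end
```

As λ → 0, the softmax concentrates on the argmax and the hard maximum is recovered; larger λ spreads the gradient mass over all competing alignment choices.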

The resulting gradient has an immediate diagnostic and statistical interpretation; for example, it can be used to compute the Fisher information and hence uncertainty estimates. Furthermore, it makes sequence alignment available to differentiable programming, allowing one to learn an optimal substitution matrix and gap cost from a set of homologous sequences (see the sketch below). This flexibility also allows the parameters to vary across regions of the sequences, for example depending on the secondary structure. One can also turn the problem around: fix the alignment parameters and optimise the sequences themselves for alignment. This scheme allows one to find consensus sequences, which can be useful for building a multiple sequence alignment. More broadly, our algorithm can be incorporated into arbitrary artificial neural network architectures (using, e.g., Flux.jl), making it an attractive alternative to the popular convolutional neural networks, LSTMs, or transformer networks currently used to learn from biological sequences.
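As a flavour of what such learning could look like, the following hypothetical sketch differentiates a smoothed Needleman–Wunsch score with respect to the substitution matrix using Zygote.jl. It sidesteps custom rules by filling the dynamic programming table in a `Zygote.Buffer`, so it is an illustration rather than the ChainRulesCore.jl-based implementation described above; the names `smooth_nw`, `encode`, and `θ` are made up for the example.

```julia
using Zygote

smoothmax(x, λ=1.0) = (m = maximum(x); λ * log(sum(exp.((x .- m) ./ λ))) + m)

# Smoothed Needleman–Wunsch on integer-encoded sequences; θ[i, j] is the
# substitution score between symbols i and j, g the linear gap cost.
function smooth_nw(θ, a::Vector{Int}, b::Vector{Int}; g=-1.0, λ=1.0)
    n, m = length(a), length(b)
    D = Zygote.Buffer(zeros(n + 1, m + 1))  # Buffer allows mutation under AD
    for i in 0:n
        D[i+1, 1] = g * i  # boundary: prefix of `a` aligned against gaps
    end
    for j in 1:m
        D[1, j+1] = g * j  # boundary: prefix of `b` aligned against gaps
    end
    for i in 1:n, j in 1:m
        D[i+1, j+1] = smoothmax([D[i, j] + θ[a[i], b[j]],  # (mis)match
                                 D[i, j+1] + g,            # gap in `b`
                                 D[i+1, j] + g], λ)        # gap in `a`
    end
    return copy(D)[n+1, m+1]  # smoothed alignment score
end

encode(seq) = [Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)[c] for c in seq]
a, b = encode("GATTACA"), encode("GCATGCA")

θ = randn(4, 4)  # substitution matrix, to be learned
∇θ = Zygote.gradient(θ -> smooth_nw(θ, a, b), θ)[1]
θ .+= 0.1 .* ∇θ  # one gradient-ascent step to increase the alignment score
```

Averaging such gradients over a set of homologous sequence pairs and repeating the update step turns this into the learning scheme described above.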

I am a postdoctoral researcher at Ghent University. My interest is in using computational intelligence to understand and design biological systems.