Phylogenetic reconstruction and comparative analysis are fundamental to understanding evolutionary relationships and biological diversity. Traditional algorithms rely heavily on multiple sequence alignments and statistical modelling, both of which become computationally expensive on large-scale datasets. Furthermore, integrating multiple data sources, such as sequences, structures, and functional annotations, during reconstruction remains technically challenging, which makes it difficult to exploit today’s diversity of biological annotations in phylogenetic reconstruction.
Hyperdimensional computing (HDC) is a novel computational paradigm that represents atomic entities (e.g., amino acids) as high-dimensional vectors, called hypervectors, and combines them via algebraic operations to represent more complex data structures (e.g., proteins). Developed in parallel to connectionist modelling, the paradigm is inspired by the brain's distributed memory and the operations underlying its processing. HDC exhibits several properties advantageous for biological data analysis: robustness to noise, holographic (fully distributed) representation of information, and the ability to integrate heterogeneous data sources seamlessly. Recent applications in DNA sequencing, pattern matching, and molecular classification have demonstrated HDC's potential in bioinformatics. Its computational efficiency, interpretability, and natural capacity for multimodal data fusion make it particularly well suited to complex phylogenetic analyses.
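To make the idea of combining atomic hypervectors concrete, the following is a minimal sketch in plain Python, not the PhyloHD.jl implementation. It assumes bipolar (+1/-1) hypervectors, cyclic shifts to mark position, elementwise multiplication to bind the residues of a k-mer, and elementwise summation to bundle all k-mers of a protein. The dimensionality, the value of k, and all function names are illustrative assumptions.

```python
import math
import random

DIM = 10_000  # assumed hypervector dimensionality (illustrative choice)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

_rng = random.Random(0)
# One random bipolar hypervector per amino acid: the atomic entities.
ATOMS = {aa: [_rng.choice((-1, 1)) for _ in range(DIM)] for aa in AMINO_ACIDS}


def permute(hv, shift):
    """Cyclically rotate a hypervector; used here to encode position."""
    shift %= len(hv)
    return hv[-shift:] + hv[:-shift] if shift else list(hv)


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication."""
    return [x * y for x, y in zip(a, b)]


def encode_kmer(kmer):
    """Bind position-shifted residue hypervectors into one k-mer hypervector."""
    hv = [1] * DIM
    for pos, aa in enumerate(kmer):
        hv = bind(hv, permute(ATOMS[aa], pos))
    return hv


def encode_sequence(seq, k=3):
    """Bundle (sum elementwise) the hypervectors of all k-mers in a sequence."""
    total = [0] * DIM
    for start in range(len(seq) - k + 1):
        for i, v in enumerate(encode_kmer(seq[start:start + k])):
            total[i] += v
    return total


def cosine(a, b):
    """Cosine similarity between two hypervectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    original = encode_sequence("MKTAYIAKQRQISFVKSHFSRQ")
    one_substitution = encode_sequence("MKTAYIAKQRWISFVKSHFSRQ")
    unrelated = encode_sequence("GWLNPEDCHVYTMGWLNPEDCH")
    print("one substitution:", cosine(original, one_substitution))
    print("unrelated:       ", cosine(original, unrelated))
```

Because the information is spread across all components, a single substitution only perturbs the k-mers that overlap it, so the mutated sequence remains far more similar to the original than an unrelated sequence does. No alignment is needed to obtain this similarity.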
In this talk, we demonstrate the potential of HDC for phylogenetic reconstruction and comparative analysis. We present PhyloHD.jl, a Julia package for representing biological data as hypervectors and reconstructing phylogenetic trees from these representations. We show how to calculate branch support within the HDC paradigm and present a multimodal tree reconstruction approach that integrates heterogeneous data sources, including sequences, structures, and functional annotations. Finally, we show how HDC learning techniques can be used for family-based phylogenetic tree reconstruction and ancestral sequence reconstruction. To our knowledge, this work is the first attempt to use hyperdimensional computing as a computational paradigm for phylogenetics, and it opens new avenues for research in this field.
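As an illustration of how a tree might be reconstructed from hypervectors, the sketch below computes pairwise cosine distances and clusters them with UPGMA (average linkage), returning a Newick string without branch lengths. This is an assumed, generic distance-based approach and not necessarily the method used in PhyloHD.jl. The toy hypervectors are simulated by flipping a fraction of components of a random ancestor, and all names are hypothetical.

```python
import random
from itertools import combinations

DIM = 10_000  # assumed hypervector dimensionality (illustrative choice)
_rng = random.Random(1)


def random_hv():
    """Draw a random bipolar (+1/-1) hypervector."""
    return [_rng.choice((-1, 1)) for _ in range(DIM)]


def mutate(hv, fraction):
    """Flip a random fraction of components (toy stand-in for divergence)."""
    out = list(hv)
    for i in _rng.sample(range(DIM), int(fraction * DIM)):
        out[i] = -out[i]
    return out


def distance(a, b):
    """Cosine distance between bipolar hypervectors: 1 - dot / DIM."""
    return 1.0 - sum(x * y for x, y in zip(a, b)) / len(a)


def upgma(names, hvs):
    """Build a Newick topology from hypervectors by average-linkage clustering."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (newick, size)
    dist = {frozenset(p): distance(hvs[p[0]], hvs[p[1]])
            for p in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        i, j = sorted(pair)
        (newick_i, size_i), (newick_j, size_j) = clusters.pop(i), clusters.pop(j)
        del dist[pair]
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the distances from its two children.
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (
                (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
            )
        clusters[next_id] = (f"({newick_i},{newick_j})", size_i + size_j)
        next_id += 1
    (newick, _size), = clusters.values()
    return newick + ";"


def toy_dataset():
    """Simulate four taxa: A and B share one ancestor, C and D another."""
    root = random_hv()
    anc_ab, anc_cd = mutate(root, 0.10), mutate(root, 0.10)
    names = ["A", "B", "C", "D"]
    hvs = [mutate(anc_ab, 0.05), mutate(anc_ab, 0.05),
           mutate(anc_cd, 0.05), mutate(anc_cd, 0.05)]
    return names, hvs


if __name__ == "__main__":
    print(upgma(*toy_dataset()))
```

On this simulated data, A groups with B and C groups with D, because hypervectors derived from the same ancestor differ in far fewer components. In principle, branch support could be estimated in a similar spirit by resampling hypervector components and rebuilding the tree, although that is again an assumption rather than a description of the package.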