2023-07-26 –, 32-124
Phylogenetic networks represent the evolutionary process of reticulate organisms by explicitly modeling gene flow. Most existing network methods do not scale to big data, so we introduce a novel method that reconstructs phylogenetic networks from algebraic invariants without a heuristic search of network space. Our methodology is available in the Julia package phylo-diamond.jl, and it is at least 10 times faster than the fastest-to-date network methods.
The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented as a fully bifurcating process, since such a process cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet they do not scale to big data because they need to perform a heuristic search in the space of networks as well as numerical optimization that can be NP-hard. Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of 4-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model. Our novel inference methodology is optimization-free, as it only requires the evaluation of polynomial equations. As such, it bypasses the traversal of network space, yielding a computational speed at least 10 times faster than the fastest-to-date network methods. We illustrate the accuracy and speed of our new method on a variety of simulated scenarios as well as in the estimation of a phylogenetic network for the genus Canis. We implement our novel theory in an open-source, publicly available Julia package, phylo-diamond.jl, with broad applicability within the evolutionary biology community.
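To make the notion of a concordance factor concrete, here is a minimal Python sketch of how the frequencies of the three possible 4-taxon splits could be tallied from a set of gene trees. It is an illustration only, not the phylo-diamond.jl implementation: the function names, the single-letter taxon labels, and the string encoding of splits (for example `"ab|cd"`) are all assumptions made for this example, and it assumes the split displayed by each gene tree has already been extracted.

```python
from collections import Counter


def canonical_split(split):
    """Normalize a split string so that 'cd|ba' and 'ab|cd' compare equal.

    Hypothetical encoding: single-character taxon labels, sides separated by '|'.
    """
    left, right = split.split("|")
    sides = sorted(["".join(sorted(left)), "".join(sorted(right))])
    return "|".join(sides)


def concordance_factors(taxa, gene_tree_splits):
    """Return the observed frequency of each of the three splits on four taxa.

    `taxa` holds four single-character labels; `gene_tree_splits` holds one
    split string per gene tree. Splits on other taxa are ignored.
    """
    a, b, c, d = sorted(taxa)
    possible = [
        canonical_split(f"{a}{b}|{c}{d}"),
        canonical_split(f"{a}{c}|{b}{d}"),
        canonical_split(f"{a}{d}|{b}{c}"),
    ]
    counts = Counter(canonical_split(s) for s in gene_tree_splits)
    total = sum(counts[s] for s in possible)
    if total == 0:
        raise ValueError("no gene tree displays a split on these four taxa")
    return {s: counts[s] / total for s in possible}


# Toy input (made-up data): 10 gene trees on taxa a, b, c, d.
splits = ["ab|cd"] * 6 + ["ac|bd"] * 3 + ["bc|ad"] * 1
cf = concordance_factors("abcd", splits)
```

In this toy input the three frequencies sum to one. Per the abstract, quantities of this kind are what the polynomial invariants are evaluated on; the invariants themselves are not reproduced here.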
Zhaoxing Wu just completed her undergraduate studies at the University of Wisconsin-Madison, where she worked with Prof. Claudia Solis-Lemus and majored in statistics and mathematics. Her research interests lie broadly in phylogenetics and network analysis.