JuliaCon 2023

Sorting gene trees by their path within a species network
07-26, 10:20–10:50 (US/Eastern), Online talks and posters

Evolutionary relationships are often depicted as trees in which internal nodes represent ancestral species and external nodes represent extant species, but directed networks are often necessary. Many species tree and network estimation procedures first estimate a set of gene trees from sequence data. The proportion of these gene trees that do not necessitate a network model is biologically interesting. In RANSANEC.jl, we extend random sample consensus to tree-space to tackle this problem.

Phylogenetics is the study of evolutionary relationships between organisms or groups. These relationships are often depicted in a tree structure, where internal nodes represent hypothetical ancestral species and external nodes (leaves) represent extant organisms or species. Over the past few decades, though, biological processes that cannot be represented in a tree structure, such as hybridization and introgression, have become better known, and this has led to a sharp increase in the use of directed networks to depict these relationships.

Many common procedures for estimating such species trees and networks begin by estimating a collection of gene trees from sequence data. These gene trees can be discordant even when a tree model is sufficient to describe the species relationships, because of a process called incomplete lineage sorting (ILS). Thus, a main difficulty with model-based approaches is disentangling what proportion of gene trees is adequately explained by a tree structure with ILS and what proportion requires a network structure.

Our package RANSANEC.jl (RAndom SAmple NEtwork Consensus) implements the classical statistical method RANSAC (RAndom SAmple Consensus) in phylogenetic tree-space to separate the estimated gene trees that are adequately explained by ILS and a tree structure from those that require a network structure. This software builds on the rapidly growing suite of phylogenetic analysis software available in Julia.
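
For readers unfamiliar with RANSAC, the following is a minimal sketch of the classical loop in Python, applied to a toy problem (fitting a line to 2-D points with a few gross outliers). It is not RANSANEC.jl's code or API: the names `ransac`, `fit_line`, and `line_error`, the threshold, and the data are all made up for illustration, and this abstract does not specify what plays the role of the model-fitting and error functions in tree-space.

```python
import random

def ransac(data, fit, error, sample_size, threshold, iterations=200, seed=0):
    """Generic RANSAC loop: return (best_model, inliers, outliers).

    `fit` builds a model from a small random sample; `error` scores how
    badly a single data point is explained by a model. Both are supplied
    by the caller, so the loop itself is agnostic to the data type.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(data, sample_size)
        model = fit(sample)
        if model is None:  # degenerate sample, try again
            continue
        inliers = [d for d in data if error(model, d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    if best_model is None:
        return None, [], list(data)
    outliers = [d for d in data if error(best_model, d) > threshold]
    return best_model, best_inliers, outliers

def fit_line(sample):
    """Fit y = m*x + b through two points; None if the line is vertical."""
    (x1, y1), (x2, y2) = sample
    if x1 == x2:
        return None
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1

def line_error(model, point):
    """Vertical distance from a point to the line."""
    m, b = model
    x, y = point
    return abs(y - (m * x + b))

# Toy data (hypothetical): ten points on y = 2x + 1 plus three outliers.
points = [(x, 2 * x + 1) for x in range(10)] + [(2, 30), (5, -20), (7, 50)]

model, inliers, outliers = ransac(
    points, fit_line, line_error, sample_size=2, threshold=0.5
)
print("model:", model)
print("outliers:", sorted(outliers))
```

The point of the sketch is the partition it returns: data consistent with the best simple model versus data that the model cannot explain. By analogy only, the abstract describes a split of gene trees into those explained by a tree with ILS and those that require a network.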