SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space

Mol Phylogenet Evol. 2018 Jul;124:122-136. doi: 10.1016/j.ympev.2018.03.006. Epub 2018 Mar 9.

Abstract

Species tree estimation from multi-locus datasets is complicated by processes such as incomplete lineage sorting (ILS), which cause different loci to have different trees. Summary methods, which estimate species trees by combining gene trees, are popular, but their accuracy is impaired by gene tree estimation error. Other approaches estimate the species tree directly from site patterns and so are not affected by gene tree estimation error. In particular, PAUP provides a method in which SVDquartets is used to compute a set Q of quartet trees (i.e., trees on four leaves), and a heuristic search then combines the quartet trees into a species tree T, seeking to maximize the number of quartet trees in Q that agree with T. This PAUP method based on SVDquartets (henceforth SVDquartets + PAUP) is increasingly used in phylogenomic studies because it can reconstruct species trees without first estimating accurate gene trees. We present SVDquest, a new method for constructing species trees from site patterns that is guaranteed to produce species trees satisfying at least as many quartet trees as SVDquartets + PAUP. We show that SVDquest is competitive with ASTRAL and ASTRID (two leading summary methods) in terms of topological accuracy, and tends to be more accurate than both under conditions with relatively high gene tree estimation error. SVDquest is available in open source form at https://github.com/pranjalv123/SVDquest.
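The optimization criterion described above (maximize the number of quartet trees in Q that agree with T) can be illustrated with a small sketch. It is not SVDquest's algorithm: SVDquest performs exact optimization within a constrained search space, whereas the hypothetical best_tree function below simply enumerates every unrooted binary tree, which is feasible only for a handful of taxa. All function names, the nested-tuple tree encoding, and the quartet encoding ((a, b), (c, d)) for ab|cd are assumptions made for illustration. A quartet ab|cd agrees with T when some edge of T separates {a, b} from {c, d}.

```python
# Hypothetical sketch: quartet support of a tree and brute-force exact search.
# Trees are nested 2-tuples of taxon names; an unrooted tree is encoded as
# (first_taxon, rooted_subtree_on_the_remaining_taxa).


def leaf_sets(tree, out):
    """Append the leaf set below every node to `out`; return this node's set."""
    if isinstance(tree, str):
        s = frozenset([tree])
    else:
        s = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(s)
    return s


def clades(tree):
    """Leaf sets of all proper subtrees; each one is one side of an edge split."""
    found = []
    all_leaves = leaf_sets(tree, found)
    return [s for s in found if s != all_leaves]


def agrees(quartet, tree_clades):
    """True if some edge of the tree separates {a, b} from {c, d}."""
    (a, b), (c, d) = quartet
    for s in tree_clades:
        if {a, b} <= s and not ({c, d} & s):
            return True
        if {c, d} <= s and not ({a, b} & s):
            return True
    return False


def quartet_score(tree, quartets):
    """Number of quartet trees in `quartets` that agree with `tree`."""
    tree_clades = clades(tree)
    return sum(agrees(q, tree_clades) for q in quartets)


def insert_everywhere(t, leaf):
    """Yield every rooted tree obtained by attaching `leaf` to an edge of `t`."""
    yield (t, leaf)  # attach above the current node
    if not isinstance(t, str):
        left, right = t
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def rooted_trees(taxa):
    """Enumerate all rooted binary trees on `taxa` by stepwise addition."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for t in rooted_trees(taxa[:-1]):
        yield from insert_everywhere(t, taxa[-1])


def best_tree(taxa, quartets):
    """Exhaustively find an unrooted tree maximizing quartet support."""
    candidates = ((taxa[0], t) for t in rooted_trees(taxa[1:]))
    return max(candidates, key=lambda tree: quartet_score(tree, quartets))


if __name__ == "__main__":
    # ((A,B),C,(D,E)) written as an unrooted tree rooted at taxon A.
    true_tree = ("A", ("B", ("C", ("D", "E"))))
    Q = [(("A", "B"), ("C", "D")), (("A", "B"), ("C", "E")),
         (("A", "B"), ("D", "E")), (("A", "C"), ("D", "E")),
         (("B", "C"), ("D", "E")), (("A", "C"), ("B", "D"))]  # last one conflicts
    print(quartet_score(true_tree, Q))  # prints 5
    best = best_tree(["A", "B", "C", "D", "E"], Q)
    print(best, quartet_score(best, Q))
```

With five taxa there are 15 unrooted binary trees, and no tree can satisfy both AB|CD and AC|BD, so the best achievable score in this toy example is 5. The number of candidate trees grows super-exponentially with the number of taxa, which is why practical methods rely either on heuristic search or, as the title of this work indicates, on exact optimization restricted to a constrained search space.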

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computer Simulation
  • Databases, Genetic
  • Genetic Speciation
  • Genomics / methods*
  • Models, Genetic
  • Phylogeny*
  • Species Specificity