Polytomy refinement for the correction of dubious duplications in gene trees

Bioinformatics. 2014 Sep 1;30(17):i519-26. doi: 10.1093/bioinformatics/btu463.

Abstract

Motivation: Large-scale methods for inferring gene trees are error-prone. Correcting gene trees for weakly supported features often results in non-binary trees, i.e. trees with polytomies, thus raising the natural question of refining such polytomies into binary trees. A feature pointing toward potential errors in gene trees are duplications that are not supported by the presence of multiple gene copies.

Results: We introduce the problem of refining polytomies in a gene tree while minimizing the number of created non-apparent duplications in the resulting tree. We show that this problem can be described as a graph-theoretical optimization problem. We provide a bounded heuristic with guaranteed optimality for well-characterized instances. We apply our algorithm to a set of ray-finned fish gene trees from the Ensembl database to illustrate its ability to correct dubious duplications.

Availability and implementation: The C++ source code for the algorithms and simulations described in the article are available at http://www-ens.iro.umontreal.ca/~lafonman/software.php.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Fishes / genetics
  • Genes*
  • Phylogeny*