Proc Natl Acad Sci U S A. 2015 Feb 17;112(7):2058-63. doi: 10.1073/pnas.1412770112. Epub 2015 Feb 2.

Phylogenomics with paralogs.

Author information

1. Center for Bioinformatics, Saarland University, D-66041 Saarbrücken, Germany; mhellmuth@bioinf.uni-sb.de.
2. Parallel Computing and Complex Systems Group, Department of Computer Science, Leipzig University, D-04109 Leipzig, Germany;
3. Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, D-35032 Marburg, Germany;
4. Center for Bioinformatics, Saarland University, D-66041 Saarbrücken, Germany;
5. Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center of Bioinformatics, Leipzig University, D-04107 Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany; Institute for Theoretical Chemistry, University of Vienna, A-1090 Vienna, Austria; Center for Non-Coding RNA in Technology and Health, University of Copenhagen, 1870 Frederiksberg C, Denmark; and Santa Fe Institute, Santa Fe, NM 87501.

Abstract

Phylogenomics heavily relies on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information contained in gene trees can then be translated directly into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer.
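The cograph-editing step mentioned above rests on a standard graph-theoretic fact: an orthology relation that can be explained by an event-annotated gene tree is a cograph, and cographs are exactly the graphs with no induced path on four vertices (P4). The sketch below only illustrates that consistency test; it is not the authors' cograph-editing procedure. It assumes a hypothetical input format (gene identifiers plus a list of ortholog pairs), and the function names `find_induced_p4` and `is_cograph` are illustrative.

```python
from itertools import combinations, permutations


def find_induced_p4(vertices, edges):
    """Return an induced path (a, b, c, d) on four vertices, or None if none exists.

    Hypothetical input format: `vertices` are gene identifiers, and each
    (u, v) pair in `edges` is read as "u and v are estimated orthologs".
    Every edge endpoint must appear in `vertices`.
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Brute force: inspect every 4-vertex subset in every ordering.
    for quad in combinations(list(adj), 4):
        for a, b, c, d in permutations(quad):
            # The path edges a-b, b-c, c-d must be present ...
            if b in adj[a] and c in adj[b] and d in adj[c]:
                # ... and the chords a-c, a-d, b-d must be absent.
                if c not in adj[a] and d not in adj[a] and d not in adj[b]:
                    return (a, b, c, d)
    return None


def is_cograph(vertices, edges):
    """A graph is a cograph if and only if it contains no induced P4."""
    return find_induced_p4(vertices, edges) is None


genes = ["a1", "b1", "c1", "d1"]
# A chain of orthology calls forms an induced P4, so it is not a cograph.
print(is_cograph(genes, [("a1", "b1"), ("b1", "c1"), ("c1", "d1")]))  # False
# A star (one gene orthologous to three mutual non-orthologs) is a cograph.
print(is_cograph(genes, [("a1", "b1"), ("a1", "c1"), ("a1", "d1")]))  # True
```

The brute-force search costs O(n^4) and is meant for reading, not for genome-scale data; linear-time cograph recognition algorithms exist. The editing step itself, which adds or removes ortholog pairs until no P4 remains, is not shown here.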

KEYWORDS:

cograph; gene tree; orthology; paralogy; species tree

PMID: 25646426
PMCID: PMC4343152
DOI: 10.1073/pnas.1412770112
[Indexed for MEDLINE]
Free PMC Article
