Syst Biol. 2015 Nov;64(6):969-82. doi: 10.1093/sysbio/syv044. Epub 2015 Jun 30.

Integrating Sequence Evolution into Probabilistic Orthology Analysis.

Author information

1. School of Computer Science and Communication, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden;
2. Department of Numerical Analysis and Computer Science, Science for Life Laboratory, Stockholm University, Stockholm, Sweden;
3. School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden;
4. Atherosclerosis Research Unit, Dept. of Medicine, Science for Life Laboratory, Karolinska Institutet, Solna, Sweden;
5. School of Computer Science and Communication, Science for Life Laboratory, Swedish e-Science Research Center (SeRC), KTH Royal Institute of Technology, Stockholm, Sweden; jensl@csc.kth.se.

Abstract

Orthology analysis, that is, determining whether a pair of homologous genes are orthologs (stemming from a speciation) or paralogs (stemming from a gene duplication), is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree with the corresponding species tree, most commonly using the most parsimonious reconciliation (MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, allow the gene sequences to influence the orthology analysis only through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the typically strong signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts.
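The abstract contrasts DLRSOrthology with MPR-based pipelines such as MrBayesMPR. As a rough illustration of that baseline idea only (not of DLRSOrthology, which additionally sums over all reconciliations and models sequence evolution jointly), the sketch below labels gene-tree nodes with the LCA mapping, a standard way of obtaining the most parsimonious reconciliation, and estimates an orthology probability as the fraction of sampled gene trees in which a gene pair's most recent common ancestor is a speciation. The nested-tuple tree encoding, all helper names, and the toy species tree are hypothetical; the sampled trees are assumed to be an equally weighted posterior sample, e.g. from MCMC.

```python
def ancestors(node, parent):
    """Path from a species-tree node up to the root (child -> parent map)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def species_lca(a, b, parent):
    """Lowest common ancestor of two species-tree nodes."""
    above_b = set(ancestors(b, parent))
    for x in ancestors(a, parent):
        if x in above_b:
            return x
    raise ValueError("nodes are not in the same species tree")

def lca_map(tree, gene_to_species, sp_parent, events):
    """LCA-map a gene tree (nested 2-tuples of gene names) onto the species tree.

    Records 'duplication' or 'speciation' for every internal gene-tree node
    in `events` and returns the species node the subtree maps to.
    """
    if isinstance(tree, str):
        return gene_to_species[tree]
    left, right = tree
    ml = lca_map(left, gene_to_species, sp_parent, events)
    mr = lca_map(right, gene_to_species, sp_parent, events)
    m = species_lca(ml, mr, sp_parent)
    # A node maps to the same species node as one of its children -> duplication.
    events[tree] = "duplication" if m in (ml, mr) else "speciation"
    return m

def leaves(tree):
    return {tree} if isinstance(tree, str) else leaves(tree[0]) | leaves(tree[1])

def gene_lca(tree, a, b):
    """Smallest gene-tree subtree containing both genes a and b."""
    while not isinstance(tree, str):
        left, right = tree
        in_left = leaves(left)
        if a in in_left and b in in_left:
            tree = left
        elif a not in in_left and b not in in_left:
            tree = right
        else:
            return tree
    return tree

def orthology_probability(sampled_trees, a, b, gene_to_species, sp_parent):
    """Fraction of sampled gene trees in which a and b split at a speciation."""
    hits = 0
    for tree in sampled_trees:
        events = {}
        lca_map(tree, gene_to_species, sp_parent, events)
        hits += events[gene_lca(tree, a, b)] == "speciation"
    return hits / len(sampled_trees)

# Hypothetical toy data: species tree ((A,B)AB,C)root and three sampled gene trees.
sp_parent = {"A": "AB", "B": "AB", "AB": "root", "C": "root"}
g2s = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
samples = [
    ((("a1", "a2"), "b1"), "c1"),
    (("a1", "b1"), ("a2", "c1")),
    (("a1", "c1"), ("a2", "b1")),
]
p_ac = orthology_probability(samples, "a1", "c1", g2s, sp_parent)
p_aa = orthology_probability(samples, "a1", "a2", g2s, sp_parent)
print(round(p_ac, 3), p_aa)  # 0.667 0.0
```

Because each sampled tree is reconciled with MPR on its own, this estimate inherits exactly the limitation the abstract points out: the sequences influence orthology only through the sampled gene trees, and only a single reconciliation per tree is considered.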

KEYWORDS:

Comparative genomics; gene duplication; gene loss; orthology; paralogy; phylogenetics; probabilistic modeling; relaxed molecular clock; sequence evolution; tree realization; tree reconciliation

PMID: 26130236
DOI: 10.1093/sysbio/syv044
[Indexed for MEDLINE]
