G3 (Bethesda). 2015 Feb 23;5(4):629-38. doi: 10.1534/g3.115.017095.

Rock, paper, scissors: harnessing complementarity in ortholog detection methods improves comparative genomic inference.

Author information

1. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California.
2. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California; Institute for Human Genetics, University of California, San Francisco, San Francisco, California; Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, California 94158. ryan.hernandez@ucsf.edu.

Abstract

Ortholog detection (OD) is a linchpin of most statistical methods in comparative genomics. This task involves accurately identifying genes across species that descend from a common ancestral sequence. OD methods comprise a wide variety of approaches, each with its own benefits and costs under a variety of evolutionary and practical scenarios. In this article, we examine the proteomes of ten mammals by using four methodologically distinct, rigorously filtered OD methods. In head-to-head comparisons, we find that these algorithms significantly outperform one another for 38-45% of the genes analyzed. We leverage this high complementarity through the development of MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization, the first tool for integrating methodologically diverse OD methods. Relative to the four methods examined, MOSAIC more than quintuples the number of alignments for which all species are present while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, this improvement in alignment quality yields more confidently aligned sites and higher levels of overall conservation, while also detecting up to 180% more positively selected sites. We close by highlighting a MOSAIC-specific positively selected site near the active site of TPSAB1, an enzyme linked to asthma, heart disease, and irritable bowel disease. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC.
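The abstract does not spell out how MOSAIC integrates the four OD methods. As a rough, hypothetical illustration of why complementary methods can raise the number of alignments with all species present, the sketch below merges per-species ortholog calls from several methods for a single reference gene and keeps the highest-scoring candidate for each species. The method names, gene IDs, and the use of one numeric score per candidate (for example, sequence identity to the reference gene) are assumptions made for illustration. This is a simplified stand-in, not MOSAIC's actual cluster-optimization procedure.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data model: (gene_id, score), where score is an assumed
# quality measure such as percent identity to the reference gene.
Candidate = Tuple[str, float]
# method name -> species -> that method's single ortholog call (if any)
MethodCalls = Dict[str, Dict[str, Candidate]]


def integrate_calls(calls: MethodCalls, species: List[str]) -> Dict[str, Optional[Candidate]]:
    """For each species, keep the highest-scoring candidate across all methods.

    Ties keep the candidate from the method listed first.
    Species that no method covers map to None.
    """
    merged: Dict[str, Optional[Candidate]] = {}
    for sp in species:
        best: Optional[Candidate] = None
        for per_species in calls.values():
            cand = per_species.get(sp)
            if cand is not None and (best is None or cand[1] > best[1]):
                best = cand
        merged[sp] = best
    return merged


def is_complete(assignment: Dict[str, Optional[Candidate]]) -> bool:
    """True if every species has an ortholog (i.e., a full-species alignment is possible)."""
    return all(c is not None for c in assignment.values())


# Toy example: no single hypothetical method covers all four species.
species = ["human", "mouse", "dog", "cow"]
calls: MethodCalls = {
    "method_A": {"human": ("H1", 100.0), "mouse": ("M1", 85.0)},
    "method_B": {"human": ("H1", 100.0), "dog": ("D7", 78.0), "mouse": ("M2", 80.0)},
    "method_C": {"cow": ("C3", 74.0)},
}

merged = integrate_calls(calls, species)
single_method_complete = [
    is_complete({sp: calls[m].get(sp) for sp in species}) for m in calls
]
print(single_method_complete)  # [False, False, False]
print(is_complete(merged))     # True
```

In this toy case each method alone misses at least one species, while the merged set covers all four, which mirrors (in a very simplified way) the coverage gain the abstract reports from combining complementary methods.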

KEYWORDS:

comparative genomics; multiple sequence alignment; open source software; ortholog detection; positive selection

PMID: 25711833
PMCID: PMC4390578
DOI: 10.1534/g3.115.017095
[Indexed for MEDLINE]