Proc Natl Acad Sci U S A. 2015 Jun 2;112(22):7003-8. doi: 10.1073/pnas.1424324112. Epub 2015 May 18.

Using homology relations within a database markedly boosts protein sequence similarity search.

Author information

1. Department of Molecular Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050.
2. Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114; Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114.
3. Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050.
4. Department of Molecular Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050; Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050. grishin@chop.swmed.edu

Abstract

Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science, a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships), assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and the hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
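
The abstract describes the core idea only in outline. The Python sketch below illustrates it under simplifying assumptions. A query's relationship to a hit is re-scored by averaging the query's direct scores to the hit and to the hit's known database homologs, and precision is measured at the rank where half of all true homologs have been recovered. This is not COMPADRE's actual scoring, which compares multiple sequence alignments. The averaging rule, the toy data, and the names network_score and precision_at_half_coverage are hypothetical illustrations.

```python
from statistics import mean

# Toy data (hypothetical): direct query-to-entry similarity scores and a
# database homology network mapping each entry to its known homologs.
query_scores = {"A": 2.0, "B": 9.0, "C": 8.0, "D": 3.0}
homologs = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": set()}


def network_score(query_scores, homologs, hit):
    """Score a query-hit pair using the hit and its known database homologs.

    Hypothetical rule: average the query's direct scores to the hit and to
    every known homolog of the hit that the query was scored against.
    """
    group = {hit} | homologs.get(hit, set())
    scored = [query_scores[member] for member in group if member in query_scores]
    return mean(scored) if scored else 0.0


def precision_at_half_coverage(ranked_hits, true_homologs):
    """Fraction of true homologs among the top-ranked hits at the point where
    half of all true homologs have been recovered; None if never reached."""
    target = len(true_homologs) / 2
    found = 0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit in true_homologs:
            found += 1
            if found >= target:
                return found / rank
    return None


# Rank database entries by the network-aware score instead of the direct score.
ranking = sorted(query_scores,
                 key=lambda h: network_score(query_scores, homologs, h),
                 reverse=True)
```

In this toy example, entry A has a weak direct score (2.0, below the unrelated entry D at 3.0). Because its known homologs B and C score highly against the query, A is promoted to 19/3 ≈ 6.33 and now ranks above D.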

KEYWORDS:

homology detection; homology network; protein modeling; remote sequence similarity search; similarity score

PMID: 26038555
PMCID: PMC4460465
DOI: 10.1073/pnas.1424324112
[Indexed for MEDLINE]
