Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments

Dariusz Przybylski; Burkhard Rost

doi:10.1093/nar/gkm107

Nucleic Acids Res. 2007;35(7):2238-46. doi: 10.1093/nar/gkm107. Epub 2007 Mar 16.

Authors

Dariusz Przybylski¹, Burkhard Rost

Affiliation

¹ Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA. dsp23@columbia.edu

Abstract

Sequence alignments may be the most fundamental computational resource for molecular biology. The best methods for identifying sequence relatedness rely on profile-profile comparisons, which are much slower and more complex than sequence-sequence and sequence-profile comparisons such as BLAST and PSI-BLAST, respectively. Families of related genes and gene products (proteins) can be represented by consensus sequences that list the most frequent nucleotide or amino acid at each sequence position in that family. Here, we propose a novel approach for consensus-sequence-based comparisons. This approach improved searches and alignments as a standard add-on to PSI-BLAST without any changes to its code. Improvements were particularly significant for more difficult tasks such as the identification of distant structural relations between proteins and their corresponding alignments. Although the improvements were higher for more divergent relations, they were consistent even at high accuracy/low error rates for non-trivially related proteins. The improvements were very easy to achieve; no parameter used by PSI-BLAST was altered and not a single line of code was changed. Furthermore, the consensus sequence add-on required relatively little additional CPU time. We discuss how advanced users of PSI-BLAST can immediately benefit from using consensus sequences on their local computers. We have also made the method available through the Internet (http://www.rostlab.org/services/consensus/).
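
To illustrate the definition of a consensus sequence given in the abstract (the most frequent residue at each position of a family), here is a minimal Python sketch. It is not the authors' procedure: the abstract does not say how the consensus is derived from PSI-BLAST output. The function name consensus_sequence, the toy alignment, the handling of gaps, and the alphabetical tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most frequent non-gap residue in each alignment column.

    `alignment` is a list of equal-length strings (rows of an already
    aligned family). Gap handling and tie-breaking are illustrative
    assumptions, not taken from the paper.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(res for res in column if res != gap)
        if not counts:
            # Column contains only gaps: keep a gap in the consensus.
            consensus.append(gap)
            continue
        # Highest count wins; ties are broken alphabetically so the
        # result is deterministic.
        best = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        consensus.append(best)
    return "".join(consensus)


# Hypothetical toy family of four aligned protein fragments.
family = ["MKTAY", "MKSAY", "MRTAF", "MK-AY"]
print(consensus_sequence(family))
```

In the workflow the abstract describes, a sequence of this kind would stand in for the family in an otherwise unmodified PSI-BLAST search. The majority vote shown here is only the simplest way to build one.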

Publication types

Evaluation Study
Research Support, N.I.H., Extramural

MeSH terms

Amino Acid Sequence
Amino Acid Substitution
Base Sequence
Consensus Sequence
Sequence Alignment / methods*
Sequence Analysis, Protein
Software
