Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments

Nucleic Acids Res. 2007;35(7):2238-46. doi: 10.1093/nar/gkm107. Epub 2007 Mar 16.

Abstract

Sequence alignments may be the most fundamental computational resource for molecular biology. The best methods, which identify sequence relatedness through profile-profile comparisons, are much slower and more complex than sequence-sequence and sequence-profile comparisons such as BLAST and PSI-BLAST, respectively. Families of related genes and gene products (proteins) can be represented by consensus sequences that list the nucleic/amino acid most frequent at each sequence position in that family. Here, we propose a novel approach for consensus-sequence-based comparisons. This approach improved searches and alignments as a standard add-on to PSI-BLAST without any changes to its code. Improvements were particularly significant for more difficult tasks such as the identification of distant structural relations between proteins and their corresponding alignments. Although the improvements were higher for more divergent relations, they were consistent even at high accuracy/low error rates for non-trivially related proteins. The improvements were very easy to achieve: no parameter used by PSI-BLAST was altered and not a single line of code was changed. Furthermore, the consensus sequence add-on required relatively little additional CPU time. We discuss how advanced users of PSI-BLAST can immediately benefit from using consensus sequences on their local computers. We have also made the method available through the Internet (http://www.rostlab.org/services/consensus/).
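
The abstract defines a consensus sequence as the residue most frequent at each position in a family. The Python sketch below illustrates only that definition. It is not the authors' implementation. It assumes the family is supplied as equal-length rows of an existing alignment, with "-" marking gaps. The function name consensus_sequence, the skipping of all-gap columns, and the alphabetical tie-break are illustrative choices that do not come from the paper.

```python
from collections import Counter


def consensus_sequence(aligned, gap="-"):
    """Return the most frequent residue per column of an alignment.

    `aligned` is a list of equal-length strings (rows of an alignment).
    Gap characters are ignored when counting; columns made up entirely
    of gaps are skipped (an assumption of this sketch).
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(residue for residue in column if residue != gap)
        if not counts:
            continue  # column contains only gaps
        # Highest count wins; ties are broken alphabetically (assumption).
        best = min(counts, key=lambda residue: (-counts[residue], residue))
        consensus.append(best)
    return "".join(consensus)


if __name__ == "__main__":
    family = ["MKVLA", "MKILA", "MRVLG", "MKVLA"]
    print(consensus_sequence(family))
```

A string produced this way could then be supplied as an ordinary query sequence to an unmodified search tool, which is in keeping with the abstract's statement that no PSI-BLAST code or parameter was changed. How the authors build and apply their consensus sequences in practice is described in the full paper, not here.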

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Amino Acid Sequence
  • Amino Acid Substitution
  • Base Sequence
  • Consensus Sequence
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein
  • Software