Nucleic Acids Res. 1991 Dec 11; 19(23): 6565–6572.
PMCID: PMC329220

Automated assembly of protein blocks for database searching.


A system is described for finding and assembling the most highly conserved regions of related proteins for database searching. First, an automated version of Smith's algorithm for finding motifs is used for sensitive detection of multiple local alignments. Next, the local alignments are converted to blocks and the best set of non-overlapping blocks is determined. When the automated system was applied successively to all 437 groups of related proteins in the PROSITE catalog, 1764 blocks resulted; these could be used for very sensitive searches of sequence databases. Each block was calibrated by searching the SWISS-PROT database to obtain a measure of the chance distribution of matches, and the calibrated blocks were concatenated into a database that could itself be searched. Examples are provided in which distant relationships are detected either using a set of blocks to search a sequence database or using sequences to search the database of blocks. The practical use of the blocks database is demonstrated by detecting previously unknown relationships between oxidoreductases and by evaluating a proposed relationship between HIV Vif protein and thiol proteases.
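The step of choosing "the best set of non-overlapping blocks" can be illustrated with a small sketch. The abstract does not say how blocks are scored or how the best set is searched for, so everything below is an assumption made for illustration: each candidate block is reduced to a hypothetical (start, end, score) triple, and the best set is taken to be the one with the highest total score, found by standard weighted interval scheduling. The names Block and best_nonoverlapping are invented here and do not come from the article.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Block(NamedTuple):
    # All three fields are hypothetical; the article's own block
    # representation and scoring are not given in the abstract.
    start: int    # first position covered (inclusive)
    end: int      # last position covered (inclusive)
    score: float  # assumed conservation score, higher is better


def best_nonoverlapping(blocks: List[Block]) -> List[Block]:
    """Return a maximum-total-score subset of mutually non-overlapping blocks.

    Generic weighted interval scheduling, not the authors' procedure.
    """
    ordered = sorted(blocks, key=lambda b: b.end)
    ends = [b.end for b in ordered]

    # best[i] is the best total score using only the first i blocks.
    best = [0.0] * (len(ordered) + 1)
    # prev[i] is how many earlier blocks end strictly before block i starts.
    prev = [0] * len(ordered)

    for i, b in enumerate(ordered):
        prev[i] = bisect_right(ends, b.start - 1, 0, i)
        best[i + 1] = max(best[i], best[prev[i]] + b.score)

    # Walk back through the table to recover the chosen blocks.
    chosen = []
    i = len(ordered)
    while i > 0:
        b = ordered[i - 1]
        if best[prev[i - 1]] + b.score > best[i - 1]:
            chosen.append(b)
            i = prev[i - 1]
        else:
            i -= 1
    return chosen[::-1]


if __name__ == "__main__":
    candidates = [
        Block(0, 9, 5.0),
        Block(5, 14, 8.0),
        Block(12, 20, 4.0),
        Block(15, 25, 6.0),
    ]
    for block in best_nonoverlapping(candidates):
        print(block)
```

With the four made-up candidates above, the sketch keeps the blocks covering positions 5 to 14 and 15 to 25. A real implementation would also have to respect constraints this toy ignores, such as blocks appearing in the same order in every sequence of the group.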

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

