Proc Natl Acad Sci U S A. Jan 1990; 87(1): 118–122.

Automatic generation of primary sequence patterns from sets of related protein sequences.


We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner by progressively replacing the node connecting the two most similar termini by one common pattern until only a single common "root" pattern remains. A pattern is generated at a node by (i) performing a local optimal alignment on the sequence/pattern pair connected by the node with the use of an extended dynamic programming algorithm and then (ii) constructing a single common pattern from this alignment with a nested hierarchy of amino acid classes to identify the minimal inclusive amino acid class covering each paired set of elements in the alignment. Gaps within an alignment are created and/or extended using a "pay once" gap penalty rule, and gapped positions are converted into gap characters that function as 0 or 1 amino acid of any type during subsequent alignment. This method has been used to generate a library of covering patterns for homologous families in the National Biomedical Research Foundation/Protein Identification Resource protein sequence data base. We show that a covering pattern can be more diagnostic for sequence family membership than any of the individual sequences used to construct the pattern.
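
Step (ii) above, building one common pattern from an aligned pair, can be sketched as follows. This is a minimal illustration and not the authors' implementation. The amino acid groups in `GROUPS` are a hypothetical nested hierarchy chosen for the example (the abstract does not list the actual classes), and the names `minimal_class`, `merge_alignment`, `pattern_from_pair`, and `render` are invented here. The alignment itself is assumed to be given, with `-` marking a gapped position.

```python
from typing import FrozenSet, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: illustrative nested groups, not the classes used in the paper.
# Any two groups are either disjoint or one contains the other.
GROUPS = ["DE", "KRH", "NQ", "ST", "ILVM", "FYW", "AG",
          "DEKRHNQST", "ILVMFYW", AMINO_ACIDS]

# A pattern element is a set of allowed residues; None stands for a gap character.
Element = Optional[FrozenSet[str]]

# All classes (single residues plus groups), ordered from smallest to largest.
CLASSES: List[FrozenSet[str]] = sorted(
    {frozenset(a) for a in AMINO_ACIDS} | {frozenset(g) for g in GROUPS},
    key=len,
)


def minimal_class(x: FrozenSet[str], y: FrozenSet[str]) -> FrozenSet[str]:
    """Return the smallest class in the hierarchy that covers both elements."""
    union = x | y
    for cls in CLASSES:  # ascending size, so the first hit is the minimal one
        if union <= cls:
            return cls
    raise ValueError("element contains a residue outside the hierarchy")


def merge_alignment(a: List[Element], b: List[Element]) -> List[Element]:
    """Collapse two aligned element lists into one common pattern."""
    if len(a) != len(b):
        raise ValueError("aligned inputs must have equal length")
    pattern: List[Element] = []
    for x, y in zip(a, b):
        if x is None or y is None:
            pattern.append(None)  # gapped position becomes a gap character
        else:
            pattern.append(minimal_class(x, y))
    return pattern


def to_elements(aligned: str) -> List[Element]:
    """Convert an aligned sequence string ('-' for gaps) into pattern elements."""
    return [None if ch == "-" else frozenset(ch) for ch in aligned]


def render(pattern: List[Element]) -> str:
    """Print a pattern: '.' for a gap character, 'X' for any residue."""
    out = []
    for el in pattern:
        if el is None:
            out.append(".")
        elif len(el) == 1:
            out.append(next(iter(el)))
        elif len(el) == len(AMINO_ACIDS):
            out.append("X")
        else:
            out.append("[" + "".join(sorted(el)) + "]")
    return "".join(out)


def pattern_from_pair(a: str, b: str) -> str:
    return render(merge_alignment(to_elements(a), to_elements(b)))


print(pattern_from_pair("GDSGG", "GESAG"))  # G[DE]S[AG]G
print(pattern_from_pair("K-LD", "RWIL"))    # [HKR].[ILMV]X
```

Because `merge_alignment` accepts elements that are already classes or gap characters, its output can be merged again with another sequence or pattern, which is what the stepwise reduction of the tree toward a single root pattern requires. The sketch leaves out the alignment step, the "pay once" gap penalty, and the matching of a gap character against 0 or 1 residues during later alignments.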



Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

