Proc Natl Acad Sci U S A. Jan 1990; 87(1): 118–122.
PMCID: PMC53211

Automatic generation of primary sequence patterns from sets of related protein sequences.

Abstract

We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner by progressively replacing the node connecting the two most similar termini by one common pattern until only a single common "root" pattern remains. A pattern is generated at a node by (i) performing a local optimal alignment on the sequence/pattern pair connected by the node with the use of an extended dynamic programming algorithm and then (ii) constructing a single common pattern from this alignment with a nested hierarchy of amino acid classes to identify the minimal inclusive amino acid class covering each paired set of elements in the alignment. Gaps within an alignment are created and/or extended using a "pay once" gap penalty rule, and gapped positions are converted into gap characters that function as 0 or 1 amino acid of any type during subsequent alignment. This method has been used to generate a library of covering patterns for homologous families in the National Biomedical Research Foundation/Protein Identification Resource protein sequence data base. We show that a covering pattern can be more diagnostic for sequence family membership than any of the individual sequences used to construct the pattern.
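Step (ii) of the pattern-generation procedure, choosing the minimal inclusive amino acid class for each aligned pair, can be illustrated with a minimal sketch. The class hierarchy, the class names, and the function names below are assumptions made for illustration only; the abstract does not specify the hierarchy the authors used. The alignment is taken as given, so the dynamic programming step and the "pay once" gap rule are not implemented here.

```python
# Hypothetical nested hierarchy of amino acid classes (illustrative only;
# NOT the hierarchy from the paper). Any two classes are either disjoint
# or one contains the other.
CLASSES = {
    "aliphatic": set("ILV"),
    "aromatic": set("FWY"),
    "hydrophobic": set("ILVMFWYAC"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "charged": set("KRHDE"),
    "polar": set("KRHDESTNQ"),
    "any": set("ACDEFGHIKLMNPQRSTVWY"),
}

GAP = "-"  # gap character in an alignment or pattern


def members(element):
    """Return the residues covered by a single residue or a class name."""
    return CLASSES.get(element, {element})


def minimal_class(a, b):
    """Return the smallest class covering both elements.

    Elements may be residues (sequence positions) or class names
    (positions of a previously built pattern).
    """
    if a == b:
        return a
    needed = members(a) | members(b)
    candidates = [
        (len(residues), name)
        for name, residues in CLASSES.items()
        if needed <= residues
    ]
    # Fall back to the widest class for symbols outside the hierarchy.
    return min(candidates)[1] if candidates else "any"


def covering_pattern(aligned_a, aligned_b):
    """Build one common pattern from two aligned element sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned inputs must have equal length")
    pattern = []
    for a, b in zip(aligned_a, aligned_b):
        if GAP in (a, b):
            # Gapped positions become gap characters in the pattern.
            pattern.append(GAP)
        else:
            pattern.append(minimal_class(a, b))
    return pattern


print(covering_pattern("IKDAF", "LRE-W"))
```

Because `minimal_class` accepts class names as well as residues, the same function can merge a sequence with an existing pattern, which is what reducing the tree node by node toward a single root pattern would require.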



Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
