Protein Sci. Mar 1993; 2(3): 305–314.
PMCID: PMC2142382

Structural analysis based on state-space modeling.


A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models having discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm. The protein is then classified as belonging to the structural class having the most probable model. The secondary structure of the sequence is then analyzed using a "smoothing" algorithm that is optimal for that structural class model. For each residue position in the sequence, the smoother computes the probability that the residue is contained within each of the defined secondary structural elements of the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the totality of the amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical alpha/beta protein having a central beta-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
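The filtering, classification, and smoothing steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a generic discrete hidden Markov model with a scaled forward pass (filtering) and a forward-backward pass (smoothing), equal prior probabilities over the candidate models, and a reduced two-letter alphabet (H for hydrophobic, P for polar) in place of the twenty amino acids. The model names, state labels, and all probability values are hypothetical and chosen only for illustration.

```python
import math

def forward(model, obs):
    """Scaled forward (filtering) pass: returns normalized alphas and per-step scales."""
    init, trans, emit = model["init"], model["trans"], model["emit"]
    n = len(init)
    alphas, scales = [], []
    for t, sym in enumerate(obs):
        if t == 0:
            a = [init[i] * emit[i][sym] for i in range(n)]
        else:
            prev = alphas[-1]
            a = [emit[j][sym] * sum(prev[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
        c = sum(a)  # probability of this symbol given the symbols before it
        alphas.append([x / c for x in a])
        scales.append(c)
    return alphas, scales

def log_likelihood(model, obs):
    """Log probability that the model generated the whole sequence."""
    _, scales = forward(model, obs)
    return sum(math.log(c) for c in scales)

def smooth(model, obs):
    """Forward-backward smoothing: P(state at position t | entire sequence)."""
    trans, emit = model["trans"], model["emit"]
    n = len(model["init"])
    alphas, scales = forward(model, obs)
    beta = [1.0] * n  # backward variable at the last position
    gammas = [None] * len(obs)
    for t in range(len(obs) - 1, -1, -1):
        g = [alphas[t][i] * beta[i] for i in range(n)]
        z = sum(g)
        gammas[t] = [x / z for x in g]
        if t > 0:  # step the backward variable from position t to t - 1
            sym = obs[t]
            beta = [sum(trans[i][j] * emit[j][sym] * beta[j] for j in range(n))
                    / scales[t] for i in range(n)]
    return gammas

def classify(models, obs, priors=None):
    """Posterior probability of each candidate model; equal priors assumed by default."""
    names = list(models)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    scores = {name: log_likelihood(models[name], obs) + math.log(priors[name])
              for name in names}
    top = max(scores.values())  # subtract the maximum before exponentiating
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    posterior = {name: w / total for name, w in weights.items()}
    return max(posterior, key=posterior.get), posterior

# Hypothetical toy models; state order is (helix, strand, loop).
STATES = ("helix", "strand", "loop")
EMIT = [{"H": 0.5, "P": 0.5}, {"H": 0.8, "P": 0.2}, {"H": 0.2, "P": 0.8}]
MODELS = {
    "helix_rich": {"init": [0.6, 0.1, 0.3], "emit": EMIT,
                   "trans": [[0.85, 0.02, 0.13], [0.10, 0.60, 0.30], [0.30, 0.10, 0.60]]},
    "strand_rich": {"init": [0.1, 0.6, 0.3], "emit": EMIT,
                    "trans": [[0.60, 0.10, 0.30], [0.02, 0.85, 0.13], [0.10, 0.30, 0.60]]},
}

seq = "HHPHHPPHHHPP"
best, posterior = classify(MODELS, seq)   # most probable structural class model
gammas = smooth(MODELS[best], seq)        # per-residue state probabilities
for residue, g in zip(seq, gammas):
    print(residue, {s: round(p, 3) for s, p in zip(STATES, g)})
```

Each row of `gammas` is a probability distribution over the modeled states for one residue, conditioned on the entire sequence, which corresponds to advantage (1) above. Constraints on realizable folds, advantage (2), would enter through the structure of each model's transition matrix; the toy matrices here do not encode any real fold.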



Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society

