Protein Sci. Sep 1997; 6(9): 1963–1975.
PMCID: PMC2143796

Predicting protein secondary structure with probabilistic schemata of evolutionarily derived information.


We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single-sequence data, this method achieves a three-state accuracy of 67% over a database of 473 non-homologous proteins. This approach is more amenable to inspection and less likely to overlearn specifics of a dataset than "black box" methods such as neural networks. It is also conceptually simpler and less computationally costly. We also introduce a novel method for representing and incorporating multiple-sequence alignment information within the prediction algorithm, achieving 72% accuracy over a dataset of 304 non-homologous proteins. This is accomplished by creating a statistical model of the evolutionarily derived correlations between patterns of amino acid substitution and local protein structure. This model consists of parameter vectors, termed "substitution schemata," which probabilistically encode the structure-based heterogeneity in the distributions of amino acid substitutions found in alignments of homologous proteins. The model is optimized for structure prediction by maximizing the mutual information between the set of schemata and the database of secondary structures. Unlike "expert heuristic" methods, this approach has been demonstrated to work well over large datasets. Unlike the opaque neural network algorithms, this approach is physicochemically intelligible. Moreover, the model optimization procedure, the formalism for predicting one-dimensional structural features and our previously developed method for tertiary structure recognition all share a common Bayesian probabilistic basis. This consistency starkly contrasts with the hybrid and ad hoc nature of methods that have dominated this field in recent years.
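Two quantities in the abstract can be illustrated concretely: the three-state accuracy (the fraction of residues whose predicted state matches the observed one) and the mutual information between a set of discrete labels and the observed secondary structures. The following is a minimal sketch, not the authors' implementation. The function names, the H/E/C state alphabet, and the use of integer labels as a stand-in for schema assignments are assumptions made for illustration only.

```python
import math
from collections import Counter


def three_state_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state.

    Both arguments are equal-length sequences over a three-letter alphabet
    (assumed here to be H = helix, E = strand, C = coil).
    """
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def mutual_information(labels, states):
    """Mutual information, in bits, between two equal-length discrete sequences.

    `labels` stands in for a per-residue schema assignment (hypothetical);
    `states` is the observed secondary structure at the same positions.
    Probabilities are estimated from raw counts, with no pseudocounts.
    """
    if len(labels) != len(states) or len(states) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(labels)
    joint = Counter(zip(labels, states))
    label_counts = Counter(labels)
    state_counts = Counter(states)
    mi = 0.0
    for (label, state), count in joint.items():
        # p(l, s) / (p(l) * p(s)) simplifies to count * n / (n_l * n_s).
        ratio = count * n / (label_counts[label] * state_counts[state])
        mi += (count / n) * math.log2(ratio)
    return mi
```

On toy data, labels that determine the state exactly carry the full entropy of the states, while labels that are statistically independent of the state carry none. A model optimized as the abstract describes would favor label sets with higher mutual information.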



Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society

