Proc Natl Acad Sci U S A. 1993 Aug 15; 90(16): 7558–7562.

Improved prediction of protein secondary structure by use of sequence profiles and neural networks.


The explosive accumulation of protein sequences in the wake of large-scale sequencing projects is in stark contrast to the much slower experimental determination of protein structures. Improved methods of structure prediction from the gene sequence alone are therefore needed. Here, we report a substantial increase in both the accuracy and quality of secondary-structure predictions, using a neural-network algorithm. The main improvements come from the use of multiple sequence alignments (better overall accuracy), from "balanced training" (better prediction of beta-strands), and from "structure context training" (better prediction of helix and strand lengths). This method, cross-validated on seven different test sets purged of sequence similarity to learning sets, achieves a three-state prediction accuracy of 69.7%, significantly better than previous methods. In addition, the predicted structures have a more realistic distribution of helix and strand segments. The predictions may be suitable for use in practice as a first estimate of the structural type of newly sequenced proteins.
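Two quantities in the abstract lend themselves to a short illustration: the sequence profile derived from a multiple sequence alignment, and the three-state accuracy used to score predictions. The Python sketch below is illustrative only. The function names `profile_from_alignment` and `q3_accuracy`, the gap handling, and the one-letter state alphabet (H for helix, E for strand, L for other) are assumptions made here, not details taken from the paper, and the neural network itself is not modelled.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_from_alignment(aligned_seqs):
    """Return per-column amino-acid frequencies for an alignment.

    Assumption: sequences are pre-aligned to equal length, and any
    character outside AMINO_ACIDS (for example '-') is treated as a gap
    and ignored when computing frequencies.
    """
    if not aligned_seqs:
        raise ValueError("alignment must contain at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        counts = Counter(
            seq[col] for seq in aligned_seqs if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    alignment = ["ACD", "AC-", "GCD"]  # toy alignment, not real data
    profile = profile_from_alignment(alignment)
    print(round(profile[0]["A"], 3))
    print(q3_accuracy("HHHEELLL", "HHHEEELL"))
```

In a profile-based predictor, each alignment column's frequency vector would replace the single-residue encoding as network input, so that conservation patterns across homologous sequences inform the prediction.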


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

