Proteins. 2007 Mar 1;66(4):838-45.

Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training.

Author information

  • Department of Physiology and Biophysics, Center for Single Molecule Biophysics, Howard Hughes Medical Institute, State University of New York at Buffalo, Buffalo, New York 14214, USA.

Abstract

An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. In this paper, SPINE is applied to three-state secondary-structure prediction and residue-solvent-accessibility (RSA) prediction. The integrated neural networks are carefully trained on a large dataset of 2640 chains, using sequence profiles generated from multiple sequence alignment, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q3 (the percentage of correctly classified residues). After one month of training (CPU time) on 22 processors, SPINE yields a 10-fold cross-validated secondary-structure accuracy of 79.5% (80.0% for chains of length between 50 and 300). An accuracy of 87.5% is achieved for exposed residues (RSA >95%). The latter approaches the theoretical upper limit of 88-90% accuracy in assigning secondary structures. Accuracies of 73% for three-state solvent-accessibility prediction (25%/75% cutoffs) and 79.3% for two-state prediction (25% cutoff) are also obtained.
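The quantities named in the abstract (a sliding window over a per-residue sequence profile, the Q3 score, and two- or three-state RSA labels) can be illustrated with a minimal Python sketch. This is not SPINE's code. The function names, the H/E/C and B/I/E state letters, zero padding at chain ends, and the choice that a value exactly at a cutoff falls into the more exposed state are all assumptions made for illustration.

```python
from typing import List, Sequence


def window_features(profile: Sequence[Sequence[float]], center: int,
                    half_width: int) -> List[float]:
    """Concatenate profile rows center-half_width .. center+half_width.

    Positions past either chain end are zero-padded (an assumption; the
    abstract does not say how SPINE handles chain termini).
    """
    n_cols = len(profile[0])
    feats: List[float] = []
    for j in range(center - half_width, center + half_width + 1):
        if 0 <= j < len(profile):
            feats.extend(profile[j])
        else:
            feats.extend([0.0] * n_cols)
    return feats


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) is correct."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def rsa_two_state(rsa: float, cutoff: float = 25.0) -> str:
    """Buried (B) below the cutoff percent, exposed (E) otherwise."""
    return "B" if rsa < cutoff else "E"


def rsa_three_state(rsa: float, low: float = 25.0, high: float = 75.0) -> str:
    """Buried (B), intermediate (I) or exposed (E) with 25%/75% cutoffs."""
    if rsa < low:
        return "B"
    if rsa < high:
        return "I"
    return "E"


if __name__ == "__main__":
    print(q3("HHHEECCC", "HHHEECCH"))  # 7 of 8 residues correct -> 87.5
    print(window_features([[1, 2], [3, 4], [5, 6]], 0, 1))
    print([rsa_three_state(x) for x in (10.0, 50.0, 80.0)])
```

A window of half-width w feeds 2w + 1 profile rows to the network for each residue; the abstract reports that the window size was optimized but does not state its value, so `half_width` here is a free parameter.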

(c) 2006 Wiley-Liss, Inc.

PMID: 17177203
[PubMed - indexed for MEDLINE]