SeqRate: sequence-based protein folding type classification and rates prediction

BMC Bioinformatics. 2010 Apr 29;11 Suppl 3(Suppl 3):S1. doi: 10.1186/1471-2105-11-S3-S1.

Abstract

Background: Protein folding rate is an important property of a protein. Predicting it is useful for understanding the protein folding process and for guiding protein design. Most previous methods for predicting protein folding rate require the tertiary structure of a protein as input, and most do not distinguish the kinetic nature (two-state or multi-state folding) of the proteins. Here we developed a method, SeqRate, that uses support vector machines to predict both the folding kinetic type (two-state versus multi-state) and the real-value folding rate from sequence length, amino acid composition, contact order, contact number, and secondary structure information, all derived or predicted from the protein sequence alone.
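As a rough illustration of the two simplest inputs named above (sequence length and amino acid composition), the following minimal Python sketch builds a 21-value feature vector from a raw sequence. The function name `sequence_features`, the alphabet ordering, and the feature layout are assumptions made for illustration and are not taken from SeqRate itself; the predicted contact order, contact number, and secondary structure features are not covered here.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code. The ordering is an
# arbitrary choice for this sketch, not SeqRate's own encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(sequence):
    """Return [length, fraction of A, fraction of C, ...] for a sequence.

    Hypothetical helper: only sequence length and amino acid
    composition are computed; residues outside the 20 standard
    letters count toward the length but get no composition entry.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    length = len(seq)
    composition = [counts.get(aa, 0) / length for aa in AMINO_ACIDS]
    return [float(length)] + composition


if __name__ == "__main__":
    features = sequence_features("ACDA")
    print(len(features), features[:4])
```

A vector of this kind could then be passed, together with the predicted structural features, to a support vector machine for classification or regression.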

Results: We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec⁻¹) on the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification, and its accuracy of folding rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs.
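The two evaluation measures reported above can be sketched in plain Python as follows. This is a minimal sketch, assuming folding rates are given as positive values in sec⁻¹ and converted to the base-10 logarithmic scale before comparison; the function names `pearson`, `mean_abs_diff`, and `evaluate_log10` are hypothetical, and the example rates are made up rather than taken from the benchmark.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mean_abs_diff(xs, ys):
    """Mean absolute difference between paired values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def evaluate_log10(predicted_rates, experimental_rates):
    """Compare rates (sec^-1) on the base-10 logarithmic scale."""
    log_pred = [math.log10(r) for r in predicted_rates]
    log_exp = [math.log10(r) for r in experimental_rates]
    return pearson(log_pred, log_exp), mean_abs_diff(log_pred, log_exp)


if __name__ == "__main__":
    # Made-up illustrative rates, not data from the paper.
    r, mad = evaluate_log10([10.0, 100.0, 1000.0], [100.0, 1000.0, 10000.0])
    print(r, mad)
```

Working on the logarithmic scale means a mean absolute difference of 1.0 corresponds to predictions that are off by one order of magnitude on average.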

Conclusions: Both the web server and the software for predicting folding rates are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Artificial Intelligence*
  • Computational Biology / methods*
  • Internet
  • Kinetics
  • Linear Models
  • Protein Conformation
  • Protein Folding*
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / metabolism
  • Reproducibility of Results
  • Sequence Analysis, Protein / methods*
  • Software

Substances

  • Amino Acids
  • Proteins