Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis

Bioinformatics. 2008 Sep 15;24(18):2002-9. doi: 10.1093/bioinformatics/btn353. Epub 2008 Jul 16.

Abstract

Motivation: Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have utilized properties of protein sequence or structure to predict the free energy change of mutants due to thermal (ΔΔG) and denaturant (ΔΔG(H2O)) denaturations, as well as mutant thermal stability (ΔTm), through the application of either computational energy-based approaches or machine learning techniques. However, accuracy associated with applying these methods separately is frequently far from optimal.

Results: We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and its ordered six nearest neighbors in the 3-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results.
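
The feature-vector construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_feature_vector` and the inputs `coords` and `perturbations` are hypothetical, the per-residue perturbation scores are assumed to have been computed already (the four-body potential itself is not reproduced here), and ordering the six neighbors by increasing Euclidean distance is an assumption about what "ordered" means.

```python
import math


def build_feature_vector(coords, perturbations, mutated_index, n_neighbors=6):
    """Collect perturbation scores for a mutated residue and its nearest neighbors.

    coords        -- list of (x, y, z) tuples, one representative point per residue
                     (hypothetical input; the abstract does not specify the point used)
    perturbations -- list of per-residue perturbation scores, assumed precomputed
    mutated_index -- index of the substituted residue
    n_neighbors   -- number of spatial neighbors to include (six in the abstract)
    """
    if len(coords) != len(perturbations):
        raise ValueError("coords and perturbations must have the same length")
    if len(coords) <= n_neighbors:
        raise ValueError("not enough residues to select the requested neighbors")

    origin = coords[mutated_index]
    others = [i for i in range(len(coords)) if i != mutated_index]

    # Assumption: neighbors are ordered by increasing distance from the
    # mutated position; ties are broken by residue index.
    others.sort(key=lambda i: (math.dist(origin, coords[i]), i))
    neighbors = others[:n_neighbors]

    # First entry is the mutated position, followed by its ordered neighbors.
    return [perturbations[mutated_index]] + [perturbations[i] for i in neighbors]
```

A vector built this way has seven entries per mutant and could be passed, together with the measured stability change, to any standard supervised learner; which learners the authors used is not stated in the abstract.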

Availability: A web server with supporting documentation is available at http://proteins.gmu.edu/automute.

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Computational Biology*
  • Computer Simulation
  • Databases, Protein
  • Models, Molecular
  • Mutagenesis*
  • Protein Folding
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / genetics*
  • Sequence Alignment
  • Sequence Analysis, Protein
  • Structure-Activity Relationship
  • Thermodynamics

Substances

  • Proteins