Format

Send to:

Choose Destination
See comment in PubMed Commons below
Bioinformatics. 2007 Dec 1;23(23):3155-61. Epub 2007 Oct 31.

Accurate prediction of enzyme mutant activity based on a multibody statistical potential.

Author information

  • 1Laboratory for Structural Bioinformatics, School of Computational Sciences, George Mason University, 10900 University Boulevard, MSN 5B3, Manassas, VA 20110, USA.

Abstract

MOTIVATION:

An important area of research in biochemistry and molecular biology focuses on characterization of enzyme mutants. However, synthesis and analysis of experimental mutants is time consuming and expensive. We describe a machine-learning approach for inferring the activity levels of all unexplored single point mutants of an enzyme, based on a training set of such mutants with experimentally measured activity.

RESULTS:

Based on a Delaunay tessellation-derived four-body statistical potential function, a perturbation vector measuring environmental changes relative to wild type (wt) at every residue position uniquely characterizes each enzyme mutant for model development and prediction. First, a measure of model performance utilizing area (AUC) under the receiver operating characteristic (ROC) curve surpasses 0.83 and 0.77 for data sets of experimental HIV-1 protease and T4 lysozyme mutants, respectively. Additionally, a novel method is introduced for evaluating statistical significance associated with the number of correct test set predictions obtained from a trained model. Third, 100 stratified random splits of the protease and T4 lysozyme mutant data sets into training and test sets achieve 77.0% and 80.8% mean accuracy, respectively. Next, protease and T4 lysozyme models trained with experimental mutants are used to predict activity levels for all remaining mutants; a subsequent search for publications reporting on dozens of these test mutants reveals that experimental results are matched by 79% and 86% of predictions, respectively. Finally, learning curves for each mutant enzyme system indicate the influence of training set size on model performance.

AVAILABILITY:

Prediction databases at http://proteins.gmu.edu/automute/

PMID:
17977887
[PubMed - indexed for MEDLINE]
Free full text
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for HighWire
    Loading ...
    Write to the Help Desk