Brief Bioinform. 2019 Jul 5. pii: bbz071. doi: 10.1093/bib/bbz071. [Epub ahead of print]

A critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation.

Author information

Computational & Systems Biology Branch, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA.


A number of machine learning (ML)-based algorithms have been proposed for predicting mutation-induced stability changes in proteins. In this critical review, we used hypothetical reverse mutations to evaluate the performance of five representative algorithms and found that all of them suffer from overfitting. This approach is based on the fact that if a wild-type protein is more stable than a mutant protein, then the same mutant is less stable than the wild-type protein. We analyzed the underlying issues and suggest that the main causes of the overfitting are that the numbers of training cases were too small and that the features used in the models were not sufficiently informative for the task. We make recommendations on how to avoid overfitting in this important research area and improve the reliability and robustness of ML-based algorithms in general.
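
The reverse-mutation idea described above can be illustrated with a minimal sketch. It assumes that a predictor returns a signed stability change (for example a ΔΔG value) for each mutation, so that a consistent predictor should give values of opposite sign for a mutation and its hypothetical reverse. The function name, the input format and the two summary statistics are hypothetical choices made for this illustration; the abstract does not state which metrics the review itself used.

```python
from statistics import mean


def antisymmetry_report(forward, reverse):
    """Summarize how consistently a predictor treats reverse mutations.

    forward[i] is the predicted stability change for mutation i
    (wild type -> mutant); reverse[i] is the prediction for the
    hypothetical reverse mutation (mutant -> wild type).
    Both lists are assumed inputs, not data from the review.
    """
    if len(forward) != len(reverse) or not forward:
        raise ValueError("forward and reverse must be non-empty and equal in length")

    # For a perfectly antisymmetric predictor, forward + reverse is 0 for every pair.
    pair_sums = [f + r for f, r in zip(forward, reverse)]

    # Fraction of pairs whose predictions have strictly opposite signs.
    opposite = sum(1 for f, r in zip(forward, reverse) if f * r < 0)

    return {
        "mean_sum": mean(pair_sums),
        "sign_consistent": opposite / len(forward),
    }


if __name__ == "__main__":
    # Toy numbers only: a predictor that calls both directions destabilizing.
    print(antisymmetry_report([-1.0, -2.0], [-1.0, -1.0]))
```

A mean sum far from zero, or a low fraction of sign-consistent pairs, would indicate that the predictor does not respect the symmetry between forward and reverse mutations.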


computational prediction; mutation; protein stability; reliability; reverse mutation; robustness

