Prediction of amyloid aggregation rates by machine learning and feature selection

J Chem Phys. 2019 Aug 28;151(8):084106. doi: 10.1063/1.5113848.

Abstract

This paper reports a novel data-driven machine learning algorithm for predicting amyloid aggregation rates. The model is a highly nonlinear mapping from 16 intrinsic features of a protein and 4 extrinsic features of the environment to the protein aggregation rate. A feedforward fully connected neural network (FCN) with one hidden layer is trained on a dataset covering 21 different amyloid proteins and tested on the remaining 4 proteins. The FCN achieves an average accuracy higher than 90%, performing much better than traditional algorithms such as multivariable linear regression and support vector regression. Furthermore, correlation analysis and principal component analysis show that seven key features (folding energy; HP patterns for helix, sheet, and transmembrane helices; pH; ionic strength; and protein concentration) constitute a minimum feature set for characterizing amyloid aggregation kinetics.
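
To make the model structure concrete, the following is a minimal sketch of a one-hidden-layer fully connected network that regresses a scalar rate from 20 input features (16 intrinsic plus 4 extrinsic). It is not the authors' implementation. The synthetic data, the hidden width (N_HIDDEN), the tanh activation, the learning rate, and the train/test split by row are all assumptions made for illustration; the abstract does not specify them, and the paper's split is by protein, not by row.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INTRINSIC, N_EXTRINSIC = 16, 4          # feature counts stated in the abstract
N_FEATURES = N_INTRINSIC + N_EXTRINSIC
N_HIDDEN = 8                              # assumption: hidden width is not given

# Synthetic stand-in data (NOT the paper's dataset): standardized features
# and a nonlinear target playing the role of a (log) aggregation rate.
X = rng.normal(size=(200, N_FEATURES))
w_true = rng.normal(size=N_FEATURES) / np.sqrt(N_FEATURES)
y = np.tanh(X @ w_true) + 0.05 * rng.normal(size=200)

# Assumption: a simple row-wise hold-out split, for illustration only.
X_train, y_train = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]

# Parameters of the single hidden layer and the linear output unit.
W1 = rng.normal(scale=0.1, size=(N_FEATURES, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=N_HIDDEN)
b2 = 0.0


def forward(inputs):
    """Return hidden activations and scalar predictions for each row."""
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2


def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return float(np.mean((pred - target) ** 2))


initial_loss = mse(forward(X_train)[1], y_train)

lr = 0.01                                  # assumption: plain full-batch gradient descent
n = len(y_train)
for _ in range(2000):
    hidden, pred = forward(X_train)
    grad_out = 2.0 * (pred - y_train) / n            # dLoss/dPred
    grad_W2 = hidden.T @ grad_out
    grad_b2 = grad_out.sum()
    # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    grad_hidden = np.outer(grad_out, W2) * (1.0 - hidden ** 2)
    grad_W1 = X_train.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

final_loss = mse(forward(X_train)[1], y_train)
test_loss = mse(forward(X_test)[1], y_test)
print(f"train MSE: {initial_loss:.4f} -> {final_loss:.4f}, test MSE: {test_loss:.4f}")
```

With real data, the rows of X would hold the 20 measured features per experiment, and the held-out set would consist of proteins absent from training, as described in the abstract.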

MeSH terms

  • Amyloid / chemistry*
  • Kinetics
  • Machine Learning*
  • Neural Networks, Computer
  • Protein Aggregates*

Substances

  • Amyloid
  • Protein Aggregates