Comb Chem High Throughput Screen. 2019 Nov 28. doi: 10.2174/1386207322666191129113508. [Epub ahead of print]

Prediction of Citrullination Sites on the Basis of mRMR Method and SNN.

Author information

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.



Citrullination, an important post-translational modification of proteins, alters the molecular weight and electrostatic charge of protein side chains. The conversion of arginine to citrulline in protein sequences is catalyzed by a class of enzymes, the peptidyl arginine deiminases (PADs). PADs are Ca2+-dependent and comprise five isozymes: PAD 1, 2, 3, 4/5, and 6. Citrullinated proteins have been identified in many biological and pathological processes. Abnormal protein citrullination can lead to serious human diseases, including multiple sclerosis and rheumatoid arthritis.


It is therefore important to identify the citrullination sites in protein sequences. Accurate identification of citrullination sites may contribute to studies of the molecular functions and pathological mechanisms of related diseases.


In this study, after encoding the training set (116 positive and 348 negative samples) into a feature matrix, we used the mRMR method to analyze the 941-dimensional features and ranked them by importance. We then proposed a predictive model based on a self-normalizing neural network (SNN) to predict citrullination sites in protein sequences. Incremental feature selection (IFS) and 10-fold cross-validation were used to evaluate the model. We selected three classical machine learning models, namely random forest, support vector machine, and k-nearest neighbor, and compared them with the SNN prediction model using the same evaluation methods. SNN may be the best tool among those compared for citrullination site prediction. The maximum Matthews correlation coefficient (MCC) reached 0.672404 with the optimal SNN classifier.
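The evaluation procedure described above (features ranked by importance, then IFS scored by 10-fold cross-validation and MCC) can be illustrated with a minimal sketch. It assumes the mRMR ranking is already available as a list of column indices (`ranked`), and it uses a nearest-centroid classifier purely as a stand-in for the SNN; all function names are hypothetical and not taken from the paper.

```python
import math
import random


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion counts (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT the SNN): return the label of the nearest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        if not rows:
            continue
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_val_mcc(X, y, k=10, seed=0):
    """k-fold cross-validation; pools confusion counts over all folds, then returns MCC."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    tp = tn = fp = fn = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            pred = centroid_predict(train_X, train_y, X[i])
            if pred == 1 and y[i] == 1:
                tp += 1
            elif pred == 0 and y[i] == 0:
                tn += 1
            elif pred == 1:
                fp += 1
            else:
                fn += 1
    return mcc(tp, tn, fp, fn)


def incremental_feature_selection(X, y, ranked, k=10):
    """Evaluate the top-1, top-2, ... ranked features; keep the smallest best-scoring subset."""
    best_n, best_score = 0, float("-inf")
    for n in range(1, len(ranked) + 1):
        cols = ranked[:n]
        subset = [[row[c] for c in cols] for row in X]
        score = cross_val_mcc(subset, y, k=k)
        if score > best_score:  # strict '>' prefers fewer features on ties
            best_n, best_score = n, score
    return best_n, best_score
```

Calling `incremental_feature_selection(X, y, ranked)` returns the number of top-ranked features that gave the highest cross-validated MCC together with that score. In the study, the same loop would wrap each of the four classifiers in turn rather than the centroid stand-in used here.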


The results showed that the SNN-based prediction method performed better than the three classical models when evaluated by common metrics such as MCC, accuracy, and F1-measure. The SNN prediction model also achieved a better balance in classifying positive and negative samples than the other three models.
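For reference, the three metrics named above have the following standard definitions in terms of the confusion-matrix counts. The symbols TP, TN, FP, and FN (true positives, true negatives, false positives, false negatives) are introduced here for illustration and do not appear in the abstract itself.

```latex
\begin{align}
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN} \\
\mathrm{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
  {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{align}
```

Because MCC uses all four counts, it is less easily inflated by class imbalance than accuracy, which is relevant for a training set with a 1:3 ratio of positive to negative samples.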


IFS (incremental feature selection); PTM (post-translational modification); SNN (self-normalizing neural network); citrullination site; mRMR (minimum redundancy maximum relevance)
