J Theor Biol. 2015 Nov 7;384:50-8. doi: 10.1016/j.jtbi.2015.07.038. Epub 2015 Aug 20.

Classification of signaling proteins based on molecular star graph descriptors using Machine Learning models.

Author information

1. Information and Communications Technologies Department, Faculty of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain. Electronic address: carlos.fernandez@udc.es.
2. Information and Communications Technologies Department, Faculty of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain. Electronic address: ruben.fcuinas@udc.es.
3. Bristol Genetic Epidemiology Laboratories, School of Social and Community Medicine, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK. Electronic address: j.seoane@bristol.ac.uk.
4. Information and Communications Technologies Department, Faculty of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain. Electronic address: efernandez@udc.es.
5. Information and Communications Technologies Department, Faculty of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain. Electronic address: julian@udc.es.
6. Information and Communications Technologies Department, Faculty of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain; Department of Bioinformatics - BiGCaT, Maastricht University, P.O. Box 616, UNS50 Box 19, NL-6200 MD Maastricht, The Netherlands. Electronic address: crm.publish@gmail.com.

Abstract

Signaling proteins are an important topic in drug development because of the growing need for fast, accurate and inexpensive methods to evaluate new molecular targets involved in specific diseases. The complexity of protein structure hinders the direct association of signaling activity with molecular structure. The proposed solution therefore uses protein star graphs to encode peptide sequence information into specific topological indices, calculated with the S2SNet tool. The Quantitative Structure-Activity Relationship (QSAR) classification model obtained with Machine Learning techniques is able to predict new signaling peptides. The best classification model, which is the first signaling prediction model, is based on eleven descriptors; it was obtained using the Support Vector Machines-Recursive Feature Elimination (SVM-RFE) technique with the Laplacian kernel (RFE-LAP) and achieved an AUROC of 0.961. The prediction performance of the model was assessed by testing a set of 3114 proteins of unknown function from the PDB database. Important signaling pathways are presented for three UniProt IDs (34 PDB entries) with a signaling prediction greater than 98.0%.
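For readers unfamiliar with the AUROC figure quoted in the abstract, the metric can be read as the probability that a randomly chosen positive example (here, a signaling protein) receives a higher classifier score than a randomly chosen negative one. The following is a minimal sketch of that definition in plain Python. The function name `auroc` and the toy labels and scores are illustrative assumptions; they are not the authors' code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = signaling, 0 = non-signaling.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = auroc(labels, scores)  # 8 of the 9 pairs are ranked correctly
print(round(toy_auc, 3))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.961 indicates that the model ranks nearly all signaling/non-signaling pairs correctly.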

KEYWORDS: Feature selection; SVM-RFE; Signal transduction pathway; Topological indices

PMID: 26297890
DOI: 10.1016/j.jtbi.2015.07.038
[Indexed for MEDLINE]
