Int J Mol Sci. 2016 Jul 27;17(8). pii: E1215. doi: 10.3390/ijms17081215.

A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces.

Author information

1. Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, Estrada Nacional 10 (ao km 139,7), 2695-066 Bobadela LRS, Portugal. ritamelo@ctn.ist.utl.pt.
2. CNC-Center for Neuroscience and Cell Biology; Rua Larga, Faculdade de Medicina, Polo I, 1º andar, Universidade de Coimbra, 3004-504 Coimbra, Portugal. ritamelo@ctn.ist.utl.pt.
3. Department of Genetics and Genomics and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. robert.fieldhouse@mssm.edu.
4. REQUIMTE (Rede de Química e Tecnologia), Faculdade de Ciências da Universidade do Porto, Departamento de Química e Bioquímica, Rua do Campo Alegre, 4169-007 Porto, Portugal. asmelo@fc.up.pt.
5. Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, Estrada Nacional 10 (ao km 139,7), 2695-066 Bobadela LRS, Portugal. jgalamba@ctn.tecnico.ulisboa.pt.
6. REQUIMTE (Rede de Química e Tecnologia), Faculdade de Ciências da Universidade do Porto, Departamento de Química e Bioquímica, Rua do Campo Alegre, 4169-007 Porto, Portugal. ncordeir@fc.up.pt.
7. Department of Genetics and Genomics and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. zeynep.gumus@gmail.com.
8. CMUP/FCUP, Centro de Matemática da Universidade do Porto, Faculdade de Ciências, Rua do Campo Alegre, 4169-007 Porto, Portugal. Jpcosta@fc.up.pt.
9. Bijvoet Center for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Utrecht 3584CH, The Netherlands. a.m.j.j.bonvin@uu.nl.
10. CNC-Center for Neuroscience and Cell Biology; Rua Larga, Faculdade de Medicina, Polo I, 1º andar, Universidade de Coimbra, 3004-504 Coimbra, Portugal. irina.moreira@cnc.uc.pt.
11. Bijvoet Center for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Utrecht 3584CH, The Netherlands. irina.moreira@cnc.uc.pt.

Abstract

Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology for predicting Hot-Spots (HS) in protein-protein interfaces from their native complex structure that is more accurate than previously published Machine Learning (ML) techniques. Our model is trained on a large number of complexes and on a significantly larger number of different structural and evolutionary sequence-based features than earlier approaches. In particular, we added interface size, the type of interaction between residues at the interface of the complex, the number of different types of residues at the interface, and the Position-Specific Scoring Matrix (PSSM), for a total of 79 features. We tested twenty-seven algorithms, ranging from a simple linear function to support-vector machine models with different cost functions. The best model was obtained with the conditional inference random forest (c-forest) algorithm, using a dataset pre-processed by feature normalization and up-sampling of the minority class. On the independent test set, the method achieves an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82.
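The abstract names two pre-processing steps (feature normalization and up-sampling of the minority class) and four evaluation metrics. The following is a minimal Python sketch of those pieces, not the authors' pipeline. It assumes binary labels with 1 = hot spot and 0 = non-hot spot, assumes z-score scaling for "normalization" (the abstract does not say which scaling was used) and assumes random duplication for "up-sampling". The classifier itself (c-forest, e.g. `cforest` in the R package party) is not reproduced, and all function names below are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def zscore_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each feature column to zero mean and unit (population) std.

    Assumption: z-scoring stands in for the unspecified normalization.
    Statistics are computed on the rows passed in (fit on training data only).
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant column -> divide by 1
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def upsample_minority(
    X: List[List[float]], y: List[int], seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def _safe_div(a: float, b: float) -> float:
    return a / b if b else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, F1, sensitivity and specificity from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _safe_div(tp, tp + fn)  # recall on hot spots
    specificity = _safe_div(tn, tn + fp)  # recall on non-hot spots
    precision = _safe_div(tp, tp + fp)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Toy labels only; these are not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))  # accuracy = 0.8
```

Because sensitivity and specificity are recalls on each class separately, they are insensitive to class imbalance, whereas accuracy and F1 are not. For that reason, up-sampling is normally applied to the training split only, so the independent test set keeps its original class balance.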

KEYWORDS:

Solvent Accessible Surface Area (SASA); evolutionary sequence conservation; hot-spots; machine learning; protein-protein interfaces

PMID:
27472327
PMCID:
PMC5000613
DOI:
10.3390/ijms17081215
[Indexed for MEDLINE]
Free PMC Article
