Send to

Choose Destination
Bioinformatics. 2010 Aug 1;26(15):1841-8. doi: 10.1093/bioinformatics/btq302. Epub 2010 Jun 6.

Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites.

Author information

National Institute of Biomedical Innovation, Osaka, Japan.



The limited availability of protein structures often restricts the functional annotation of proteins and the identification of their protein-protein interaction sites. Computational methods to identify interaction sites from protein sequences alone are, therefore, required for unraveling the functions of many proteins. This article describes a new method (PSIVER) to predict interaction sites, i.e. residues binding to other proteins, in protein sequences. Only sequence features (position-specific scoring matrix and predicted accessibility) are used for training a Naïve Bayes classifier (NBC), and conditional probabilities of each sequence feature are estimated using a kernel density estimation method (KDE).


The leave-one out cross-validation of PSIVER achieved a Matthews correlation coefficient (MCC) of 0.151, an F-measure of 35.3%, a precision of 30.6% and a recall of 41.6% on a non-redundant set of 186 protein sequences extracted from 105 heterodimers in the Protein Data Bank (consisting of 36 219 residues, of which 15.2% were known interface residues). Even though the dataset used for training was highly imbalanced, a randomization test demonstrated that the proposed method managed to avoid overfitting. PSIVER was also tested on 72 sequences not used in training (consisting of 18 140 residues, of which 10.6% were known interface residues), and achieved an MCC of 0.135, an F-measure of 31.5%, a precision of 25.0% and a recall of 46.5%, outperforming other publicly available servers tested on the same dataset. PSIVER enables experimental biologists to identify potential interface residues in unknown proteins from sequence information alone, and to mutate those residues selectively in order to unravel protein functions.


Freely available on the web at

[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Silverchair Information Systems
Loading ...
Support Center