Send to:

Choose Destination
See comment in PubMed Commons below
BMC Bioinformatics. 2007 Aug 17;8:300.

Efficacy of different protein descriptors in predicting protein functional families.

Author information

  • 1Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 08-14, 3 Science Drive 2, Singapore 117543, Singapore.



Sequence-derived structural and physicochemical descriptors have frequently been used in machine learning prediction of protein functional families, thus there is a need to comparatively evaluate the effectiveness of these descriptor-sets by using the same method and parameter optimization algorithm, and to examine whether the combined use of these descriptor-sets help to improve predictive performance. Six individual descriptor-sets and four combination-sets were evaluated in support vector machines (SVM) prediction of six protein functional families.


The performance of these descriptor-sets were ranked by Matthews correlation coefficient (MCC), and categorized into two groups based on their performance. While there is no overwhelmingly favourable choice of descriptor-sets, certain trends were found. The combination-sets tend to give slightly but consistently higher MCC values and thus overall best performance such that three out of four combination-sets show slightly better performance compared to one out of six individual descriptor-sets.


Our study suggests that currently used descriptor-sets are generally useful for classifying proteins and the prediction performance may be enhanced by exploring combinations of descriptors.

[PubMed - indexed for MEDLINE]
Free PMC Article
PubMed Commons home

PubMed Commons

How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for BioMed Central Icon for PubMed Central
    Loading ...
    Write to the Help Desk