An ensemble classifier for eukaryotic protein subcellular location prediction using gene ontology categories and amino acid hydrophobicity

PLoS One. 2012;7(1):e31057. doi: 10.1371/journal.pone.0031057. Epub 2012 Jan 30.

Abstract

With the rapid increase of protein sequences in the post-genomic age, it is challenging to develop accurate, automated methods for predicting their subcellular localizations reliably and quickly. Many approaches have been tried so far, but most of them use only a single algorithm. In this paper, we propose an ensemble classifier that combines KNN (k-nearest neighbor) and SVM (support vector machine) algorithms through a voting system to predict the subcellular localization of eukaryotic proteins. With the one-versus-one strategy, the overall prediction accuracies are 78.17%, 89.94% and 75.55% for three benchmark datasets of eukaryotic proteins. The improved prediction accuracies reveal that GO (gene ontology) annotations and the hydrophobicity of amino acids help to predict the subcellular locations of eukaryotic proteins.
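
As an illustration of the voting idea mentioned in the abstract, the following is a minimal sketch of majority voting over the labels returned by several base classifiers. It is not the authors' implementation: the function names, the toy classifiers standing in for trained KNN and SVM models, and the tie-breaking rule (the earliest-listed classifier wins) are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first.

    The tie-breaking rule is an assumption made for this sketch.
    """
    if not labels:
        raise ValueError("need at least one predicted label")
    counts = Counter(labels)
    top = max(counts.values())
    # Scan in input order so that ties resolve deterministically.
    for label in labels:
        if counts[label] == top:
            return label


def ensemble_predict(classifiers, sample):
    """Ask every base classifier for a label, then take the majority vote."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical stand-ins for trained base classifiers. In practice each
# would map a protein feature vector to a subcellular location label.
def toy_knn(sample):
    return "nucleus"


def toy_svm_a(sample):
    return "cytoplasm"


def toy_svm_b(sample):
    return "nucleus"


prediction = ensemble_predict([toy_knn, toy_svm_a, toy_svm_b], sample=None)
print(prediction)  # prints "nucleus" (two of the three votes)
```

With an even number of base classifiers, ties become possible, so a real system would need an explicit rule such as weighting votes by classifier confidence; the first-seen rule here is only a placeholder.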

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acids / metabolism*
  • Databases, Protein
  • Eukaryotic Cells / metabolism*
  • Genes / genetics*
  • Humans
  • Hydrophobic and Hydrophilic Interactions*
  • Molecular Sequence Annotation / methods*
  • Proteins / metabolism*
  • Subcellular Fractions / metabolism
  • Support Vector Machine

Substances

  • Amino Acids
  • Proteins