Subcellular localization prediction through boosting association rules

Yongwook Yoon; Gary Geunbae Lee

doi:10.1109/TCBB.2011.131

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):609-18. doi: 10.1109/TCBB.2011.131. Epub 2011 Sep 27.

Authors

Yongwook Yoon¹, Gary Geunbae Lee

Affiliation

¹ Pohang University of Science and Technology, Pohang.

PMID: 21968957
DOI: 10.1109/TCBB.2011.131

Abstract

Computational methods for predicting protein subcellular localization have used various types of features, including N-terminal sorting signals, amino acid compositions, and text annotations from protein databases. Our approach does not use biological knowledge such as sorting signals or homologues; it uses only protein sequence information. The method divides a protein sequence into short $k$-mer sequence fragments, which can be mapped to word features in document classification. A large number of class association rules are mined from protein sequence examples that span the N-terminus to the C-terminus. A boosting algorithm is then applied to those rules to build the final classifier. Experimental results on benchmark datasets show that our method achieves excellent classification performance and test coverage. The results also imply that the $k$-mer sequence features that determine subcellular location do not necessarily occur at specific positions in a protein sequence. An online prediction service implementing our method is available at http://isoft.postech.ac.kr/research/BCAR/subcell.
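
The $k$-mer step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kmer_features`, the overlapping sliding window, the default `k=3`, and the example sequence are all assumptions made for illustration. It only shows how a sequence might be turned into position-independent, word-like features of the kind that rule mining and boosting could then consume.

```python
from collections import Counter


def kmer_features(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a protein sequence.

    Positions are discarded, so each k-mer acts like a word in a
    bag-of-words document representation (an illustrative assumption,
    not the paper's exact feature definition).
    """
    seq = sequence.strip().upper()
    # A sequence shorter than k yields an empty range, hence no features.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical example sequence, not taken from any benchmark dataset.
features = kmer_features("MKTLLLTLVV", k=3)
```

Each key of `features` is a candidate item for a class association rule of the form "k-mer present implies location", which is why the representation carries no positional information.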

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Cluster Analysis
Computational Biology / methods*
Databases, Protein
Intracellular Space / chemistry*
Pattern Recognition, Automated
Plant Proteins
Proteins / chemistry*
Proteins / classification*
Sequence Analysis, Protein / methods*
Support Vector Machine

Substances

Plant Proteins
Proteins