In Silico Biol. 2009;9(1-2):35-44.

SubCellProt: predicting protein subcellular localization using machine learning approaches.

Author information

1. Center for Pharmacoinformatics, National Institute of Pharmaceutical Education and Research S.A.S. Nagar, Sector 67, S.A.S. Nagar, Punjab 160 062, India. prabhagarg@niper.ac.in

Abstract

High-throughput genome sequencing projects continue to generate enormous amounts of raw sequence data. However, most of these data are unannotated and hence of limited use. One of the approaches to deciphering the function of a protein is to determine its subcellular localization. Experimental approaches to proteome annotation, including determination of a protein's subcellular localization, are costly and labor-intensive. In silico methods offer an alternative way to accomplish this task. Here, we present two machine learning approaches for predicting the subcellular localization of a protein from its primary sequence. Two machine learning algorithms, k-Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN), were used to classify an unknown protein into one of 11 subcellular localizations. The final prediction is a consensus of the predictions made by the two algorithms, and a probability is assigned to it. The results indicate that features derived from the primary sequence, such as amino acid composition, sequence order and physicochemical properties, can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for high-throughput proteome annotation. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
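The idea of a k-NN/PNN consensus can be illustrated with a minimal sketch. It is not the SubCellProt implementation: the abstract does not specify the consensus rule, the kernel width, the value of k, or how the probability is computed, so the function names, the parameters `k` and `sigma`, the agreement flag, and the toy training sequences below are all assumptions made for illustration. Only amino acid composition is used as a feature here, whereas the paper also uses sequence order and physicochemical properties.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """20-dimensional amino acid composition (fractions) of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]

def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def knn_predict(train, x, k=3):
    """Majority vote among the k training vectors closest to x."""
    nearest = sorted(train, key=lambda item: sq_dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def pnn_predict(train, x, sigma=0.1):
    """PNN-style scoring: mean Gaussian kernel per class, normalised to sum to 1."""
    sums, sizes = {}, {}
    for vec, label in train:
        kernel = math.exp(-sq_dist(vec, x) / (2 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + kernel
        sizes[label] = sizes.get(label, 0) + 1
    density = {c: sums[c] / sizes[c] for c in sums}
    total = sum(density.values())
    if total == 0:  # every kernel underflowed; fall back to a uniform score
        probs = {c: 1 / len(density) for c in density}
    else:
        probs = {c: d / total for c, d in density.items()}
    return max(probs, key=probs.get), probs

def consensus(train_seqs, seq, k=3, sigma=0.1):
    """Hypothetical consensus: report the PNN label, its score, and k-NN agreement."""
    train = [(composition(s), label) for s, label in train_seqs]
    x = composition(seq)
    knn_label = knn_predict(train, x, k)
    pnn_label, probs = pnn_predict(train, x, sigma)
    return pnn_label, probs[pnn_label], knn_label == pnn_label

# Toy, made-up sequences (not real proteins) for two of the possible classes.
TOY_TRAIN = [
    ("KKRRKKRRKR", "nucleus"), ("RKRKKRRKKK", "nucleus"), ("KRKRRRKKRA", "nucleus"),
    ("LLVVIILLAV", "membrane"), ("VVLLIIAVLL", "membrane"), ("ILVLLVIALV", "membrane"),
]

if __name__ == "__main__":
    print(consensus(TOY_TRAIN, "KRKKRRKRKK"))
```

In this sketch a disagreement between the two classifiers is only flagged, not resolved; a real system would need an explicit rule for that case, and a check that the query falls inside the prediction domain covered by the training data.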

PMID: 19537160 [Indexed for MEDLINE]
