Predicting the subcellular localization of human proteins using machine learning and exploratory data analysis.

Author information

  • Department of Pharmaceutical Sciences, School of Pharmacy-Worcester, Massachusetts College of Pharmacy and Health Sciences, Worcester, MA 01608-1715, USA. george.acquaah-mensah@mcphs.edu

Abstract

Identifying the subcellular localization of proteins is particularly helpful in the functional annotation of gene products. In this study, we use Machine Learning and Exploratory Data Analysis (EDA) techniques to examine and characterize amino acid sequences of human proteins localized in nine cellular compartments. A dataset of 3,749 protein sequences representing human proteins was extracted from the SWISS-PROT database. Feature vectors were created to capture specific amino acid sequence characteristics. Relative to a Support Vector Machine, a Multi-layer Perceptron, and a Naive Bayes classifier, the C4.5 Decision Tree algorithm was the most consistent performer across all nine compartments in reliably predicting the subcellular localization of proteins based on their amino acid sequences (average Precision=0.88; average Sensitivity=0.86). Furthermore, EDA graphics characterized essential features of proteins in each compartment. As examples, proteins localized on the plasma membrane had higher proportions of hydrophobic amino acids; cytoplasmic proteins had higher proportions of neutral amino acids; and mitochondrial proteins had higher proportions of neutral amino acids and lower proportions of polar amino acids. These data showed that the C4.5 classifier and EDA tools can be effective for characterizing and predicting the subcellular localization of human proteins based on their amino acid sequences.
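The abstract does not spell out how the feature vectors were built. The sketch below is one plausible construction, offered only as an illustration: it computes the proportion of each of the 20 standard amino acids and then the proportion of residues in three physicochemical groups (hydrophobic, polar, neutral), matching the kinds of features the EDA findings describe. The group memberships in `GROUPS` and the function name `feature_vector` are hypothetical assumptions, not the authors' definitions. Groups are allowed to overlap, so the group proportions need not sum to 1.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes), in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL physicochemical groups: an assumption for illustration only;
# the abstract does not define the grouping used in the study.
# Groups may overlap, so their proportions are not required to sum to 1.
GROUPS = {
    "hydrophobic": set("AVILMFWC"),
    "polar": set("STNQYDEKRH"),
    "neutral": set(AMINO_ACIDS) - set("DEKRH"),  # uncharged near pH 7
}


def feature_vector(sequence: str) -> list[float]:
    """Return 20 per-residue proportions followed by one proportion per group.

    Letters outside the 20 standard residues (e.g. X, U) are skipped and do
    not count toward the sequence length used as the denominator.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    counts = Counter(residues)
    # Single-residue composition, in AMINO_ACIDS order.
    composition = [counts[aa] / n for aa in AMINO_ACIDS]
    # Group proportions, in GROUPS insertion order (hydrophobic, polar, neutral).
    group_props = [
        sum(counts[aa] for aa in members) / n for members in GROUPS.values()
    ]
    return composition + group_props


if __name__ == "__main__":
    vec = feature_vector("MKTAYIAKQR")
    print(len(vec))  # 23 features: 20 residues + 3 groups
    print(vec[20:])  # hydrophobic, polar, neutral proportions
```

For the toy sequence `MKTAYIAKQR`, the hydrophobic residues (A, A, I, M) make up 0.4 of the sequence, the polar ones 0.6, and the uncharged ones 0.7. Vectors of this shape could then be passed to any of the classifiers compared in the study. Note that C4.5 itself is available in Weka as `J48`, whereas scikit-learn's `DecisionTreeClassifier` implements a CART-style tree rather than C4.5.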

PMID: 16970551 [PubMed - indexed for MEDLINE]
PMCID: PMC2709537 (Free PMC Article)