Nucleic Acids Res. 2008 November; 36(20): e136.
Published online 2008 November. doi: 10.1093/nar/gkn619.
PMCID: PMC2582614
Protein networks markedly improve prediction of subcellular localization in multiple eukaryotic species
KiYoung Lee,1,2,3,4 Han-Yu Chuang,1,5 Andreas Beyer,1,6 Min-Kyung Sung,7 Won-Ki Huh,7 Bonghee Lee,2 and Trey Ideker1,5*
1Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA, 2Center for Genomics and Proteomics, Lee Gil Ya Cancer and Diabetes Institute, Gachon University of Medicine and Science, Incheon 406-799, Republic of Korea, 3Structural Biology Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA, 4Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 305-701, Republic of Korea, 5Bioinformatics Program, University of California San Diego, La Jolla, CA 92093, USA, 6Biotechnology Center, Technische Universität, 01062 Dresden, Germany and 7School of Biological Sciences, Research Center for Functional Cellulomics, Institute of Microbiology, Seoul National University, Seoul 151-747, Republic of Korea
*To whom correspondence should be addressed. Tel: +1 858 822 4665; Fax: +1 858 822 4246; Email: trey@bioeng.ucsd.edu.
Received April 18, 2008; Revised August 13, 2008; Accepted September 11, 2008.
Abstract
The function of a protein is intimately tied to its subcellular localization. Although localizations have been measured for many yeast proteins through systematic GFP fusions, similar studies in other branches of life are still forthcoming. In the interim, various machine-learning methods have been proposed to predict localization using physical characteristics of a protein, such as amino acid content, hydrophobicity, side-chain mass and domain composition. However, there has been comparatively little work on predicting localization using protein networks. Here, we predict protein localizations by integrating an extensive set of protein physical characteristics over a protein's extended protein–protein interaction neighborhood, using a classification framework called ‘Divide and Conquer k-Nearest Neighbors’ (DC-kNN). These predictions achieve significantly higher accuracy than two well-known methods for predicting protein localization in yeast. Using new GFP imaging experiments, we show that the network-based approach can extend and revise previous annotations made from high-throughput studies. Finally, we show that our approach remains highly predictive in higher eukaryotes such as fly and human, in which most localizations are unknown and the protein network coverage is less substantial.
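The general idea of combining a protein's own characteristics with those of its interaction neighbors can be illustrated with a small sketch. This is not the DC-kNN framework itself, whose details are not given in the abstract; it is a plain k-nearest-neighbors vote under stated assumptions. The feature values, protein names, toy network, the choice of averaging neighbor features, and the helper names `neighborhood_features` and `knn_predict` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def neighborhood_features(protein, features, network):
    """Concatenate a protein's own features with the mean of its
    interaction partners' features (an assumed aggregation scheme)."""
    own = features[protein]
    partners = [p for p in network.get(protein, ()) if p in features]
    if partners:
        mean = [sum(features[p][i] for p in partners) / len(partners)
                for i in range(len(own))]
    else:
        mean = [0.0] * len(own)  # no known partners: pad with zeros
    return own + mean

def knn_predict(query, labeled, k=3):
    """Majority vote over the k labeled vectors closest to the query."""
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (invented): two numeric characteristics per protein.
features = {
    "A": [0.9, 0.1], "B": [0.8, 0.2],
    "C": [0.1, 0.9], "D": [0.2, 0.8],
    "Q": [0.5, 0.5],  # query protein; its own features are ambiguous
}
network = {
    "A": ["B"], "B": ["A", "Q"],
    "C": ["D"], "D": ["C"],
    "Q": ["B", "A"],
}
labels = {"A": "nucleus", "B": "nucleus",
          "C": "mitochondrion", "D": "mitochondrion"}

labeled = [(neighborhood_features(p, features, network), loc)
           for p, loc in labels.items()]
query = neighborhood_features("Q", features, network)
prediction = knn_predict(query, labeled, k=3)
print(prediction)
```

In this toy case the query protein's own features sit halfway between the two classes, so the vote is decided by the features contributed by its interaction partners.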