Copyright © 2008 The Author(s)
Hubba: hub objects analyzer—a framework of interactome hubs identification for network biology
1Institute of Information Science, Academia Sinica, No. 128 Yan-Chiu-Yuan Rd., Sec. 2, Taipei 115, 2Division of Biostatistics and Bioinformatics, National Health Research Institutes, No. 35 Keyan Rd., Zhunan, Miaoli County 350, 3Institute of Fishery Science, College of Life Science, National Taiwan University, No. 1, Roosevelt Rd., Sec. 4, Taipei and 4Department of Computer Science and Information Engineering, National Central University, No. 300, Jung-da Rd., Chung-li, Tao-yuan 320, Taiwan
*To whom correspondence should be addressed. Phone: +886 3 4227151 4461, Fax: +886 3 4222681, Email: hocw/at/csie.ncu.edu.tw
Correspondence may also be addressed to Ming-Tat Ko. Phone: +886 2 27883799 1821, Fax: +886 2 27824814, Email: mtko/at/iis.sinica.edu.tw
Received February 11, 2008; Revised April 17, 2008; Accepted April 20, 2008.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
One major task in the post-genome era is to reconstruct proteomic and genomic interaction networks from high-throughput experimental data. Identifying the essential nodes/hubs in these interactomes is one way to decipher the critical keys inside biochemical pathways or complex networks. These essential nodes/hubs may serve as potential drug targets for developing novel therapies for human diseases, such as cancer or infectious diseases caused by emerging pathogens. Hub Objects Analyzer (Hubba) is a web-based service that uses graph theory to explore important nodes in interactome networks generated by small- or large-scale experimental methods.
Two characteristic analysis algorithms, Maximum Neighborhood Component (MNC) and Density of Maximum Neighborhood Component (DMNC), were developed to explore and identify hubs/essential nodes in interactome networks. Users can submit their own interaction data in PSI-MI format (Proteomics Standards Initiative, versions 1.0 and 2.5), tab-delimited format, or tab-delimited format with weight values. Users receive an email notification when the calculation is complete, which takes minutes or hours depending on the size of the submitted dataset. The Hubba result includes a ranking given by a composite index, a graph of the network showing the relationships among these hubs, and links for retrieving the output files. The proposed method (DMNC || MNC) can be applied to discover previously unrecognized hubs in existing datasets. For example, most of the top-ranked Hubba hubs in the yeast protein interactome data (Y2H experiments) are reported as essential proteins (80% of the top 10 and >70% of the top 40). Because Hubba's analysis methods are based on network topology, they can also be used to explore essential nodes in other networks, such as those of yeast, rat, mouse and human. Hubba is freely available at http://hub.iis.sinica.edu.tw/Hubba.
INTRODUCTION
Proteins control and mediate many biological activities through interactions with other proteins. Protein networks derived from these interactions are a good starting point for understanding the molecular machinery of the cell. In addition, elucidating interaction partnerships may help annotate proteins of unknown function and provide further insight into biological networks. Various experimental strategies are available for identifying protein interactions. In particular, high-throughput yeast two-hybrid screens, applied to proteins of bacteria, yeast, worms, flies and, more recently, mice and humans (1–4), make it possible to characterize physical protein–protein interactions on a genome-wide scale (5,6).
Many interactomes derived from such approaches have been collected in databases such as the Biomolecular Interaction Network Database (BIND) (7), the Database of Interacting Proteins (DIP) (8), IntAct (9), the Munich Information Center for Protein Sequences (MIPS) (10), STRING (11) and REACTOME (12), among others. Interactomes of host–pathogen systems (4,13,14) and of carcinogenesis (2) have also been published recently. Protein interaction networks are inherently complex and far from random. Network characteristics such as the degree distribution, clustering, diameter and relative graphlet frequency distribution can be used to extract information from a protein–protein interaction network (15). Identifying essential nodes/hubs in protein networks is one way to decipher the key controllers inside biochemical pathways or complex networks. By combining gene-expression data with a high-quality yeast protein–protein interaction dataset, Han et al. (16) studied the dynamics of protein–protein interaction networks and revealed two types of hubs: one type is more likely to act as module organizers and the other as module connectors (17). These essential nodes/hubs may serve as candidate drug targets for developing novel therapies for human diseases, such as cancer or infectious diseases caused by emerging pathogens. Several approaches try to identify motifs or functional modules, but few attempt to identify hub/essential proteins directly. For example, CFinder is a tool for predicting the function of a single protein and for discovering novel protein modules (18). Similar tools such as mfinder (19), FANMOD (20) and MAVisto (21) are designed for network motif detection. Idowu et al. (22) used the degree and BottleNeck methods to identify possibly essential proteins in the PPI network of Bacillus subtilis.
Here, we propose Hub Objects Analyzer (Hubba), a framework that combines our own algorithms with an integrated platform to identify hub/essential proteins in user-defined protein interaction networks and present them graphically. Hubba is a web-based service that uses graph theory to explore important nodes in interactome networks generated by small- or large-scale experimental methods. The website explores essential nodes in a protein–protein interaction network with six characteristic analysis methods: Degree, BottleNeck (BN), Edge Percolation Component (EPC), Subgraph Centrality (SC) and two algorithms developed by us, Maximum Neighborhood Component (MNC) and Density of Maximum Neighborhood Component (DMNC). We also propose a double screening scheme (DSS) for exploring and identifying hubs/essential nodes in interactome networks. The Hubba result includes a ranking given by the composite index of the DSS, a network graph showing the relationships among these hubs in an SVG viewer (http://www.adobe.com/svg/), and links to the results calculated by all of the algorithms above. When the yeast protein interactome data (Y2H experiments) are analyzed against the list of essential proteins from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org/), most of the top-ranked Hubba hubs are reported as essential proteins (80% of the top 10 and >70% of the top 40). Because Hubba's analysis methods are based on network topology, they can also be used to explore essential nodes in other networks, such as those of yeast, rat, mouse and human. The clues revealed by network topological analysis may provide new insight for experimental biologists.
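To make MNC and DMNC concrete, here is a minimal Python sketch that assumes the definitions as we understand them: MNC(v) is the number of nodes in a largest connected component MC(v) of the subgraph induced by the neighbors of v (v itself excluded), and DMNC(v) = |E(MC(v))| / |V(MC(v))|^ε. The exponent ε = 1.7 is our assumption of a typical setting and should be treated as tunable; the function names are hypothetical, and this is not Hubba's own code.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def max_neighborhood_component(adj, v):
    """Return (nodes, edge_count) of a largest connected component of the
    subgraph induced by the neighbors of v (v itself is excluded)."""
    neighbors = adj.get(v, set())
    seen = set()
    best_nodes, best_edges = set(), 0
    for start in sorted(neighbors):  # sorted: ties resolved deterministically
        if start in seen:
            continue
        # Depth-first search restricted to the neighbor set.
        component, stack = {start}, [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y in neighbors and y not in seen:
                    seen.add(y)
                    component.add(y)
                    stack.append(y)
        if len(component) > len(best_nodes):
            # Each induced edge is counted once from each endpoint, hence // 2.
            edge_count = sum(len(adj[x] & component) for x in component) // 2
            best_nodes, best_edges = component, edge_count
    return best_nodes, best_edges


def mnc(adj, v):
    """MNC(v): number of nodes in the maximum neighborhood component."""
    nodes, _ = max_neighborhood_component(adj, v)
    return len(nodes)


def dmnc(adj, v, epsilon=1.7):
    """DMNC(v) = |E(MC(v))| / |V(MC(v))| ** epsilon (epsilon = 1.7 is assumed)."""
    nodes, edges = max_neighborhood_component(adj, v)
    if not nodes:
        return 0.0
    return edges / len(nodes) ** epsilon
```

With edges A–B, A–C, A–D, B–C, C–D and A–E, the neighbors of A induce the component {B, C, D} with two edges, so MNC(A) = 3 and DMNC(A) = 2/3^1.7 ≈ 0.31. When several components tie for the largest, the sketch keeps the first one found in sorted order, which can change DMNC but not MNC.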
SYSTEM IMPLEMENTATION
The Hubba system is built on an open-source stack: Linux (Mandriva 2007, operating system), Apache (web server), PHP (HTML-embedded scripting language), PostgreSQL (relational database), XMLMakerFlattener (data format translation), Graphviz (graph generator), and BGL, LAPACK and LAPACK++ (topology calculation). The framework of the whole system is depicted in Figure 1.
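LAPACK appears in the stack for topology calculation. Subgraph centrality (26) is one score that needs an eigendecomposition: SC(i) = Σ_k (A^k)_ii / k! = Σ_j v_j(i)^2 e^{λ_j}, where A is the adjacency matrix and (λ_j, v_j) are its eigenpairs. The sketch below assumes NumPy, whose numpy.linalg.eigh calls LAPACK for symmetric matrices; how Hubba itself invokes LAPACK/LAPACK++ is not described here, and the function name is hypothetical.

```python
import numpy as np


def subgraph_centrality(nodes, edges):
    """SC(i) = sum_j v_j(i)^2 * exp(lambda_j), i.e. the diagonal of exp(A)."""
    index = {n: i for i, n in enumerate(nodes)}
    a = np.zeros((len(nodes), len(nodes)))
    for u, v in edges:
        if u != v:
            a[index[u], index[v]] = a[index[v], index[u]] = 1.0
    # eigh is for symmetric matrices; NumPy delegates it to LAPACK.
    eigenvalues, eigenvectors = np.linalg.eigh(a)
    # Column j of `eigenvectors` is v_j. The weighted sum below equals the
    # number of closed walks of each length k at node i, weighted by 1/k!.
    sc = (eigenvectors ** 2) @ np.exp(eigenvalues)
    return {n: float(sc[index[n]]) for n in nodes}
```

For a single edge a–b the sketch gives SC(a) = cosh(1) ≈ 1.543, the diagonal of exp(A) for that 2 × 2 matrix. A dense eigendecomposition costs O(n^3) time, so very large interactomes would call for a sparse or iterative method.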
Algorithms used in Hubba
Hubba explores possibly essential proteins in the interaction network using six topology-based scoring methods and a DSS.
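As a sketch of two of the other scores, the Python code below assumes common formulations: BN(v) counts the roots s whose breadth-first shortest-path tree T_s sends more than |V(T_s)|/4 of its root-to-node paths through v, and EPC follows the percolation idea of Chin and Samanta (25), keeping each edge independently with probability keep_prob over many random samples and averaging the fraction of nodes connected to v. The sample count, keep probability, tie-breaking and normalization are our assumptions, not Hubba's documented settings, and the function names are hypothetical.

```python
import random
from collections import Counter, deque


def bottleneck(adj):
    """BN(v): number of roots s whose BFS shortest-path tree T_s routes more
    than |V(T_s)|/4 of its root-to-node paths through v (v != s)."""
    scores = {v: 0 for v in adj}
    for s in adj:
        # Build one BFS tree rooted at s; ties go to the first parent found.
        parent, order, queue = {s: None}, [s], deque([s])
        while queue:
            x = queue.popleft()
            for y in sorted(adj[x]):
                if y not in parent:
                    parent[y] = x
                    order.append(y)
                    queue.append(y)
        # Subtree size of v = number of root-to-node paths ending at or passing v.
        subtree = {v: 1 for v in order}
        for v in reversed(order[1:]):
            subtree[parent[v]] += subtree[v]
        for v in order[1:]:
            if subtree[v] > len(order) / 4:
                scores[v] += 1
    return scores


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def edge_percolated_component(nodes, edges, samples=1000, keep_prob=0.5, seed=0):
    """Average fraction of nodes connected to v when each edge is kept
    independently with probability keep_prob (settings are assumptions)."""
    rng = random.Random(seed)
    totals = {v: 0 for v in nodes}
    for _ in range(samples):
        parent = {v: v for v in nodes}
        for u, v in edges:
            if rng.random() < keep_prob:
                ru, rv = _find(parent, u), _find(parent, v)
                if ru != rv:
                    parent[ru] = rv
        roots = {v: _find(parent, v) for v in nodes}
        size = Counter(roots.values())
        for v in nodes:
            totals[v] += size[roots[v]]
    return {v: totals[v] / (samples * len(nodes)) for v in nodes}
```

On a star with center C and four leaves, BN(C) = 4 (C carries four of the five paths in every leaf-rooted tree) and each leaf scores 0. Node identifiers must be mutually comparable because neighbors are visited in sorted order to make the BFS trees reproducible.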
Job processing and result display
The Hubba system separates a job into two modes, ‘user mode’ and ‘system mode’ (Figure 1). All input data in a query are parsed and stored in a temporary database for subsequent analysis. Hubba then applies the six topological methods and the double screening scheme to the submitted dataset and obtains a ranking score for each node in the submitted network. The ranking score in Hubba is a composite index calculated by the DSS (DMNC || MNC), as described in the algorithm sections. After all calculations are complete, the process returns to ‘user mode’ to display the outcome. The result page has three major sections: ‘Hub Selector and Topology Moderator’, ‘Local Network Graph with Hub List’ and ‘Download Area’. In ‘Hub Selector and Topology Moderator’, users can select the top-ranked hubs or search for particular nodes to browse the relationships among these nodes in the submitted network. Two advanced options are also available: ‘Check the first-stage nodes’ shows the neighbors of the top/particular nodes, and ‘Display the shortest path’ marks the shortest paths between nodes. In this way, the connectivity among hubs can be easily identified. An output graph in PNG format is generated by Graphviz and shown directly in the ‘Local Network Graph with Hub List’ section of the result page. For queries submitted in the standard PSI-MI format, the biological functions related to the identified hubs can be shown in an SVG viewer. All output results, including network images and the ranking scores from the DSS and the six scoring methods, can be retrieved from the ‘Download Area’. We also provide the output in GML and EPS formats, which can be opened in Cytoscape (http://www.cytoscape.org/) and edited with standard Linux tools, respectively, for further analysis. Normally, an analysis job is completed within a few minutes and the result is pushed back to the same web browser window automatically.
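The parsing and display steps can be sketched as follows. The input layout assumed here (one interaction per line, two tab-separated identifiers plus an optional numeric weight, with ‘#’ lines ignored) is our guess at the ‘tab’ and ‘tab with weight’ formats, and the function names are hypothetical. The sketch collects the first-stage neighbors of selected hubs and writes Graphviz DOT text with the hubs highlighted.

```python
def parse_tab_edges(lines):
    """Parse 'nodeA<TAB>nodeB[<TAB>weight]' lines into (a, b, weight) tuples.

    Assumed layout: blank lines and lines starting with '#' are skipped and a
    missing weight defaults to 1.0.
    """
    edges = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) not in (2, 3):
            raise ValueError(f"line {line_no}: expected 2 or 3 tab-separated fields")
        weight = float(fields[2]) if len(fields) == 3 else 1.0
        edges.append((fields[0], fields[1], weight))
    return edges


def first_stage_subgraph(edges, hubs):
    """Return the hubs plus their direct neighbors, and the edges among them."""
    hubs = set(hubs)
    keep = set(hubs)
    for a, b, _ in edges:
        if a in hubs:
            keep.add(b)
        if b in hubs:
            keep.add(a)
    sub_edges = [(a, b, w) for a, b, w in edges if a in keep and b in keep]
    return keep, sub_edges


def to_dot(nodes, edges, highlight=()):
    """Render an undirected Graphviz DOT graph, filling highlighted nodes."""

    def quote(name):
        return '"' + name.replace('"', '\\"') + '"'

    highlight = set(highlight)
    lines = ["graph hubs {"]
    for node in sorted(nodes):
        style = " [style=filled, fillcolor=orange]" if node in highlight else ""
        lines.append(f"  {quote(node)}{style};")
    for a, b, w in edges:
        lines.append(f'  {quote(a)} -- {quote(b)} [label="{w:g}"];')
    lines.append("}")
    return "\n".join(lines)
```

The DOT text can be rendered with the Graphviz command line, for example dot -Tpng hubs.dot -o hubs.png. Edges between two neighbors are kept, so the drawing is the subgraph induced by the hubs and their first-stage nodes; whether Hubba draws those edges is not stated.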
If a job takes longer than expected, the user can bookmark the link and revisit Hubba later, or follow the link provided in the notification email to retrieve the analysis results.
RESULTS AND CONCLUSION
The main ideas behind the double screening scheme are to select methods that capture diverse topological characteristics and to include as many essential proteins as possible. First, we studied the overlap of the top-n lists from different methods. After applying all six methods to the protein–protein interaction dataset yeast20070107.lst (http://dip.doe-mbi.ucla.edu/), we expressed the overlap between the top 100 ranked proteins of each pair of scoring methods as a percentage (Supplementary Table S1). Among all methods, DMNC shares the fewest proteins with the others, which suggests that the topological characteristics extracted by DMNC may differ from those extracted by the other methods. Second, we evaluated the performance of the six scoring methods by their coverage of yeast essential proteins. As shown in Table 1, DMNC has the highest hit rate on the essential protein list, so we chose DMNC as the first method in the DSS. The second method of the DSS was chosen by the same criteria: among the remaining five methods, MNC is the best partner for DMNC. The combined scheme improves the hit rate (Table 1, last column).
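The two selection criteria, overlap between top-100 lists and hit rate against a list of essential proteins, reduce to simple set arithmetic. This Python sketch assumes each method's output is a dictionary from protein ID to score and the essential list is a set of IDs; the function names are hypothetical, and the toy scores used to exercise it are invented, not the paper's data. Overlap is reported as the percentage of one method's top-n list found in the other's.

```python
def top_n(scores, n):
    """IDs of the n highest-scoring nodes; ties broken by node ID."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [node for node, _ in ranked[:n]]


def overlap_percent(scores_a, scores_b, n=100):
    """Percentage of method A's top-n list that also appears in method B's."""
    top_a, top_b = set(top_n(scores_a, n)), set(top_n(scores_b, n))
    return 100.0 * len(top_a & top_b) / len(top_a) if top_a else 0.0


def hit_rate(scores, essential, n):
    """Fraction of the top-n nodes that belong to a set of known essential proteins."""
    top = top_n(scores, n)
    return sum(node in essential for node in top) / len(top) if top else 0.0
```

Ties are broken by protein ID so repeated runs give the same lists; how Hubba breaks ties is not stated.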
Hubba provides a user-friendly interface for uploading datasets and displaying results. After the analysis is complete, Hubba provides a community graph of the top n ranked (n ≤ 100) hub/essential proteins, labeled with the identifiers provided in the input dataset (Figure 2).
Identifying hubs or fragile motifs is very important in network biology. For example, an overview of the interactions between human proteins and proteins from 190 pathogen strains revealed that both viral and bacterial pathogens tend to interact with hubs and bottlenecks in the human PPI network (28). Chuang et al. (29) applied a protein-network-based approach to analyze the expression profiles of two cohorts of breast cancer patients. They found that several well-known cancer markers, such as P53, KRAS, HRAS, HER-2/neu and PIK3CA, lie at interconnecting bottlenecks among many expression-responsive genes, although these markers could not serve as indicators of disease state from gene-expression data alone. Feldman et al. (30) characterized network properties of genes involved in human inherited diseases. They found that genes and proteins harboring variations that cause the same disease phenotype tend to form directly connected clusters. Hubba can serve a similar purpose in identifying disease-associated proteins: it accepts a query list of proteins of interest on a user-defined network and outputs the shortest paths among them. Nodes on these paths may then serve as candidates related to the disorder in which the query proteins are involved. Topological analyses such as Hubba's depend on the completeness and accuracy of the input interactome dataset. At the same time, the platform allows users to build networks specific to the scenario from which their customized interaction dataset was derived. In this way, information hidden inside networks from specific spatiotemporal scenarios may be deciphered and sketched. We hope this approach can lead to new strategies for exploring the mechanisms of cancer formation and pathogen infection. It may also lead to new therapies and novel insights into the basic mechanisms controlling normal cellular processes and disease pathologies.
Supplementary Data
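The query-list feature can be sketched as breadth-first search between every pair of query proteins, collecting the intermediate nodes as candidates. This Python sketch assumes an unweighted, undirected adjacency dictionary and returns one shortest path per pair (ties broken by sorted neighbor order); Hubba may enumerate paths differently, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, source, target):
    """One unweighted shortest path from source to target (BFS), or None."""
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in sorted(adj.get(x, ())):
            if y in parent:
                continue
            parent[y] = x
            if y == target:
                # Walk parent links back to the source, then reverse.
                path = [y]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(y)
    return None


def candidates_between(adj, query):
    """Intermediate nodes on one shortest path between each pair of query nodes."""
    found = set()
    for a, b in combinations(query, 2):
        path = shortest_path(adj, a, b)
        if path:
            found.update(path[1:-1])
    return found - set(query)
```

With query proteins Q1, Q2 and Q3 connected through X and Y, the candidates returned are {X, Y}. For k query proteins the sketch runs k(k − 1)/2 searches, each roughly linear in the size of the network.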
ACKNOWLEDGEMENTS
The authors would like to thank the National Science Council (NSC)/National Research Program of Genomic Medicine (NRPGM), Taiwan, for financially supporting this research through NSC 96-3112-B-001-002 to C.-Y.L. and NSC 95-2221-E-008-055 to C.-W.H. Funding to pay the Open Access publication charges for this article was provided by NSC 96-3112-B-001-002 to C.-Y.L.
Conflict of interest statement. None declared.
REFERENCES
1. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA. 2001;98:4569–4574.
2. Jonsson PF, Bates PA. Global topological features of cancer proteins in the human interactome. Bioinformatics. 2006;22:2291–2297.
3. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005;437:1173–1178.
4. Uetz P, Dong YA, Zeretzke C, Atzler C, Baiker A, Berger B, Rajagopala SV, Roupelieva M, Rose D, Fossum E, et al. Herpesviral protein networks and their interaction with the human proteome. Science. 2006;311:239–242.
5. Lin CY, Chen CL, Cho CS, Wang LM, Chang CM, Chen PY, Lo CZ, Hsiung CA. hp-DPI: Helicobacter pylori database of protein interactomes—embracing experimental and inferred interactions. Bioinformatics. 2005;21:1288–1290.
6. Lin C-Y, Chen S-H, Cho C-S, Chen C-L, Lin F-K, Lin C-H, Chen P-Y, Lo C-Z, Hsiung CA. Fly-DPI: database of protein interactomes for D. melanogaster in the approach of systems biology. BMC Bioinform. 2006;7:S18.
7. Bader GD, Betel D, Hogue CW. BIND: the biomolecular interaction network database. Nucleic Acids Res. 2003;31:248–250.
8. Deane CM, Salwinski L, Xenarios I, Eisenberg D. Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol. Cell Proteomics. 2002;1:349–356.
9. Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, et al. IntAct—open source resource for molecular interaction data. Nucleic Acids Res. 2007;35:D561–D565.
10. Schoof H, Spannagl M, Yang L, Ernst R, Gundlach H, Haase D, Haberer G, Mayer KF. Munich information center for protein sequences plant genome resources: a framework for integrative and comparative analyses. Plant Physiol. 2005;138:1301–1309.
11. von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Kruger B, Snel B, Bork P. STRING 7—recent developments in the integration and prediction of protein interactions. Nucleic Acids Res. 2007;35:D358–D362.
12. Vastrik I, D’Eustachio P, Schmidt E, Joshi-Tope G, Gopinath G, Croft D, de Bono B, Gillespie M, Jassal B, Lewis S, et al. Reactome: a knowledge base of biologic pathways and processes. Genome Biol. 2007;8:R39.
13. Dyer MD, Murali TM, Sobral BW. Computational prediction of host-pathogen protein-protein interactions. Bioinformatics. 2007;23:i159–i166.
14. Calderwood MA, Venkatesan K, Xing L, Chase MR, Vazquez A, Holthaus AM, Ewence AE, Li N, Hirozane-Kishikawa T, Hill DE, et al. Epstein-Barr virus and virus human protein interaction maps. Proc. Natl Acad. Sci. USA. 2007;104:7606–7611.
15. Przulj N, Wigle DA, Jurisica I. Functional topology in a network of protein interactions. Bioinformatics. 2004;20:340–348.
16. Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, et al. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004;430:88–93.
17. Ekman D, Light S, Björklund ÅK, Elofsson A. What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 2006;7:R45.
18. Adamcsek B, Palla G, Farkas IJ, Derenyi I, Vicsek T. CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics. 2006;22:1021–1023.
19. Kashtan N, Itzkovitz S, Milo R, Alon U. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics. 2004;20:1746–1758.
20. Wernicke S, Rasche F. FANMOD: a tool for fast network motif detection. Bioinformatics. 2006;22:1152–1153.
21. Schreiber F, Schwobbermeyer H. MAVisto: a tool for the exploration of network motifs. Bioinformatics. 2005;21:3572–3574.
22. Idowu OC, Lynden SJ, Young MP, Andras P. Bacillus subtilis protein interaction network analysis. In: 2004 IEEE Computational Systems Bioinformatics Conference (CSB'04). 2004:623–625.
23. Jeong H, Mason SP, Barabási AL, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001;411:41–42.
24. Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M. The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput. Biol. 2007;3:e59.
25. Chin C-S, Samanta MP. Global snapshot of a protein interaction network—a percolation based approach. Bioinformatics. 2003;19:2413–2419.
26. Estrada E, Rodríguez-Velázquez JA. Subgraph centrality in complex networks. Phys. Rev. E. 2005;71:056103.
27. Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, Bangham R, Benito R, Boeke JD, Bussey H, et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science. 1999;285:901–906.
28. Dyer MD, Murali TM, Sobral BW. The landscape of human proteins interacting with viruses and other pathogens. PLoS Pathog. 2008;4:e32.
29. Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol. Syst. Biol. 2007;3:140.
30. Feldman I, Rzhetsky A, Vitkup D. Network properties of genes harboring inherited disease mutations. Proc. Natl Acad. Sci. USA. 2008;105:4323–4328.