Copyright © 2009, EMBO and Nature Publishing Group

Tissue specificity and the human protein interaction network

1EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation, UPF, Barcelona, Spain
2ICREA, Centre for Genomic Regulation, UPF, Barcelona, Spain
aEMBL-CRG Systems Biology Unit, Centre for Genomic Regulation, Dr Aiguader 88, Barcelona 8003, Spain. Tel.: +34 93 316 0194; Fax: +34 93 316 0099; Email: ben.lehner/at/crg.es

Received October 23, 2008; Accepted February 23, 2009.

This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits distribution and reproduction in any medium, provided the original author and source are credited. Creation of derivative works is permitted but the resulting work may be distributed only under the same or similar licence to this one. This licence does not permit commercial exploitation without specific permission.

Abstract

A protein interaction network describes a set of physical associations that can occur between proteins. However, within any particular cell or tissue only a subset of proteins is expressed and so only a subset of interactions can occur. Integrating interaction and expression data, we analyze here this interplay between protein expression and physical interactions in humans. Proteins only expressed in restricted cell types, like recently evolved proteins, make few physical interactions. Most tissue-specific proteins do, however, bind to universally expressed proteins, and so can function by recruiting or modifying core cellular processes. Conversely, most ‘housekeeping' proteins that are expressed in all cells also make highly tissue-specific protein interactions. These results suggest a model for the evolution of tissue-specific biology, and show that most, and possibly all, ‘housekeeping' proteins actually have important tissue-specific molecular interactions.
Keywords: human, protein interaction networks, tissue-specific evolution

Introduction

Nearly all processes in biology are dependent on the precise physical interactions among many individual proteins. These range from the maintenance of cellular architecture and the propagation of the genetic material, to the ability of cells to process and respond to environmental information. Defining a near-complete map of the physical interactions that can occur between human proteins, the human protein ‘interactome', is an important ambition of current research. Similar to the sequence of the human genome, the human interactome serves as a resource for researchers and can be used to understand how proteins are organized to perform functions within a cell (Bork et al, 2004; Cusick et al, 2005). Protein interactome mapping projects were pioneered in model organisms (Uetz et al, 2000; Walhout et al, 2000; Ito et al, 2001; Ho et al, 2002; Li et al, 2004; Gavin et al, 2006; Krogan et al, 2006), with initial efforts in humans focused on particular pathways or genomic regions (Bouwmeester et al, 2004; Lehner and Sanderson, 2004; Lehner et al, 2004; Jeronimo et al, 2007). More recently, the cloning of large sets of human open reading frames and improvements in interaction assays have allowed these efforts to be expanded by an order of magnitude to the scale of the human proteome (Rual et al, 2005; Stelzl et al, 2005; Ewing et al, 2007). These data, combined with extensive efforts to collate known interactions from the scientific literature (Bader et al, 2001; Xenarios et al, 2002; Pagel et al, 2005; Persico et al, 2005; Stark et al, 2006; Kerrien et al, 2007; Vastrik et al, 2007; Ruepp et al, 2008), mean that there is now a reasonably extensive resource of known human protein interactions (Hart et al, 2006). A global interactome network provides an overview of all of the physical interactions that can occur between human proteins.
However, very little is known about when and where each of these interactions can occur. Within any particular cell or tissue of the human body not all protein interactions can occur. Most simply, if two genes are not expressed in a cell, then an interaction between their protein products cannot occur. In unicellular organisms, one approach that has been used to investigate the dynamics of interaction networks between cellular states has been to integrate interactome data with expression data. This approach has been used to identify co-regulated interaction modules (Ihmels et al, 2002; Komurov and White, 2007) or to investigate the relationships between interaction network topology and gene co-expression (Han et al, 2004). Additional studies have used gene expression (Luscombe et al, 2004; de Lichtenberg et al, 2005) or functional information (Rachlin et al, 2006) to investigate the cellular conditions (or ‘context') under which interactions can occur, and to distinguish between condition-dependent and condition-independent interactions. In the present study, we apply a similar approach to the human protein interaction network, using global gene expression data to identify the human cells and tissues in which each interaction can or cannot occur. By performing this analysis, we are able to investigate the relationship between the tissue specificity of a protein and its number of interaction partners. Moreover, and strikingly, we find extensive communication between universally expressed proteins and those with tissue-specific expression. Even the most tissue-specific proteins normally interact directly with components of the core cellular machinery. Conversely, nearly all universally expressed ‘housekeeping' proteins have protein interactions that can only occur in a restricted subset of cells. 
Our results suggest a model for the evolution of tissue-specific functions through the modification and re-use of core cellular processes, and that most ‘housekeeping' proteins should probably be considered as important for tissue-specific processes.

Results

Construction of a global human protein interaction network

To construct a global human physical protein interaction network, we integrated data from 21 different sources to define a network of 80 922 physical interactions that can occur between 10 229 human proteins. We only included interactions supported by at least one piece of direct experimental evidence demonstrating physical association between two human proteins (see Materials and methods; Supplementary Table 1). Moreover, to account for differences in interaction assay reliability, throughout this work, we also consider a high-confidence subset of this global network that consists of interactions reported in at least two independent primary research publications. There are a total of 13 102 of these multiple publication-supported interactions that connect 4750 human proteins.

Determining the tissue specificity of human protein interactions

We then used gene expression data (Su et al, 2004) to determine the cells and tissues of the human body in which each of these interactions can occur (Figure 1A).
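The co-expression test behind this step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the presence cutoff of 200 on normalized expression given in Materials and methods, and the gene names, tissue names, data layout and function names are all hypothetical.

```python
from typing import Dict, Set

# Presence cutoff on normalized expression, as stated in Materials and methods.
EXPRESSION_THRESHOLD = 200.0


def expressed_tissues(profile: Dict[str, float],
                      threshold: float = EXPRESSION_THRESHOLD) -> Set[str]:
    """Return the tissues in which a gene's expression exceeds the cutoff."""
    return {tissue for tissue, level in profile.items() if level > threshold}


def interaction_tissues(gene_a: str, gene_b: str,
                        expression: Dict[str, Dict[str, float]]) -> Set[str]:
    """Return the tissues in which both partners are expressed.

    Under the paper's working assumption, these are the tissues in which
    the interaction can occur.
    """
    return (expressed_tissues(expression[gene_a])
            & expressed_tissues(expression[gene_b]))


# Toy data: hypothetical genes measured in three tissues (the study uses 79).
expression = {
    "GENE_A": {"liver": 850.0, "brain": 640.0, "heart": 910.0},
    "GENE_B": {"liver": 35.0, "brain": 420.0, "heart": 120.0},
}

print(interaction_tissues("GENE_A", "GENE_B", expression))  # {'brain'}
```

Here GENE_A is expressed in all three toy tissues, but its interaction with GENE_B is possible only in brain, the single tissue in which both genes pass the cutoff.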
Tissue-specific and recently evolved proteins make few protein interactions

We first examined the relationship between the tissue specificity of a protein and the number of interactions that it makes (a protein's interaction degree). We find that more tissue-specific proteins make fewer interactions than widely expressed proteins (Figure 1B).

The most tissue-specific proteins normally interact with core cellular components

We next analyzed the extent to which tissue-specific proteins interact with the most widely expressed proteins. We find that even when only considering the most tissue-restricted proteins (proteins expressed in 10/79 tissues), most of them are known to interact directly with universally expressed human proteins (Figure 2A).
Most universally expressed proteins have tissue-specific protein interactions

Constitutively expressed proteins are often considered as important for ‘housekeeping' biological processes that are required in all cells. However, nearly all of the most widely expressed proteins have interactions with other proteins that are not themselves universally expressed (Figure 2B). Proteins that themselves have restricted expression patterns also have many interactions that can only occur in a subset of the tissues in which they are expressed (Figure 2C).

Extensive re-use of housekeeping proteins for tissue-specific biological processes

To further illustrate how housekeeping proteins are widely re-used for tissue-specific biological processes, we considered neuronal protein complexes that function in synaptic transmission, learning, and memory. The subunits of these complexes have been identified by extensive proteomic approaches, and the importance of individual subunits for learning and memory has been validated by genetic studies in mice and by clinical studies in humans (Pocklington et al, 2006). We estimate that ~20–60% of the subunits of these neuronal-specific complexes are actually universally expressed housekeeping proteins (Figure 3A and B).
Discussion

The evolution of tissue-specific biological processes

Taken together, our findings suggest the following model for the evolution of tissue-specific functions. Many (but not all) tissue-specific proteins are recent evolutionary innovations (Lehner et al, 2004). In general, these tissue-specific proteins initially make few interactions, and these interactions are frequently with much more widely expressed and ‘housekeeping' components of the cell. Thus, many tissue-specific proteins probably function by directly recruiting or modifying the activities of core cellular components. There are, however, exceptions to this trend, with some tissue-specific proteins acting as ‘local' hubs in the interaction network of a particular tissue (our unpublished observation).

Frequent re-use of housekeeping proteins for tissue-specific biology

Universally expressed ‘housekeeping' proteins tend to make many interactions. Many of these interactions (~50–60%, Supplementary Figure 3) are with other housekeeping proteins. However, the majority of universally expressed proteins also make interactions that can only occur in a subset of the tissues in which they are expressed. Therefore, there appears to be very frequent, and possibly universal, re-use of ‘housekeeping' proteins to perform tissue-specific biological processes. That is, most housekeeping proteins can be considered to be important for different (or at least modified) biological processes in different tissues. In summary, our results suggest that it might be better to consider the biology of any particular tissue in terms of the particular interactions that can occur in that tissue, rather than simply in terms of the unique proteins that are expressed there.
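The observation that housekeeping proteins also make tissue-restricted interactions can be illustrated with a short sketch. It assumes the strictest housekeeping criterion from Materials and methods (expression in every tissue) and that the set of tissues in which each protein is expressed has already been computed. The protein names, the data layout and the function name are hypothetical, and this is not the authors' implementation.

```python
from typing import Dict, Iterable, Set, Tuple


def housekeeping_with_restricted_partners(
    interactions: Iterable[Tuple[str, str]],
    tissues_by_protein: Dict[str, Set[str]],
    all_tissues: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Find housekeeping proteins with at least one tissue-restricted partner.

    Returns (universal, flagged): the proteins expressed in every tissue,
    and the subset of those with a partner that is not expressed everywhere.
    """
    universal = {protein for protein, tissues in tissues_by_protein.items()
                 if tissues >= all_tissues}
    flagged: Set[str] = set()
    for a, b in interactions:
        # Skip pairs for which expression information is missing.
        if a not in tissues_by_protein or b not in tissues_by_protein:
            continue
        for protein, partner in ((a, b), (b, a)):
            if protein in universal and partner not in universal:
                flagged.add(protein)
    return universal, flagged


# Toy example: three tissues, two housekeeping proteins, one brain-specific one.
all_tissues = {"liver", "brain", "heart"}
tissues_by_protein = {
    "HK1": {"liver", "brain", "heart"},
    "HK2": {"liver", "brain", "heart"},
    "TS1": {"brain"},
}
interactions = [("HK1", "HK2"), ("HK1", "TS1")]

universal, flagged = housekeeping_with_restricted_partners(
    interactions, tissues_by_protein, all_tissues)
print(sorted(flagged))                # ['HK1']
print(len(flagged) / len(universal))  # 0.5
```

In this toy network HK1 is universally expressed, yet its interaction with TS1 can occur only in brain, which is the kind of tissue-specific interaction discussed above.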
The importance of interaction network dynamics

In unicellular yeast, broadly expressed proteins can have precisely temporally regulated activities because of their interactions with proteins with restricted expression profiles (de Lichtenberg et al, 2005). We show here that a similar process may be widely used in multicellular organisms to restrict and modify the activities of a protein to a subset of the tissues in which it is expressed. Together with earlier analyses in yeast (Han et al, 2004; Luscombe et al, 2004; de Lichtenberg et al, 2005), this work highlights the importance of considering global interaction networks as having dynamic, not static, structures and topologies. Additional work analyzing how the networks of molecular interactions change between cell types, states, and conditions should prove a fruitful approach for understanding living systems.

Materials and methods

Protein interaction data

We compiled human protein interactions from a total of 21 different databases, as listed in Table I. We required that each interaction be supported by at least one piece of direct experimental evidence demonstrating physical association between two human proteins, and removed all interactions that did not meet these criteria. All interactions were mapped to common Ensembl gene identifiers. The complete network (‘CRG-all') consists of 80 922 interactions between 10 229 human proteins (approximately half the human proteome) and is available as Supplementary Table S1.
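As a rough illustration of how such a merged network can be assembled, the sketch below collapses interaction records from several sources into undirected pairs and tracks the publications supporting each pair. The same bookkeeping allows the high-confidence subset, interactions reported in at least two independent publications, to be extracted. The record layout, identifiers and function names are assumptions made for this example and do not come from the original pipeline.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical record layout: (gene_a, gene_b, publication_id), with genes
# already mapped to a common identifier scheme.
Record = Tuple[str, str, str]
Network = Dict[FrozenSet[str], Set[str]]


def build_network(records: Iterable[Record]) -> Network:
    """Map each undirected interaction to the publications reporting it."""
    support: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for gene_a, gene_b, publication in records:
        # A frozenset key makes (A, B) and (B, A) the same interaction.
        support[frozenset((gene_a, gene_b))].add(publication)
    return dict(support)


def high_confidence(network: Network, min_publications: int = 2) -> Network:
    """Keep interactions supported by at least `min_publications` papers."""
    return {pair: pubs for pair, pubs in network.items()
            if len(pubs) >= min_publications}


def proteins(network: Network) -> Set[str]:
    """Return every protein that takes part in at least one interaction."""
    return set().union(*network) if network else set()


# Toy records with hypothetical identifiers.
records = [
    ("ENSG_A", "ENSG_B", "pub1"),
    ("ENSG_B", "ENSG_A", "pub2"),  # same pair, second publication
    ("ENSG_A", "ENSG_C", "pub1"),
    ("ENSG_A", "ENSG_C", "pub1"),  # duplicate record from another database
]

network = build_network(records)
print(len(network), len(proteins(network)))  # 2 3
print(len(high_confidence(network)))         # 1
```

Counting distinct publications rather than distinct databases matters here, because the same paper is often curated by several databases and would otherwise be counted as independent support.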
Filtered interaction dataset

In total, 13 102 of the interactions in our network between 4750 proteins are supported by experimental evidence of physical binding reported in at least two different primary research publications. Given the multiple lines of evidence supporting these interactions, we use this subset of interactions (‘CRG-filtered') as high-confidence interactions to confirm that our conclusions are not affected by interaction data quality or sampling (see Supplementary Figures).

Expression data

To identify which protein interactions can occur in a particular cell or tissue type, we used global gene expression data. Although interactions can be regulated by localization, phosphorylation, etc., we aim to distinguish the proteins that can interact under some condition in a tissue from those that cannot, and mRNA expression is a reasonable indicator of this potential. We used expression data from the GNF Atlas project that measured expression across 79 different human cell or tissue types (Su et al, 2004). The MAS5 normalized expression levels were averaged between experimental replicates, and in cases where more than one probe set was present for a gene, the more sensitive probe set was used. In this dataset, a gene is considered as present in a tissue if its normalized expression level is >200 (Su et al, 2002). However, our conclusions remain the same when this stringency is increased or decreased (see Supplementary information). At this threshold, >98% of the interaction partners in our global network for which expression information is available are co-expressed in at least one human tissue.

Housekeeping proteins

We identified universally expressed housekeeping proteins using a total of 10 different criteria. First, we used the GNF Atlas data, and considered housekeeping proteins as those with an expression level above 200 in all 79 tissues, or in more than 70/79 tissues (i.e. allowing for some false-negatives).
Second, we used the same two tissue criteria, but increased (250) or decreased (150) the stringency at which a gene is considered expressed. Third, we used four additional sets defined in an earlier publication: genes identified as expressed in 18/18 or at least 16/18 tissues using microarray data, and genes with the same tissue criteria but defined using expressed sequence tag (EST) data (Zhu et al, 2008).

Neurotransmitter receptor complexes

Components of N-methyl-D-aspartate receptor and metabotropic receptor complexes were identified by extensive proteomic studies as described (Pocklington et al, 2006). We used the 215 subunits of these complexes that could be mapped to human Ensembl gene identifiers, of which 77 have demonstrated roles in learning and memory through genetic studies in mice or are implicated in psychiatric disorders in humans (Pocklington et al, 2006). We used the sets of housekeeping proteins described above to identify how many of these subunits represent universally expressed proteins.

Protein evolution

Proteins were classified as metazoan specific or pre-metazoan using the analysis of Freilich et al (2005).

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary Materials

Supplementary Figures 1 - 3 (792K, pdf)
CRG human interactome (1.8M, zip)

Acknowledgments

This work was funded by a European Research Council (ERC) Starting Researcher Grant, the Ministry of Science and Innovation (MICINN-Plan Nacional), the CRG-EMBL Systems Biology Program, ICREA and a Leonardo da Vinci fellowship to AB. We thank three anonymous referees for helpful suggestions.