Copyright © 2007, EMBO and Nature Publishing Group
Large-scale mapping of human protein–protein interactions by mass spectrometry
1Protana (now Transition Therapeutics), Toronto, Ontario, Canada
2Infochromics, MaRS Discovery District, Toronto, Ontario, Canada
3Faculty of Medicine, The Ottawa Institute of Systems Biology, University of Ottawa, BMI, Ottawa, Ontario, Canada
4Information Engineering Center, Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario, Canada
aThe Ottawa Institute of Systems Biology, University of Ottawa, BMI, 451 Smyth Road, Ottawa, Ontario, Canada K1H 8M5. Tel.: +1 613 562 5800 ext 8674; Fax: +1 613 562 5655; E-mail: dfigeys/at/uottawa.ca
*Present address: Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
†Present address: Department of Biology, York University, Toronto, Ontario, Canada
‡Present address: Hospital for Sick Children and McLaughlin Centre for Molecular Medicine, and Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Ontario, Canada
§Present address: Popper and Company LLC, Sarasota, FL, USA
Present address: Structural Genomics Consortium, University of Toronto, Toronto, Ontario, Canada
¶Present address: Genetics and Bioinformatics, Walter and Eliza Hall Institute of Medical Research (WEHI), Parkville, Victoria, Australia
#Present address: Novartis Institutes for Biomedical Research, Cambridge, MA, USA
Present address: Platform Computing, Markham, Ontario, Canada
**Present address: Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
††Present address: CombinatoRx Inc, Cambridge, MA, USA
‡‡Present address: Michael Smith Genome Sciences Centre, BC Cancer Agency Genome Sciences Centre, Vancouver, British Columbia, Canada
§§Present address: Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, Ontario, Canada
Present address: Campbell Family Institute for Breast Cancer Research, University Health Network, Toronto, Ontario, Canada
¶¶Present address: Sigma-Aldrich Corporation, St Louis, MO, USA
##Present address: Scientific Insights Consulting Group Inc., Mississauga, Ontario, Canada
Present address: Advanced Protein Technology Centre, Hospital for Sick Children, Toronto, Ontario, Canada
∫Present address: Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada
Present address: MDS Pharma Services, Mississauga, Ontario, Canada
Present address: Division of Haematology/Oncology, Hospital for Sick Children, Toronto, Ontario, Canada
Received September 22, 2006; Accepted January 26, 2007.
Abstract
Mapping protein–protein interactions is an invaluable tool for understanding protein function. Here, we report the first large-scale study of protein–protein interactions in human cells using a mass spectrometry-based approach. The study maps protein interactions for 338 bait proteins that were selected based on known or suspected disease and functional associations. Large-scale immunoprecipitation of Flag-tagged versions of these proteins followed by LC-ESI-MS/MS analysis resulted in the identification of 24 540 potential protein interactions. False positives and redundant hits were filtered out using empirical criteria and a calculated interaction confidence score, producing a data set of 6463 interactions between 2235 distinct proteins. This data set was further cross-validated using previously published and predicted human protein interactions. In-depth mining of the data set shows that it represents a valuable source of novel protein–protein interactions with relevance to human diseases. In addition, via our preliminary analysis, we report many novel protein interactions and pathway associations.
Keywords: human interactome, IP-HTMS, protein–protein interaction
Introduction
Biomolecular interactions play a critical role in the vast majority of cellular processes. Understanding the roles and consequences of protein interactions is fundamental for the development of systems biology as well as the development of novel therapeutics. Our current knowledge of biomolecular interactions in terms of cataloging interactions and understanding their biophysical properties is still very limited and is hindered by the limitations (primarily throughput and reproducibility) of existing technologies. Different techniques for mapping protein interactions, such as the yeast two-hybrid approach (Y2H) (Chien et al, 1991) and the LUMIER approach (Barrios-Rodiles et al, 2005), are available, and address the question of whether two proteins interact in a pairwise fashion. We have developed a high-throughput platform combining immunoprecipitation and high-throughput mass spectrometry (IP-HTMS) to rapidly identify potentially novel protein interactions for a bait protein of interest. We (Ho et al, 2002) and others (Gavin et al, 2002) previously used this approach to map protein–protein interactions in yeast, creating invaluable data sets for yeast biology and extrapolation into mammalian biology. We have since extended this approach to the high-throughput mapping of protein–protein interactions in humans and refined the computational processing with new methodology to assign a confidence score to each interaction. Mapping protein interactions in human cells has its own set of challenges owing to the number of potentially expressed genes, the number of different cell types and the numbers of internal and external factors that impact the cellular system. Although a complete mapping of the human interactome is still beyond current capabilities, more focused studies are possible.
For example, application of IP-HTMS on a smaller scale was used to study the human TNF-alpha/NF-kappa B signal transduction pathway (Bouwmeester et al, 2004). On a more global scale, the Y2H system has recently been applied to study pairwise human interactions (Rual et al, 2005; Stelzl et al, 2005). Here, we report the first large-scale application of IP-HTMS to the mapping of protein–protein interactions in human cells using 338 human bait proteins of significant biomedical interest. The complete data set is provided as a table of bait–prey pairs with associated confidence values (Supplementary Table II) and in PSI-MI (Hermjakob et al, 2004) format from the Intact database (www.ebi.ac.uk/intact), accession EBI-1059370.
Results and discussion
Bait selection and analytical processing
An initial set of 407 human bait proteins was selected based on known or implied disease associations and functional annotation. These proteins are implicated in a diverse set of biological processes and pathways. The most well-represented biological process categories among the set of baits are protein modification, cell cycle, transcription and signal transduction, reflecting the choice of bait proteins that are fundamental to essential cellular processes. Many of the baits also have known disease associations, the most well represented being breast cancer, colon cancer, diabetes and obesity, reflecting our objective to target important human diseases. Approximately 10% of the baits selected were hypothetical or poorly annotated proteins, chosen in some cases for their homology to proteins with disease or functional associations of interest. The data set reported here maps interactions for 338 of the initial set of bait proteins. A complete listing of the bait proteins and a representative biological process from the Gene Ontology (GO) (Ashburner et al, 2000), where available, is provided in Supplementary Table I.
(See Supplementary Information for further details on bait selection and disease associations.) Analytical processing and mass spectrometry were carried out as described in Materials and methods. In total, 1034 individual immunoprecipitation experiments were resolved by SDS–PAGE, and proteins were visualized by colloidal Coomassie stain. Processing of the corresponding gel lanes yielded 16 321 gel bands that were processed by mass spectrometry, generating over 400 000 MS/MS spectra that matched a peptide sequence in the database. For over half of the baits, replicated immunoprecipitation experiments were performed.
Figure 1
Prey identification, scoring and filtering
As shown in Figure 1
Filtering out spurious and nonspecific proteins
In order to minimize the number of false-positive interactions, we applied an empirical filtering process to remove spurious/contaminant proteins and nonspecifically interacting proteins (Figure 2E). Proteins observed in ≥2.5% of control experiments were removed from the data set, as were proteins interacting with ≥5% of baits. This combined set of proteins includes many common contaminants of mass-spectrometry experiments (such as human keratins) as well as proteins observed to bind nonspecifically, and includes protein families such as tubulins, ribosomal proteins and heat-shock proteins.
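The frequency-based filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function and parameter names are hypothetical, the inputs are assumed to be plain sets of prey identifiers, and the cutoffs are assumed to mean "at or above" 2.5% of control experiments or 5% of baits.

```python
from collections import defaultdict


def filter_preys(bait_hits, control_hits,
                 max_control_frac=0.025, max_bait_frac=0.05):
    """Drop preys seen too often in controls or across baits.

    bait_hits:    dict mapping bait name -> set of prey names
    control_hits: list of sets, one set of prey names per control experiment
    The "at or above the cutoff" comparison is an assumption of this sketch.
    """
    n_controls = len(control_hits)
    n_baits = len(bait_hits)

    # Count the number of control experiments in which each prey appears.
    control_count = defaultdict(int)
    for preys in control_hits:
        for prey in preys:
            control_count[prey] += 1

    # Count the number of baits with which each prey was observed.
    bait_count = defaultdict(int)
    for preys in bait_hits.values():
        for prey in preys:
            bait_count[prey] += 1

    def is_nonspecific(prey):
        control_frac = control_count[prey] / n_controls if n_controls else 0.0
        bait_frac = bait_count[prey] / n_baits
        return control_frac >= max_control_frac or bait_frac >= max_bait_frac

    return {bait: {p for p in preys if not is_nonspecific(p)}
            for bait, preys in bait_hits.items()}


# Toy data with loosened cutoffs (0.5 and 0.75) so the effect is visible.
toy_baits = {"B1": {"KRT1", "X"}, "B2": {"KRT1", "Y"},
             "B3": {"KRT1", "Z"}, "B4": {"W", "HSPA8"}}
toy_controls = [{"HSPA8"}, set()]
filtered = filter_preys(toy_baits, toy_controls, 0.5, 0.75)
```

In the toy data, `KRT1` is dropped because it appears with three of the four baits, and `HSPA8` because it appears in one of the two controls; the remaining preys are kept.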
Interaction confidence scores
As many peptide and protein identification metrics (scores, expect values, number of peptides, peptide coverage, etc.) can be used to assess the overall confidence of a prey identification, we sought to combine several of these metrics and generate an overall measure of the confidence of each prey observation (Figure 2G [...] 5 immunoprecipitation experiments) baits (see Figure 1).
We validated our interaction scoring metric in several ways, demonstrating its utility as a measure of interaction confidence. First, using our training data set of reproduced baits, we performed a 10-fold cross-validation of the model, and measured the ability of our model to estimate prey reproducibility (see Supplementary Information). We found good correlation (r=0.66) between the observed reproducibility and predicted reproducibility across our training set. Second, by analyzing the subset of known interactions in the data set (see subsequent section), we observed that the interaction confidence scores assigned to the set of known interactions were significantly higher than those scores assigned to previously unknown interactions; the set of known interactions has a mean interaction confidence score of 0.43, whereas the mean of the entire set of interaction confidence scores is 0.21, a statistically significant difference (Wilcoxon rank sum test; P<0.0001). Third, we analyzed the set of reciprocal interactions in the data set. In our study, no explicit effort was made to test bait–prey interactions reciprocally (i.e., to use the observed prey proteins as baits and see whether the original bait proteins are identified). A small number of interactions (21) were, however, observed reciprocally in the data set.
The interaction confidence scores of these 21 reciprocally observed interactions (mean=0.43) were significantly higher (Wilcoxon rank sum test; P<0.0001) than the set of interactions for which a reciprocal interaction was not observed (mean=0.25) or indeed the whole data set (mean=0.21). These observations show that the interaction confidence score is a useful means of ranking the interactions for subsequent data mining. To facilitate more in-depth analysis of such a large data set, we focused our in-depth interpretation of the interactions primarily on interactions with score ≥0.3, corresponding to approximately one-third (2251 interactions) of the data set. This threshold was chosen because most interactions between subunits of well-characterized protein complexes represented in the data set (the proteasome and eukaryotic translation initiation factors; see below) have scores ≥0.3. In addition, for 85% of prey proteins with interaction score ≥0.3, two or more distinct peptide sequences were identified, consistent with emerging guidelines for mass spectrometry-based protein identification (Bradshaw et al, 2006).
Computational assessment and validation
Other types of genomic information, when combined with protein–protein interactions, can provide stronger evidence of functional relationships between genes. Several methods of utilizing these orthogonal genomic data to computationally assess high-throughput protein–protein interaction data have been proposed, such as comparison with gene expression, analysis of paralogous interactions and utilization of functional and subcellular localization information (Deane et al, 2002; von Mering et al, 2002; Rual et al, 2005). In this section, we present a computational assessment of the IP-HTMS data set by integrating three classes of genomic information: other human protein–protein interaction data sources, GO annotations and gene expression microarray data.
An important consideration when integrating other data types is how to count the protein–protein interactions (von Mering et al, 2002). Two paradigms for modeling protein–protein interaction data have been proposed: the ‘spoke' model, whereby each bait is assumed to interact with each of its observed prey proteins, and the ‘matrix' model, whereby the bait and all of the preys interact with each other (Bader and Hogue, 2002). We adopted the ‘spoke' model for all of our analyses (unless stated otherwise), as the ‘matrix' model has been shown to produce higher rates of false positives (Bader and Hogue, 2002). We recognize, however, the limitations of the ‘spoke' model, in particular that bait–prey interactions identified in immunoprecipitation experiments may not actually represent direct physical interactions between the bait and prey protein.
Comparison to other protein–protein interaction data sources
Previous reports have in general found relatively little overlap between protein–protein interaction data sets (Bader and Hogue, 2002). For example, a recent comparison of a comprehensive literature-curated catalog of yeast interactions to all available high-throughput yeast interactions showed only a 14% overlap (Reguly et al, 2006). As pointed out by the latter authors, however, it is important to distinguish between the absolute intersection of the two data sets (the number of interactions in common between the data sets being compared) and the intersection of ‘interaction space' covered by each data set. For the IP-HTMS platform, the interaction space is the space covered by the set of bait proteins. For example, in comparing the IP-HTMS data set to a Y2H data set, we identify the IP-HTMS space as those Y2H interactions for which one or more of the interactors correspond to an IP-HTMS bait. Performing the comparisons in this way allows for realistic estimates of how interactions are recapitulated across different studies and technology platforms.
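The difference between the 'spoke' and 'matrix' counting models can be made concrete with a short Python sketch. The function names and the toy identifiers below are hypothetical and chosen only for illustration; interactions are represented as unordered pairs.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: the bait is paired with each of its observed preys."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model: every pair among the bait and its preys is counted."""
    members = set(preys) | {bait}
    return {frozenset(pair) for pair in combinations(sorted(members), 2)}


# One purification with a bait and three preys (placeholder names).
spoke = spoke_pairs("BAIT", ["P1", "P2", "P3"])
matrix = matrix_pairs("BAIT", ["P1", "P2", "P3"])
print(len(spoke), len(matrix))
```

For a bait with n distinct preys, the spoke model yields n pairs and the matrix model n(n+1)/2, so the matrix count grows quadratically with the size of the purification. That growth is consistent with the larger pool of candidate (and potentially false-positive) interactions the matrix model produces.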
We compared the IP-HTMS data set to three other sources of human protein–protein interactions: a collation of known interactions (Ramani et al, 2005), a set of interactions predicted from lower eukaryotic interactome maps (Lehner and Fraser, 2004) and a high-throughput Y2H study (Rual et al, 2005). The overlaps between these data sets and the IP-HTMS data set are summarized in Table II. The overlap between the IP-HTMS data set and these three other sources ranges from 6 to 11%, broadly in line with observations of the overlap between the human Y2H data set and literature-curated interactions (2–8%) (Rual et al, 2005). By randomly permuting the IP-HTMS bait–prey interactions and re-computing the overlaps, we confirmed that the overlaps are significantly greater than would be expected by chance (P<0.0001). Similar comparisons in yeast, between IP-HTMS interactions (Ho et al, 2002) and literature-curated interactions and between tandem affinity purification interactions (Gavin et al, 2002) and literature-curated interactions, show 20 and 30% overlaps, respectively (Reguly et al, 2006), suggesting that a much greater proportion of the yeast interactome has been cataloged than of the human interactome.
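A permutation test of this kind can be sketched as follows. The exact permutation scheme used in the study is not specified here, so this sketch assumes one simple choice: the prey labels are shuffled across the bait–prey pairs while the baits stay fixed. All function names are hypothetical, and the reference set is assumed to hold unordered pairs.

```python
import random


def restrict_to_bait_space(reference, baits):
    """Keep reference interactions with at least one partner used as a bait."""
    return {pair for pair in reference if pair & baits}


def overlap(pairs, reference):
    """Count bait-prey pairs that also occur in the reference set."""
    return sum(1 for bait, prey in pairs
               if frozenset((bait, prey)) in reference)


def permutation_p_value(pairs, reference, n_perm=1000, seed=0):
    """Empirical P-value for the observed overlap.

    pairs:     list of (bait, prey) tuples
    reference: set of frozensets (unordered protein pairs)
    Assumption: the null model shuffles prey labels across all pairs.
    """
    rng = random.Random(seed)
    observed = overlap(pairs, reference)
    baits = [bait for bait, _ in pairs]
    preys = [prey for _, prey in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(preys)
        if overlap(zip(baits, preys), reference) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With the add-one correction, the smallest P-value that `n_perm` permutations can report is 1/(n_perm + 1), so resolving P<0.0001 would need at least 10 000 permutations under this scheme.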
The sets of interactions in common between the human IP-HTMS interactions and each of the other three data sets are themselves overlapping; of the total of 256 overlapping interactions between IP-HTMS and the other three data sets, 82 are found in two or more of the overlapping sets. We also note that interactions in common between the IP-HTMS and other sources of human protein–protein interactions have in general significantly higher confidence scores. The mean confidence scores for the interactions in common between IP-HTMS and the known set, IP-HTMS and the predicted set, and IP-HTMS and the Y2H set are 0.43, 0.43 and 0.42, respectively, higher than expected by chance (P<0.0001; Wilcoxon rank sum test) given the overall distribution of confidence scores.
As already mentioned, it is probable that some of the bait–prey interactions identified in IP-HTMS experiments may not actually represent direct physical interactions between the bait and prey protein, but instead interactions between preys. To explore this further, we first extended our comparisons by considering the matrix of all possible interactions in the IP-HTMS data set (i.e., including all possible prey–prey interactions for each bait). Of the matrix of ~225K possible IP-HTMS interactions, 1678 are in common with the known set (statistically significantly greater than expected by chance, P<0.0001). Although the accuracy of considering the matrix of all interactions is expected to be lower than when only considering bait–prey interactions (Bader and Hogue, 2002), clearly many valid interactions remain to be discovered from this broader approach. Second, we compared our IP-HTMS interactions to the literature using the Pathway Studio software (Ariadne Genomics). This software enables rapid annotation of protein–protein interactions with literature mined from various sources. Using this approach, 145 protein–protein interactions in our IP-HTMS data set were annotated as present in the literature.
In order to identify those IP-HTMS interactions that represent indirect interactions between bait and prey, we mined the literature in the following way. Bait–prey pairs from our IP-HTMS experiments that have literature validation in the Pathway Studio database were selected. The interaction network was then expanded by extracting all known interactors from the literature that are within two edges of the prey. We then overlapped the experimental interactions with the expanded network such that for each bait we considered all paths of length two where the (bait, prey) and the (bait, interactor of prey) pairs are both in IP-HTMS, and hence, the (prey, interactor of prey) pair can be inferred. We did the same for paths of length three, and we enumerated all the distinct length-one pairs from the literature that were part of the overlapping paths. This allows us to significantly expand the validation of our data set using the literature by including not just bait–prey but also prey–prey interactions. With our additional analysis, the total number of observed interactions that are reinforced by the literature increases to 375. This represents a 2.6-fold increase in validation corresponding to 6% of all of our interactions. This set of interactions is provided in Supplementary Table IV. We have utilized this approach in a detailed way to extend networks for individual bait proteins. An example of this is provided in Supplementary Figure III. Only four direct interactors of VHL from our data set matched with the literature. Using our novel approach, we extended the interaction surrounding VHL within two literature edges. This increased the number of proteins seen in the VHL IP-HTMS experiment that are linked to VHL through the literature to 13 (three-fold increase). The nine new associations are indirect but are linked through known interactors of VHL. 
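The length-two step of this literature expansion can be sketched in Python. This is a simplified illustration under stated assumptions rather than the Pathway Studio workflow itself: the literature is assumed to be available as a plain mapping from each protein to its known interactors, only paths of length two are handled, and all names are hypothetical placeholders.

```python
def supported_prey_pairs(ip_preys, literature):
    """Find prey-prey pairs supported by the literature for each bait.

    ip_preys:   dict mapping bait -> set of preys seen in IP-HTMS
    literature: dict mapping protein -> set of known interactors
    A (prey, partner) pair is reported when both proteins were
    identified with the same bait and the pair is a known interaction.
    """
    supported = {}
    for bait, preys in ip_preys.items():
        found = set()
        for prey in preys:
            for partner in literature.get(prey, ()):
                if partner in preys and partner != prey:
                    found.add(frozenset((prey, partner)))
        supported[bait] = found
    return supported


# Toy example: P1 and P2 co-purify with BAIT and are known to interact.
toy_ip = {"BAIT": {"P1", "P2", "P3"}}
toy_literature = {"P1": {"P2", "Q"}, "P2": {"P1"}}
result = supported_prey_pairs(toy_ip, toy_literature)
```

In the toy example only the P1–P2 pair is reported: Q is a known partner of P1 but was not identified with the bait, and P3 has no literature entry.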
Paralogous interactions
Evolutionary relationships between genes both across and within species have been proposed as sources for discovery and confirmation of protein–protein interactions (Matthews et al, 2001). In yeast, interactions between pairs of proteins have been shown to be of higher confidence if interactions also occur between paralogs of the interactors (Deane et al, 2002). The latter authors developed the paralogous verification method, and showed that in yeast the method was able to predict 40% of true interactions with a 1% false-positive rate (Deane et al, 2002). We explored the utility of this method for assessment of the IP-HTMS data set by first collating a set of 1999 groups of human paralogs (representing 6023 human genes) from the inparanoid database (O'Brien et al, 2005). Cross-referencing to the IP-HTMS data set identified 834 interactions for which both bait and prey could be assigned one or more paralogs. Overall, 154 of these 834 interactions (18%) had one or more paralogous interactions. The set of 154 paralogous interactions is provided as Supplementary Information (Supplementary Table III). In many cases, these paralogous interactions consist of a single bait interacting with two or more related (paralogous) prey proteins. We also wished to test the rate at which paralogous baits identify the same or related prey proteins. The IP-HTMS data set provides an opportunity to do this, because for 16 of the IP-HTMS baits, one or more paralogs have also been used as baits. These 16 baits correspond to 157 interactions for which paralogs were assigned, and 57 of these interactions are paralogous (36%).
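The counting behind these figures can be sketched as follows. This is a simplified reading of paralogous verification, not the published implementation: an interaction is treated as supported when at least one other interaction in the data set connects the same two paralog groups. The function name, the group identifiers and the toy proteins are all hypothetical.

```python
from collections import defaultdict


def paralogous_interactions(pairs, group_of):
    """Return interactions supported by at least one paralogous interaction.

    pairs:    iterable of (bait, prey) tuples
    group_of: dict mapping protein -> paralog group identifier
    Interactions whose bait or prey has no paralog group are ignored.
    """
    unique = {frozenset(p) for p in pairs if p[0] != p[1]}
    by_groups = defaultdict(set)
    for pair in unique:
        a, b = tuple(pair)
        if a in group_of and b in group_of:
            # Key on the unordered pair of paralog groups.
            by_groups[frozenset((group_of[a], group_of[b]))].add(pair)
    supported = set()
    for members in by_groups.values():
        if len(members) > 1:
            supported |= members
    return supported


# Toy example: A1 and A2 are paralogs and both pull down X1.
toy_groups = {"A1": "gA", "A2": "gA", "X1": "gX", "X2": "gX", "Z": "gZ"}
toy_pairs = [("A1", "X1"), ("A2", "X1"), ("A1", "Z"), ("A1", "U")]
toy_supported = paralogous_interactions(toy_pairs, toy_groups)
```

In the toy data, three interactions have paralog assignments for both partners and two of them support each other. The percentages quoted in the text follow from the same kind of ratio: 154/834 ≈ 18% overall and 57/157 ≈ 36% for the paralogous baits.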
One caveat to analyzing the IP-HTMS data in this way is that it is not possible to distinguish between independent interactions of paralogous baits with the same or related prey proteins and the scenario whereby paralogous baits interact with each other (e.g., heterodimers) and that complex then identifies the same set of preys regardless of which bait is used. The set of 16 paralogous baits includes three members of the 14-3-3 protein family, YWHAB, YWHAQ and YWHAZ. These proteins are known to form homo- and heterodimers in vivo (Jones et al, 1995) and together contribute 35 of the 57 interactions from paralogous baits. Nevertheless, this is a useful demonstration of the reproducibility of paralogous baits; the three 14-3-3 baits identify 117 prey proteins in total, 33 of which are identified by more than one of the baits. Finally, we note that interactions supported by a paralogous interaction have significantly higher interaction confidence scores; the set of 154 paralogous interactions has a mean score of 0.33, as compared to 0.21 across the whole data set (Wilcoxon rank sum test; P<0.0001). As pointed out by Deane et al (2002), the paralogous verification method is useful only where paralogs can be identified. This is only possible for a relatively small fraction (834 out of 6463 interactions) of the IP-HTMS data set. Nevertheless, we believe that this first preliminary analysis of paralogous interactions in the human interactome illustrates the potential for further in-depth studies as our ability to assign paralogs improves and our knowledge of the human interactome increases.
Biological process and pathway enrichment
To gain an overview of the classes of proteins identified as preys for each of the baits, we used the GO (slim subsets) to analyze biological process and cellular component category representation.
In both cases, the distribution of prey proteins among the categories is similar to the distribution of categories among bait proteins; the most well-represented bait biological process protein categories (protein modification, protein biosynthesis, cell cycle, transcription and signal transduction) are also the most well-represented prey protein categories. We used the GO annotation to analyze the degree to which bait and prey interactors share the same or related GO categories. For high-throughput yeast data, the fractions of interactions for which both interactors have the same high-level biological process or cellular component categories have been estimated at 20 and 27%, respectively (Reguly et al, 2006). For our human IP-HTMS data, these fractions are 12 and 20%, respectively. To illustrate these associations in more detail, we generated bait–prey coincidence maps (Figure 3).
This analysis revealed a significant tendency of baits to interact with prey proteins implicated in the same or similar biological process (Figure 3A and B).
Integrated analysis of the IP-HTMS and GO categories also facilitated discovery of some very specific but potentially biomedically important interactions. Relatively few proteins in the IP-HTMS data set are assigned to the peroxisome (17 interactions involve a peroxisomal bait or prey). Of these interactions, a single interaction was observed between a peroxisomal bait and a peroxisomal prey: PHYH (phytanoyl-CoA 2-hydroxylase) bait identified ABCD3 (ATP-binding cassette, subfamily D) as a prey. Defects in the functioning of both PHYH and ABCD3 are implicated in Zellweger's syndrome and other peroxisomal biogenesis disorders, a set of potentially severe (fatal) inherited diseases (Moser, 1999; Steinberg et al, 2006). In addition, several studies have shown interactions between ABCD proteins and peroxisomal biogenesis factors (PEX proteins) and between PHYH and PEX proteins (Liu et al, 1999; Gloeckner et al, 2000). To our knowledge, our observation is the first indication of a protein–protein interaction between PHYH and ABCD3.
Cross-referencing gene expression information
Increased similarity of gene expression profiles for genes encoding interacting proteins has been demonstrated in yeast (Ge et al, 2001). Preliminary evidence that this may also be the case in higher eukaryotes has been reported for Caenorhabditis elegans (Li et al, 2004) and in humans (Hahn et al, 2005; Rual et al, 2005). In the latter case, enrichment for higher gene expression correlation was seen for both literature-derived interactions and, albeit at a lower level, for the experimentally derived data set (Rual et al, 2005).
One of the principal issues in attempting to measure whether a relationship exists between gene expression and protein interaction data sets is the incompleteness and arbitrary nature of selecting appropriate human gene expression data. Rather than select individual data sets over which co-expression could be measured, we made use of a compendium of co-expression measurements generated from 3924 microarrays from 60 different human studies (Lee et al, 2004). Co-expression links in this study are defined as positive or negative based upon their position within the extremes of the distributions of correlation for each study (Lee et al, 2004).
Figure 4
We have also used the integrated IP-HTMS and gene co-expression data for further in-depth discovery of functional relationships between genes. The LYAR (Ly-1 antibody reactive) protein was originally isolated from a mouse T-cell leukemia cell line and shown to encode a predominantly nucleolar-localized protein (Su et al, 1993). As an IP-HTMS bait, LYAR identified 79 prey proteins, and of these, 32 were also found as coexpressed genes in the co-expression database (Lee et al, 2004). Twelve of these co-expression links are classed as stringent (co-expression observed across three or more gene expression studies) (Lee et al, 2004), and are represented in Figure 5
Biological interpretation of the interaction network
Global visualization of the IP-HTMS data set
To aid interpretation of the IP-HTMS data set, we visualized the interaction network in two ways. First, to globally visualize the data set, we developed the bait–bait connectivity map (Figure 6A and B).
Several features of the data set are clear from the bait–bait map in Figure 6A
NIMA family kinases and the mitotic cascade
The NIMA (never in mitosis gene a) kinase was originally described in Aspergillus nidulans as a key regulator of entry into the mitotic cycle. Families of NIMA-related kinases (Nek) have since been found to be widely distributed in eukaryotes with a conserved role in regulation of mitosis (Lu and Hunter, 1995; O'Connell et al, 2003). In humans, 11 members of the Nek family have been described. Nek6 was previously shown to be essential for mitotic progression in human cells, and was suggested to be particularly important for the metaphase–anaphase transition (Yin et al, 2003) and chromatin condensation (Hashimoto et al, 2002). Expression analysis also suggested an association of Nek family members with chromosome instability and cancer (Bowers and Boylan, 2004; Hayward and Fry, 2005). Nek6 bait was used in three IP-HTMS experiments, and 42 prey proteins were identified (see the interaction map in Figure 6E).
Translation initiation and elongation factors
The molecular mechanisms underlying protein synthesis in eukaryotic organisms are complex and only partially understood (Kapp and Lorsch, 2004). The eukaryotic translation process can be divided into four steps: initiation (the assembly of the ribosome at the initiation codon), elongation (the positioning of aminoacyl tRNAs into the acceptor site), termination (occurring when a stop codon is encountered) and finally the recycling of the ribosomal machinery. As part of our protein interaction mapping, we selected six eukaryotic translation initiation factor (EIF) proteins as baits (EIF2B1, EIF3S10, EIF4A1, EIF4A2, EIF4EBP1 and GC20). A total of 222 interactions were identified for these six baits, primarily with GC20 (162 interactions) and EIF4A2 (42 interactions). Seventy-five interactions have an interaction confidence score greater than 0.3, and 60% of these interactions are with other eukaryotic initiation factor proteins or components of the translational machinery.
We focus our discussion here on this subset of the interactions. Our results recapitulate many of the known complexes and steps involved in translation initiation and demonstrate both the specificity and sensitivity of the IP-HTMS approach.
Figure 6F
The first step of translation initiation is the formation of a ternary complex between GTP, Met-tRNA and EIF2 and binding of this complex and other EIFs to the 40S ribosomal complex to form the 43S pre-initiation complex (Pestova et al, 2001). We observed several complexes that participate in this process. GC20 is a homolog of the yeast SUI1/EIF1 protein, known to be required for binding of the GTP/Met-tRNA/EIF2 complex to the 40S ribosome (Majumdar et al, 2003). In our experiments, the GC20 bait identified several components of the EIF2 complex. EIF3 is also required for generation of a stable 40S pre-initiation complex. Our experiments with GC20 and EIF3S10 identified many of the EIF3 components (EIF1 (GC20 homolog) has previously been shown to interact with EIF3 (Fletcher et al, 1999)). The EIF3S10 experiments demonstrate the specificity and sensitivity of the IP-HTMS approach; this bait identified eight prey proteins, seven of which are documented EIF3 subunits. Interestingly, the remaining prey protein, GA17, dendritic cell protein, contains a Proteasome/COP9/Initiation factor (PCI) domain, a domain of unknown function that is seen in components of multi-subunit complexes, such as the proteasome, COP9 and EIF3. Our results support recent work suggesting that GA17 is an additional subunit of EIF3 (Unbehaun et al, 2004). The next step in the process is mRNA binding to form the 43S pre-initiation complex. EIF4H is known to interact with EIF4A as part of this process and was observed in our experiments (Richter et al, 1999).
Both EIF4A and EIF4H were observed in the raw data for the GC20 immunoprecipitation experiments, although EIF4A was removed based on our filtering criteria and EIF4H assigned a low interaction confidence score. Eukaryotic messenger RNAs contain a modified guanosine, termed a cap, at their 5′ ends. For translation to proceed, binding of an initiation factor, EIF4E, to the cap structure is required (Richter and Sonenberg, 2005). EIF4B binds near the 5′-terminal cap of mRNA in the presence of EIF4F and ATP. EIF 4G1, 4G2, 4E and 4A are known components of the EIF4F multi-subunit complex, all of which were observed in our experiments with the EIF4 baits. EIF4E and protein translation as a whole are regulated in part by the EIF4E binding protein, EIF4EBP1 (Haghighat et al, 1995). In our experiments, the EIF4EBP1 bait identified a single prey protein, EIF4E. The PDCD4 (programmed cell death 4) protein was identified as a prey in both EIF4A1 and EIF4A2 experiments. The PDCD4 gene product has been reported to be a tumor and transformation suppressor and proposed as a target for cancer therapy (Lankat-Buttgereit and Goke, 2003). PDCD4 has also been shown to inhibit translation through its binding to EIF4A and EIF4G (Yang et al, 2004; Zakowicz et al, 2005). Our results support these reports and suggest that PDCD4 interacts very specifically with the translation machinery; PDCD4 was seen only with the EIF4A1 and EIF4A2 baits. Finally, EIF2B functions to recycle the EIF2–GDP complex and recreate EIF2–GTP, which is then ready for a subsequent round of initiation. Immunoprecipitation using EIF2B1 identified six prey proteins, two of which (EIF2B3 and EIF2B5) are documented EIF2B components. IP-HTMS has provided us with a snapshot of the interactions occurring during the complex process of eukaryotic translation initiation. 
With six bait proteins covering the major processes of initiation, we are able to identify many relevant interacting proteins and provide a rich data set for further discovery.

Future prospects

This study presents the first high-throughput analysis of native protein complexes by IP-HTMS in a human cell line. As illustrated in this report, our data set provides for both recapitulation of known complexes and discovery of new interactions and complexes. Although our data set maps interactions for proteins implicated in a broad range of pathways and processes, we anticipate that future, focused applications of the IP-HTMS approach will begin to probe in greater depth the impact of disease states and drug treatments on human protein–protein interactions.

Materials and methods

Cloning of the bait cDNAs and construction of entry clones

Full-length cDNAs encoding the genes for the respective protein baits were either purchased from Invitrogen (www.invitrogen.com) and the Kazusa project (www.kazusa.or.jp) or cloned in-house. Established polymerase chain reaction (PCR) methodologies were used to amplify the bait cDNAs from the corresponding parent plasmid DNAs. Four oligonucleotide primers were required for each unique bait gene: two 5′-terminal primers, with or without a Kozak sequence, and two 3′-terminal primers, with or without a stop codon. The primers were designed to be complementary to the 5′ and 3′ ends of the bait coding region and to introduce an additional nucleotide sequence (29 bp), corresponding to Gateway attB recombination sites (Invitrogen), onto the ends of the PCR product. To create Gateway entry vectors, a portion of the purified PCR product was added to the BP Reaction mixture, which contains a donor vector (encoding attP sites) and the BP CLONASE mix of recombination proteins.
The recombination results in the oriented integration of the attB-flanked PCR product into the attP sites of the donor vector, generating the Entry Clone in which the bait gene coding region is now flanked by attL sites (required for the LR Reaction, see below). A portion of the BP Reaction was used to transform competent Escherichia coli DH5α cells and the Entry Clone plasmid DNA was purified from selected transformants (antibiotic selection) using routine plasmid miniprep protocols (Sigma-Aldrich, www.sigmaaldrich.com). The integrity of each Entry Clone was verified by PCR amplification using gene-specific primers and DNA sequencing.

Construction of destination vectors

Two Destination Vectors, DV1 and DV2, were constructed based on a vector backbone using standard recombinant DNA methodologies. The Entry Clone and Destination Vector were subjected to the GATEWAY LR Reaction, which contains the LR CLONASE mix of recombination proteins. The LR Reaction results in the directional transfer of the bait gene coding region, flanked by the attL sites in the Entry Clone, to the Destination Vector (DV1 or DV2) through recombination with the attR-flanked GATERC, generating the Expression Clone. A portion of the LR Reaction was used to transform competent DH5α cells and the Expression Clone plasmids were purified from selected transformants (antibiotic selection) using routine plasmid miniprep protocols. Following confirmation by PCR with gene-specific primers, milligram quantities of purified Expression Clones were prepared by standard protocols (Maxiprep; Sigma-Aldrich).

Cell culture

Anchorage-dependent human embryonic kidney 293 (HEK293) cells were maintained in Dulbecco's modified Eagle's medium (DMEM) containing 10% fetal bovine serum and supplemented with 2 mM L-glutamine and 0.1 mM nonessential amino acids. Cells were grown in 10-cm-diameter or 24.5 × 24.5 cm² tissue culture plates at 37°C in a 5% CO2 atmosphere. Cells were routinely tested for mycoplasma presence.
A detailed protocol for the maintenance and passaging of cells is provided in Supplementary Information.

Transient transfection

A seed culture of HEK293 cells (at 70–80% confluence) was split and plated with fresh media the day before transfection and then grown to 30–40% confluence. Before transfection, cell plates were individually verified by microscopy. In particular, we verified that the cells were healthy (no large vacuoles, no long extensions, not rounded up), that no contamination (mould, yeast or bacteria) was present and that fewer than 5% of the cells were dead. We also confirmed that the plates were approximately 40% confluent. Any plates that did not meet the above criteria were discarded. Typically, approximately 1 × 10⁷ cells were transiently transfected with 5 μg of DNA construct using a calcium phosphate/DNA coprecipitation protocol. Briefly, a solution of calcium chloride and maxiprep Expression Clone plasmid DNA was diluted with an inorganic phosphate-containing buffer. Following a brief period to allow the calcium phosphate/DNA precipitate to develop, the mixture was overlaid on the cells. Cells were incubated at 37°C with the calcium phosphate/DNA mixture for 12–16 h, the culture medium was replenished and the cells were cultured for a further 24 h to ~90% confluence before harvest. A similar procedure was used to culture HEK293 cells that were transiently transfected with the Destination Vector (no bait gene) in order to provide a negative-control sample. A detailed protocol for the transfection is provided in Supplementary Information.

Cell harvest and extract preparation

All steps of the harvest procedure were performed at 4°C. Following the culture period described above (for each experimental and control culture), the media were removed from the plates by aspiration and the adherent HEK293 cells were washed thoroughly with Tris-buffered saline.
Cells were then overlaid with a predetermined volume of detergent-containing lysis buffer (supplemented with a cocktail of protease inhibitors) and then scraped to concurrently dislodge and lyse the cells. Typically, cells were lysed by the addition of 1 ml of lysis buffer (20 mM Tris–HCl (pH 7.5), 150 mM NaCl, 1 mM EDTA, 1% NP-40, 0.5% sodium deoxycholate, 10 μg/ml aprotinin, 0.2 mM AEBSF (Calbiochem)). The cell lysate was collected and then clarified by preparative centrifugation for 30 min at 20 000 g to yield a crude extract. In all cases, portions of the soluble and insoluble fractions from the centrifugation were separated by SDS–PAGE and immunoblotted with an anti-FLAG® (M2) monoclonal antibody (see below) to verify the bait's presence in the soluble extract fraction.

Immunoprecipitation of bait and bait-specific interacting proteins

The Flag-tagged bait proteins and their interacting partners were isolated from cell extracts by immunoprecipitation using M2-Agarose resin (Sigma-Aldrich). The M2-Agarose comprises the monoclonal anti-Flag M2 antibody immobilized onto an agarose resin and reacts specifically with fusion proteins possessing the Flag epitope at the N- or C-terminus. Briefly, the crude lysate was first incubated with 5 μg of agarose beads for 60 min at 4°C to remove nonspecific binders. The supernatant was then subjected to immunoprecipitation by adding 5 μg of anti-Flag monoclonal antibody covalently attached to crosslinked agarose beads (M2, Sigma). The mixture was gently agitated by inversion for 60 min at 4°C. Immunocomplexes associated with the insoluble fraction were recovered by centrifugation (1000 g for 2 min) and washed by three cycles of resuspension in lysis buffer followed by centrifugation as described above. Immunocomplexes were eluted from the beads by resuspension in 250 μl of 50 mM ammonium bicarbonate (prepared just before use) containing 400 μM Flag peptide.
Following a 30 min incubation, beads were removed by centrifugation and the supernatant containing Flag peptide as well as the eluted proteins was lyophilized.

Gel-based protein analysis

The dried immunopurified proteins were solubilized in a minimal volume of protein-loading buffer and subjected to SDS–PAGE. The immunopurified proteins were then separated by gel electrophoresis and detected by colloidal Coomassie staining. All gels were subjected to a visual appraisal before further processing; gel lanes that contained anomalies such as significant background across the entire lane or a large number of protein bands arising from nonspecific protein precipitation were rejected (approximately 40% of the gels were rejected based on these criteria). Band excision was automatically performed by a robotic system developed in-house and gel bands automatically transferred to a 96-well plate. Post-excision steps were carried out using commercially available automated robotic workstations (ProGest, Genomic Solutions). The proteins contained in the excised gel bands were treated with dithiothreitol (DTT) and the free sulfhydryl groups were alkylated using iodoacetamide. Proteins were then digested with trypsin and the resulting peptides were extracted from the gel slice using a series of wash steps. The extracted peptides were concentrated and analyzed directly by mass spectrometry.

Mass spectrometry

LC-ESI-MS/MS identification of proteins was performed as described previously (Figeys et al, 2001) using an automated network of mass spectrometers. Tryptic peptides recovered from individual gel bands were separated by reverse-phase chromatography on C18 resin and directly injected into a mass spectrometer. Ion trap mass spectrometers (LCQ Deca, Thermo Finnigan), operated in a data-dependent mode, which produces tandem MS spectra of all peptide species present above a programmed threshold, were used for these experiments.
Note: Additional detailed experimental protocols for cell transfection and passaging of cells are provided in Supplementary Information.

Data analysis

Data management

Laboratory data were managed using an in-house developed LIMS that tracks all steps of immunoprecipitation and gel band excision, as well as mass spectrometry acquisition names, annotated SDS–PAGE images and QC data. Mass spectrometry acquisition files were stored on a centralized network file system and processed using an automated analysis pipeline, including a cluster of Mascot nodes for peptide and protein identification.

Peptide and protein identification

All spectra were analyzed using Mascot version 1.9 (Matrix Science, www.matrixscience.com) searches against a non-redundant human protein sequence database (122 989 entries), constructed from all major sources of human protein sequences (GenBank, TrEMBL, SwissProt, IPI and Ensembl). Mascot was run in MS/MS Ion search mode with the following parameter settings: fixed modification (carbamidomethyl on cysteine), variable modification (oxidation on methionine), peptide mass tolerance 2 Da, fragment mass tolerance 0.4 Da, maximum missed cleavages two and enzyme trypsin. Peptide and protein identifications were included for further analysis according to the following criteria: for single peptide hit proteins, Mascot ionscore ≥40; for proteins with multiple peptide hits, each Mascot peptide ionscore ≥20. (The average Mascot recommended (P<0.05) ionscore for our data is ~40.) Further assessment of the peptide and protein identification false-positive rates was made by searching a subset (500 gel bands; ~3% of the data) against a randomized sequence database in which each entry was randomly shuffled. Using Mascot ionscore thresholds as above, we estimate a protein false-positive rate of <7.5%. Mascot result files were parsed, proteins clustered and all data stored in a relational database.
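The identification criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `accept_proteins`, the `(protein, peptide, ionscore)` input tuples and the reading of the thresholds as minimum scores (a single peptide needs an ionscore of at least 40, while two or more distinct peptides each need at least 20) are assumptions made for illustration.

```python
def accept_proteins(peptide_hits, single_min=40.0, multi_min=20.0):
    """Return the set of protein IDs that pass the ionscore criteria.

    peptide_hits: iterable of (protein_id, peptide_sequence, ionscore).
    Assumed reading of the criteria (hypothetical, see lead-in):
      - a protein supported by one peptide needs ionscore >= single_min;
      - a protein supported by several distinct peptides needs at least
        two of them with ionscore >= multi_min.
    """
    # Keep the best ionscore seen for each distinct peptide of each protein.
    best = {}
    for protein, peptide, score in peptide_hits:
        per_protein = best.setdefault(protein, {})
        per_protein[peptide] = max(score, per_protein.get(peptide, score))

    accepted = set()
    for protein, peptides in best.items():
        scores = list(peptides.values())
        n_multi = sum(1 for s in scores if s >= multi_min)
        has_single = any(s >= single_min for s in scores)
        if n_multi >= 2 or has_single:
            accepted.add(protein)
    return accepted


if __name__ == "__main__":
    hits = [
        ("P1", "AAK", 45.0),  # single strong peptide
        ("P2", "BBK", 25.0),  # two moderate peptides
        ("P2", "CCK", 22.0),
        ("P3", "DDK", 30.0),  # single peptide below the single-hit cutoff
    ]
    print(sorted(accept_proteins(hits)))
```

Keeping only the best score per distinct peptide avoids counting repeated spectra of the same peptide as independent evidence, which is one plausible design choice rather than a documented detail of the original analysis.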
An in-house protein sequence index and annotation system was used both to provide the non-redundant sequence search database and to interpret and analyze the resulting protein hits. Spotfire (www.spotfire.com), Cytoscape (www.cytoscape.org) and Pathway Studio (Ariadne Genomics) software were used extensively for data analysis and interaction map visualization. The PLS regression analysis and generation of the interaction confidence score were implemented in custom code using Python (www.python.org).

Comparisons to other data sets

Comparisons were made in general by cross-referencing NCBI Gene IDs where possible, or official HUGO gene symbols. For comparison to other protein interaction data sets, computation of statistical significance was carried out by repeatedly randomizing (1000 iterations) the IP-HTMS bait–prey associations and recalculating the interactions in common between the set of randomized interactions and the data set being compared. Minimum, mean and maximum counts of the interactions in common were then calculated from the 1000 trials. Cross-referencing to the InParanoid database (O'Brien et al, 2005) was performed by downloading all orthologous pairs for Homo sapiens and then forming paralogous groups of human genes in a simple, single-link fashion. Integration of the gene co-expression compendium (Lee et al, 2004) was performed by cross-referencing gene symbols.

GO analysis

GO-Slim versions of the Gene Ontology (www.geneontology.org/GO.slims.shtml) were used to map baits and preys to biological processes and cellular component categories (courtesy of Suparna Mundodi and Amelia Ireland, and MGI, www.spatial.maine.edu/~mdolan/MGI_GO_Slim.html), respectively. In addition, certain baits were 'up-propagated' to parent categories where representation was low.
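As a rough illustration of the randomization test described under "Comparisons to other data sets", the Python sketch below shuffles prey labels across the bait–prey pairs and counts how many randomized interactions are shared with a reference data set. The text does not specify the randomization scheme, so the prey-label shuffle, the treatment of interactions as undirected pairs and the function name `overlap_randomization` are all assumptions.

```python
import random


def overlap_randomization(interactions, reference, n_iter=1000, seed=0):
    """Compare observed overlap with overlaps from randomized networks.

    interactions: list of (bait, prey) pairs (e.g. gene IDs or symbols).
    reference:    iterable of pairs from the data set being compared.
    Returns (observed, minimum, mean, maximum) counts of shared interactions.
    Assumption: prey labels are shuffled across pairs, and interactions
    are treated as undirected.
    """
    rng = random.Random(seed)
    reference_set = {frozenset(pair) for pair in reference}
    observed = len({frozenset(pair) for pair in interactions} & reference_set)

    baits = [bait for bait, _ in interactions]
    preys = [prey for _, prey in interactions]
    counts = []
    for _ in range(n_iter):
        rng.shuffle(preys)  # break the bait-prey associations
        randomized = {frozenset((b, p)) for b, p in zip(baits, preys)}
        counts.append(len(randomized & reference_set))

    return observed, min(counts), sum(counts) / len(counts), max(counts)


if __name__ == "__main__":
    ip_htms = [("A", "x"), ("B", "y"), ("C", "z")]
    other = [("x", "A"), ("B", "y")]
    print(overlap_randomization(ip_htms, other, n_iter=100))
```

An observed overlap well above the maximum seen in the randomized trials would suggest that the agreement with the reference data set is unlikely to arise by chance.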
Eighty percent of proteins in the interaction network were assigned biological process categories and 77% cellular component categories (55% of interactions were assigned biological process categories for both bait and prey, 33% of interactions were assigned cellular component categories for both bait and prey). Each combination of bait GO category and prey GO category was tested for association by constructing a 2 × 2 contingency table and using the Fisher exact test. Distributions of P-values from randomly permuted bait–prey categories were characterized as follows. Random permutations of bait–prey category associations (1000 trials) were performed, contingency tables for each bait–prey category combination were constructed and the Fisher exact test P-value was calculated. These distributions of 1000 P-values for each bait–prey category combination were then used to calculate the frequency with which a P-value less than or equal to the observed non-random P-value is seen by chance.

Supplementary Information (157K, pdf)
Supplementary Figure I (24K, pdf)
Supplementary Figure II (137K, pdf)
Supplementary Figure III (22K, pdf)
Supplementary Table I (56K, xls)
Supplementary Table II (454K, xls)
Supplementary Table III (17K, xls)
Supplementary Table IV (35K, xls)

Acknowledgments

DF acknowledges funding from the Canada Research Chair program, the Natural Sciences and Engineering Research Council of Canada, the Canadian Institutes of Health Research, la Fondation Jean-Louis Lévesque and MDS Inc. MM acknowledges funding from the Canada Research Chair Program. We also acknowledge past and present colleagues at MDS Proteomics/Protana who have contributed to this project.

References