Copyright © 2007 Sugaya et al; licensee BioMed Central Ltd.

An integrative in silico approach for discovering candidates for drug-targetable protein-protein interactions in interactome data

1 PharmaDesign, Inc., 2-19-8 Hatchobori, Chuo-ku, Tokyo, 104-0032, Japan
2 Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji-shi, Tokyo, 185-8601, Japan
3 Genomic Sciences Center, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
4 Graduate School of Genetic Resources Technology, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka, 812-8581, Japan
5 Research & Development Group, Hitachi, Ltd., 1-6-1 Marunouchi, Chiyoda-ku, Tokyo, 100-8220, Japan

Corresponding author.
Nobuyoshi Sugaya: sugaya/at/pharmadesign.co.jp; Kazuyoshi Ikeda: ikeda/at/pharmadesign.co.jp; Toshiyuki Tashiro: tashiro/at/pharmadesign.co.jp; Shizu Takeda: shizu.takeda.me/at/hitachi.com; Jun Otomo: jun.otomo.aq/at/hitachi.com; Yoshiko Ishida: yoshiko.ishida.aq/at/hitachi.com; Akiko Shiratori: akiko.shiratori.ud/at/hitachi.com; Atsushi Toyoda: toyoda/at/gsc.riken.jp; Hideki Noguchi: hide/at/cb.k.u-tokyo.ac.jp; Tadayuki Takeda: ttakeda/at/gsc.riken.jp; Satoru Kuhara: kuhara/at/grt.kyusyu-u.ac.jp; Yoshiyuki Sakaki: sakaki/at/gsc.riken.jp; Takao Iwayanagi: takao.iwayanagi.ap/at/hitachi.com

Received February 6, 2007; Accepted August 20, 2007.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Protein-protein interactions (PPIs) are challenging but attractive targets for small chemical drugs. The complete set of PPIs in an organism, called the 'interactome', has become available for several organisms, including human, owing to the recent development of high-throughput screening (HTS) technologies.
Individual PPIs have been targeted by small drug-like chemicals (SDCs); however, interactome data have not been fully utilized for exploring drug targets because a comprehensive methodology for using these data has been lacking. Here we propose an integrative in silico approach for discovering candidates for drug-targetable PPIs in interactome data.

Results

Our novel in silico screening system comprises three independent assessment procedures: i) detection of protein domains responsible for PPIs, ii) finding SDC-binding pockets on protein surfaces, and iii) evaluating similarities in the assignment of Gene Ontology (GO) terms between specific partner proteins. We discovered six candidates for drug-targetable PPIs by applying our in silico approach to original human PPI data composed of 770 binary interactions produced by our HTS yeast two-hybrid (HTS-Y2H) assays. Among them, we further examined two candidates, RXRA/NRIP1 and CDK2/CDKN1A, with respect to their biological roles, the PPI network around each candidate, and the tertiary structures of the interacting domains.

Conclusion

An integrative in silico approach for discovering candidates for drug-targetable PPIs was applied to original human PPI data. The system excludes false positive interactions and selects reliable PPIs as drug targets. Its effectiveness was demonstrated by the discovery of the six promising candidate target PPIs. Inhibition or stabilization of the two interactions may have potential therapeutic effects against human diseases.

Background

Most proteins exhibit their biological function via interactions with partner proteins, and thus PPIs play fundamental and key roles in various cellular processes in organisms. PPIs have recently been recognized as challenging but attractive targets for small chemical drugs [1]. In particular, the inhibition of PPIs by SDCs has been intensively studied [1-5].
Investigations to date suggest that PPI inhibition by SDCs could lead to treatments for some human diseases [1-5]. One of the well-investigated target PPIs is the interaction between tumor suppressor protein p53 and murine double-minute-2 protein (MDM2) [6-8]. It has been shown that a family of SDCs, the nutlins, inhibit this interaction [6,7], suggesting that the nutlins could be potential therapeutic drugs for cancer [8]. Several promising PPIs have been targeted by SDCs, such as AMAP1/cortactin for preventing breast cancer invasion and metastasis [9], B7.1/CD28 for modulating T-cell activation [10], BAK/BCL2 or BAK/BCL-XL for inducing apoptosis in tumor cells [11-14], β-catenin/Tcf4 for cancer treatment [15,16], IL2/IL2Rα for suppressing autoimmune diseases [17,18], LFA1/ICAM1 for modulating lymphocyte and immune system function [19-21], and NGF/p75NTR for blocking neuropathic and inflammatory pain [22].

The PPIs targeted in the previous studies [6-22] were chosen according to the researchers' own interest in each individual PPI and in the diseases related to it. Few studies have aimed at discovering or selecting target PPIs at the level of the whole set of PPIs, the 'interactome'. One reason for this has been the lack of strategies for comprehensively exploring and discovering target PPIs in the interactome. The enormous amounts of PPI data produced by HTS technologies in recent years [23-35] provide a promising opportunity for addressing this matter. Here we propose a novel and integrative in silico approach for discovering candidates for drug-targetable PPIs by computationally screening large amounts of PPI data. The approach is first applied to the previously-investigated target PPIs; its effectiveness and potential are then demonstrated by applying the methodology to original human PPI data produced by our HTS-Y2H assays.
Results Synopsis of our in silico system Many previously-investigated target PPIs satisfy several criteria sufficient to be chosen as drug targets. One criterion is that interacting domains involved in a PPI have been already identified. Domain-domain interactions responsible for PPIs are more informative for researchers than PPIs to select potential drug targets [36]. This is because two domains that exclusively interact with each other can be specifically inhibited by a SDC without other PPIs being inhibited. In contrast, if a domain targeted by a SDC is shared with a large number of interacting proteins, and if this domain interacts with other domains, it is likely that the SDC will cause an off-target effect by inhibiting non-targeted PPIs that are essential to the organism. A second criterion is the presence of SDC-binding pockets on the surface of the interacting protein. In many cases of the previously-investigated target PPIs, SDCs interact with a pocket in which the small number of amino acid residues exist that contribute the large fraction of protein-protein binding free energy, so-called 'hot spots' [1,37]. In order to inhibit a PPI by SDCs, one or both of the two interacting proteins should have a pocket on protein surface to which SDCs can bind. This criterion holds whether the SDCs exhibit their inhibiting effects via direct binding to the PPI interface, or via allosteric effects caused by SDC-induced conformational change to the tertiary structure of the SDC-interacting protein. A third criterion is that the biological roles of the PPI are well understood. This is necessary in order to infer the phenotypic effects caused by inhibition of the PPI in the cell. In addition, if the two interacting proteins detected in an experimental study have the same cellular location and/or have similar biological functions, it is more probable that the interaction between these two proteins actually occurs in living cells. 
Based on the idea of in silico structure-based drug design, our novel and integrative in silico system discovers candidates for drug-targetable PPIs satisfying the above-mentioned criteria by integrating three independent assessment procedures:
• detection of protein domains responsible for PPIs,
• finding SDC-binding pockets on protein surfaces,
• evaluating similarities in the assignment of GO terms between specific partner proteins.
The in silico system is schematically represented in Figure 1.
Application of our system to the previously-investigated target PPIs We conducted the three in silico analyses on the 15 previously-investigated target PPIs in [1,4]; AMAP1/cortactin [9], B7.1/CD28 [10], BAK/BCL2(BCL-XL) [11-14], β-catenin/Tcf4 [15,16], CCR5/Env [41], CD4/MHC class II [42], CRM1/Rev [43], EPO/EPOR [44], IL1α (IL1β)/IL1R type I [45], IL2/IL2Rα [17,18], iNOS/iNOS [46], LFA1/ICAM1 [19-21], Myc/Max [47], NGF/p75NTR [22], and p53/MDM2 [6-8]. Table 1 summarizes the results (see Additional file 1 for the full results of the analyses). As shown in Additional file 1, all proteins in the target PPIs have one or more Pfam-A and/or Pfam-B domains. By searching the public domain-domain interaction databases, iPfam [48], InterDom [49], and DIMA [50], we identified interacting partner domains in most of the target PPIs (Table 1). We found one or more pockets on at least one of the two interacting proteins in most target PPIs. Evaluation of similarity scores for GO-term assignment indicates that many target PPIs have statistically significant (P < 0.05) scores in two out of the three GO categories, cellular component, molecular function, and biological process. Taken together, we adopted the following thresholds in the three assessment procedures of our system.
• A domain pair in the PPIs has been already known or predicted as interacting partner in the public databases. • One or both proteins have at least one pocket on the protein surface to which SDCs can bind. • Similarity score for the GO-term assignment is statistically significant (P < 0.05) in two out of the three GO categories. By adopting the thresholds, our system can select 8 PPIs (BAK/BCL2(BCL-XL), β-catenin/Tcf4, CD4/MHC class II, IL1α(IL1β)/IL1R type I, iNOS/iNOS, LFA1/ICAM1, NGF/p75NTR, and p53/MDM2) from the 15 previously-investigated target PPIs. In addition, the locations of the pockets found on the 8 PPIs are in good agreement with those of pockets targeted by SDCs in the previous studies (data not shown). Thus, we consider the thresholds to be suitable for assessing drug-targetability of each PPI, although some PPIs may be missed as false negatives. Application to original human PPI data Most PPIs in original human PPI data are those between human transcription factors (baits) and other proteins (preys) (see Additional file 2). The number of unique baits and preys are 99 and 738, respectively (Table 2). The baits and preys used in our HTS-Y2H assays were sequence fragments. Protein domains included in the bait and prey fragments are likely involved in the interaction between the two fragments. All domains in the bait and prey fragments used in the present study were retrieved from the Pfam database (see Methods). We identified Pfam-A and/or Pfam-B domains in most of the bait (98% (97/99)) and prey (97% (714/738)) fragments (Table 2). Table 3 indicates that in most (95% (734/770)) bait-prey pairs, both fragments have Pfam-A and/or Pfam-B domains. This table also shows that only 3% (23/770) of bait-prey pairs satisfy the first criterion of our system, dramatically reducing candidate PPIs. Then, we further identified two domains as interacting partner domains, when a single domain was present in the bait fragment and a single domain in the prey fragment. 
Among the bait and prey fragments with domains, 32 (33%) bait and 350 (49%) prey fragments have a single domain. In 62 (8%) out of the 734 bait-prey pairs, we detected a single domain in both the bait and the prey fragments. As a result, we identified interacting partner domains in 83 (11%) bait-prey pairs. It is highly probable that these domain pairs are involved in the interaction between the bait and prey fragments. See Additional file 2 for the full list of the detected domains in the fragments.
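The domain-assignment rule described above can be sketched in Python. A bait-prey pair is given interacting partner domains either when some bait/prey domain combination is already registered as interacting, or when each fragment carries exactly one domain. The function name, the data shapes, and the `known_pairs` set (a stand-in for lookups in iPfam, InterDom, and DIMA) are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

def interacting_domain_pairs(bait_domains, prey_domains, known_pairs):
    """Return the domain pairs inferred to mediate a bait-prey interaction.

    bait_domains, prey_domains: lists of domain identifiers per fragment.
    known_pairs: set of frozensets of domain identifiers; a hypothetical
    stand-in for a domain-domain interaction database lookup.
    """
    # Rule 1: a bait x prey domain combination is already registered.
    registered = [
        (b, p)
        for b, p in product(bait_domains, prey_domains)
        if frozenset((b, p)) in known_pairs
    ]
    if registered:
        return registered
    # Rule 2: exactly one domain in the bait and one in the prey fragment.
    if len(bait_domains) == 1 and len(prey_domains) == 1:
        return [(bait_domains[0], prey_domains[0])]
    # Multi-domain fragments without a registered pair stay unassigned.
    return []
```

Under this sketch, a multi-domain fragment without a registered pair yields an empty list, which mirrors the conservative choice discussed later in the paper.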
In order to computationally detect pockets on the surfaces of domains/proteins in the bait and prey fragments, it is essential that tertiary structures nearly identical to the bait and prey fragments are available. To detect protein tertiary structures nearly identical to the fragments, we searched for entries in the PDB [51] database showing high amino acid sequence identity and sequence coverage rate to the fragments (see Methods). The rigorous threshold of sequence identity ≥ 90% and coverage rate ≥ 90% in the results of sequence-similarity searches was adopted in the present study. This is because we detected pockets based on their volume and the number of hydrophobic amino acid residues in pockets, and these pocket properties are very sensitive to a slight conformational change of protein tertiary structure caused by amino acid replacement, deletion, or insertion. If sequence identity between a bait or prey fragment and a PDB entry fell within the range of 50%–90%, one could reconstruct a tertiary structure of the protein with homology modeling based on the template structure of the PDB entry. In these situations, however, pocket properties on the reconstructed tertiary structure would be not always nearly identical to those on the template structure. Therefore, we adopted the rigorous threshold of sequence identity ≥ 90% and coverage rate ≥ 90% for pocket detection. Results of the sequence-similarity search indicate that 15% (15/99) of bait and 7% (51/738) of prey fragments have nearly identical tertiary structures in the PDB database (Table 2). Most of the bait and prey fragments (100% (15/15) in bait, 84% (43/51) in prey) have one or more pockets on their protein surface. Table 3 shows that one or both fragments in 27% (211/770) of bait-prey pairs have nearly identical tertiary structures. In 96% (203/211) of the bait-prey pairs, we found SDC-binding pockets in one or both fragments. See Additional file 2 for the full results of the pocket analyses. 
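The structure-matching threshold can be written as a small predicate. This is a minimal sketch, assuming the identity and aligned length have already been parsed from a sequence-similarity hit; the function and parameter names are hypothetical. Coverage follows the definition given in Methods (aligned query length divided by full query length), and the 50-residue minimum is also taken from Methods.

```python
def is_near_identical(identity_pct, aligned_len, query_len,
                      min_identity=90.0, min_coverage=90.0,
                      min_aligned_len=50):
    """Decide whether a PDB chain counts as 'nearly identical' to a fragment.

    identity_pct: percent sequence identity over the aligned region.
    aligned_len:  number of query residues in the aligned region.
    query_len:    full length of the bait or prey fragment.
    """
    if query_len <= 0:
        raise ValueError("query_len must be positive")
    # Coverage: aligned query length relative to the whole fragment.
    coverage_pct = 100.0 * aligned_len / query_len
    return (identity_pct >= min_identity
            and coverage_pct >= min_coverage
            and aligned_len >= min_aligned_len)
```

A hit with 95% identity over 180 of 200 query residues passes, whereas the same identity over 170 residues fails on coverage (85%).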
GO [52] is useful for assessing the biological significance of the bait-prey pairs and for selecting well-studied pairs. This is because GO organizes biological terms in a systematic hierarchy, which allows large numbers of terms to be handled computationally. We counted the numbers of shared identical GO terms and calculated similarity scores between the bait and prey fragments (see Methods). Table 2 shows that most bait proteins (> 90%) and many prey proteins (> 80%) have at least one GO term in any of the three GO categories. Table 3 indicates that many bait-prey pairs (> 75%) share one or more identical GO terms. We calculated similarity scores and evaluated the statistical significance of the scores based on frequency distributions of scores calculated for PPI data composed of random protein pairs (see Additional file 3). The number of bait-prey pairs with a statistically significant (P < 0.05) score is shown in Table 3. Among these pairs, 201 bait-prey pairs have statistically significant scores in two out of the three GO categories. See Additional file 2 for similarity scores calculated for all bait-prey pairs and results of the statistical evaluation of these scores.

Among the 770 unique bait-prey pairs, we selected candidates for drug-targetable PPIs that satisfy all three criteria. As shown in Table 3, 83 bait-prey pairs satisfied the first criterion. The numbers of bait-prey pairs satisfying the second and third criteria were 203 and 201, respectively.

[Figure 2]
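Selecting pairs that satisfy all three criteria amounts to a conjunction of three per-pair checks. The sketch below assumes each pair has already been assessed; the `PairAssessment` record, its field names, and the toy values are hypothetical and are not data from the study. "Two out of the three GO categories" is interpreted here as at least two.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class PairAssessment:
    bait: str
    prey: str
    has_domain_pair: bool   # criterion 1: interacting partner domains identified
    has_pocket: bool        # criterion 2: SDC-binding pocket on one or both fragments
    go_p_values: Tuple[float, float, float]  # CC, MF, BP similarity-score P values

def go_criterion(p_values, alpha=0.05, min_categories=2):
    """Criterion 3: significant GO similarity in at least `min_categories`."""
    return sum(p < alpha for p in p_values) >= min_categories

def select_candidates(assessments: List[PairAssessment]) -> List[PairAssessment]:
    """Keep only the pairs that pass all three assessment procedures."""
    return [
        a for a in assessments
        if a.has_domain_pair and a.has_pocket and go_criterion(a.go_p_values)
    ]

# Toy input with invented values, for illustration only.
toy = [
    PairAssessment("baitA", "preyA", True, True, (0.01, 0.03, 0.20)),
    PairAssessment("baitB", "preyB", True, False, (0.01, 0.01, 0.01)),
    PairAssessment("baitC", "preyC", True, True, (0.04, 0.30, 0.60)),
]
selected = select_candidates(toy)
print([(a.bait, a.prey) for a in selected])  # [('baitA', 'preyA')]
```

Because the three checks are independent, they can be computed in any order and combined at the end, which matches the paper's description of the procedures as independent assessments.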
Discussion

Drug-targetability of selected PPIs

In this section, we discuss the drug-targetability of two candidate PPIs, retinoid X receptor α (RXRA)/nuclear receptor-interacting protein 1 (NRIP1) and cell division protein kinase 2 (CDK2)/cyclin-dependent kinase inhibitor 1 (CDKN1A) (Table 4). These two candidates were selected because both the bait and the prey fragments had a single domain, so that the interacting partner domains were explicitly determined, and because similarity scores for GO-term assignment were statistically significant in all three GO categories. We further examined the two candidates with respect to their biological roles, the PPI network around each candidate, and the tertiary structures of the interacting domains.
RXRA/NRIP1

Biological functions of RXRA and NRIP1 have been studied in detail [53-56]. The statistically significant similarity scores for the GO-term assignment indicate that RXRA and NRIP1 have related biological functions (Table 4). In fact, the two proteins share a number of gene-transcription-related GO terms: 'nucleus' in the cellular component category, 'transcription coactivator activity' and 'DNA binding' in the molecular function category, and 'regulation of transcription, DNA-dependent' and 'positive regulation of transcription from RNA polymerase II promoter' in the biological process category. RXRA is a member of the nuclear hormone receptor family. When a ligand binds to its hormone receptor domain, RXRA forms a homo- or hetero-dimer with other nuclear hormone receptors in order to function as a transcription factor [56]. NRIP1 interacts with homo- or hetero-dimers of various nuclear hormone receptors and modulates their function by repressing transcriptional activity of the dimers [53-55].

[Figure 3: PPI network]
We identified an interaction between the Hormone_recep domain (ligand-binding domain) [Pfam:PF00104] in RXRA and a fragment of the PB064381 domain containing LXXLL motifs in NRIP1 (Table 4). The RXRA/NRIP1 interaction is believed to occur between α-helix 12 (H12), located in the C-terminal region of the Hormone_recep domain in RXRA, and the LXXLL motifs in NRIP1 [54,55]. Since RXRA interacts with NRIP1 in a ligand-dependent manner [53-55], one would expect to detect pockets on the surface of RXRA in the ligand-bound state. 1LBD in Table 4, however, is not suitable for the present study because it is the tertiary structure of RXRA homo-dimers in the non-ligand-bound state. We therefore detected pockets on 1MVC_A (RXRA in the ligand-bound state), which had the second-highest score to the bait fragment from RXRA in the sequence similarity search.

[Figure 4(a)]
CDK2/CDKN1A

CDK2 and CDKN1A share several GO terms: 'nucleus' in the cellular component category, 'protein kinase activity' and 'protein binding' in the molecular function category, and 'cell cycle' in the biological process category. This indicates that both proteins have biological functions in signaling pathways related to cell cycle regulation in the nucleus. CDK2 forms a protein complex with a member of the cyclin protein family, and functions in cell cycle progression at the transition between the G1 and S phases [60]. CDKN1A arrests cell cycle progression by acting as an inhibitor of the CDK2/cyclin protein complex [61].

[Figure 3: PPI network]

We identified a domain-domain interaction between the Pkinase domain [Pfam:PF00069] in CDK2 and the CDI domain [Pfam:PF02234] in CDKN1A (Table 4). This is in good agreement with previous results [66] identifying the interaction interface of CDK2/CDKN1A. One strategy for inducing or stabilizing a PPI is to design a SDC that can simultaneously bind to a pocket lying across two interacting proteins in a protein complex. In the case of CDK2/CDKN1A, we found pockets on the Pkinase domain [PDB:1V1K_A] in CDK2 but did not detect any pocket on the CDI domain in CDKN1A because it has no nearly identical tertiary structure (Table 4). Instead of 1V1K_A, we further investigated a tertiary structure of the protein complex [PDB:1JSU] composed of CDK2, cyclin A, and CDKN1B, a homolog of CDKN1A (sequence identity < 45%).

[Figure 4(c)]

Advantages of targeting PPIs

Targeting PPIs has a distinct advantage over targeting single proteins: a larger number of undiscovered potential drug targets. With traditional approaches to drug target discovery from the human proteome, drug targets were single proteins and were limited to a small number (~480) of proteins such as membrane receptors and enzymes [70].
Furthermore, most pockets targeted by small chemical drugs in these approaches were those to which endogenous small molecule ligands or substrates bind. By focusing on PPIs, the number of latent and novel drug targets can be expected to dramatically increase. This is because the size of the human interactome must be considerably larger than that of the human proteome and because many pockets involved in PPIs but not targeted in the traditional approaches become accessible. Since the total number of proteins encoded on the human genome is about 25,000 – 40,000, the size of the human interactome has been estimated to be 40,000 – 200,000 PPIs, based on extrapolation from the yeast interactome (10,000 – 30,000 PPIs (3 – 10 interactions/protein)) [71]. However, the number of human PPIs, registered in the public interaction database, is limited to ~38,000 [57]. Therefore, it is highly probable that most PPIs, including those which could be potential drug targets in the human interactome, remain undiscovered. For example, some PPIs, including BAK/BCL2, BAK/BCL-XL, p53/MDM2, and homo- or hetero-dimers of nuclear receptors, are mediated by hydrophobic grooves formed by three α-helices [1,56]. These PPIs utilizing α-helix grooves are thought to be amenable to small-molecule drug discovery [1], and thus may be promising targets of PPI-inhibiting SDCs [1,5]. Our in silico system can select more reliable interactions as drug targets by excluding spurious interactions via the three independent assessment procedures. PPI data used in the present study were obtained from our HTS-Y2H assays. In general, the false positive rate of HTS-Y2H methods has been believed to be higher than that of other physical, genetic, biochemical, or immunological methods for experimental detection of PPIs, mainly due to 'sticky' proteins that non-specifically interact with various proteins [72]. 
While a recent study on PPI prediction by a Support-Vector-Machine-based method has implied that PPI data produced by our HTS-Y2H assays are more reliable than data in previous HTS-Y2H studies (Table 4 in [73]), we do not neglect the possibility that our PPI data also contain false positive interactions. Indeed, our HTS-Y2H assays identified PPIs between baits derived from nucleus-located proteins and preys from extracellular proteins such as collagen α-1(XV) chain (COL15A1), extracellular matrix protein 1 (ECM1), and laminin proteins (LAMA3, LAMB3, and LAMC2) (see Additional file 2). These PPIs are very likely to be false positives. Our in silico system, however, can exclude these spurious interactions because, in these cases, similarity scores for GO-term assignment are not statistically significant in the cellular component category. Therefore, our approach should be widely applicable to PPI data even if a number of false positive interactions are included.

Issues in our approach

Our approach has the advantages described above, but some issues should be noted for its further refinement. To keep domain detection conservative, we did not identify interacting partner domains when bait and/or prey fragments had multiple domains, unless a domain pair was registered in the public domain-domain interaction databases. However, a large number of human proteins are multi-domain proteins, and this is also the case for the bait (> 60%) and prey (> 45%) fragments used in the present study. Several computational methods have been developed in recent years for predicting interacting partner domains from large amounts of experimental PPI data [74-80]. Applying these methods to the PPI data used in this study will be needed for more exhaustive identification of interacting domains. For the purpose of pocket detection, we adopted simple criteria mainly based on pocket volume and the number of amino acid residues composing the pocket.
Many studies in past few decades have revealed various properties of pockets involved in endogenous ligand binding or PPI [[37,81-83] and references therein]. These properties, such as volume, shape, hydrophobic clusters, shallowness, roughness, and accessible surface area, can be taken into consideration as parameters for assessment of drug-targetability of each pocket. We are now developing a computer program that evaluates drug-targetability of pockets based on these parameters. The program will enable us to judge whether a pocket is suitable for drug target. To investigate whether biological function of each PPI has been well understood or not, we assessed each PPI by using GO terms. GO has been frequently used in PPI network studies for researchers' purpose of annotating biological function of PPIs [28-32,34], but it has also a weak point that well-studied proteins have many GO terms and poorly-understood ones have little. While PPIs between well-studied proteins have been annotated too much, those between poorly-understood ones too little. Thus, when our approach assesses PPIs by using GO terms, it may miss poorly-understood but therapeutically important target PPIs as false negatives. But, one of the aims of our system is to select PPIs on which biological information are more abundant. In vivo and in vitro validation process of PPIs as drug target, it is more desirable that a researcher can obtain as much information as possible on biology of the PPIs. Since PPIs annotated too little are considered as difficult target in this respect, our system does not select the PPIs in this study. More accumulation of GO annotation will help us select therapeutically important target PPIs that are annotated too little by GO terms at present. Future directions Our in silico system can be further expanded for more precise assessment of candidates for drug-targetable PPIs if other computational methods are incorporated. 
These methods include the prediction of interaction interfaces on protein tertiary structures, the prediction of disordered regions, and the evaluation of similarities in the expression patterns of messenger RNAs encoding the two interacting proteins in every tissue/organ. In the case of RXRA/NRIP1 and CDK2/CDKN1A, it is fortunate that the interaction interfaces have been well studied by biochemical and immunological approaches [54,55,66], although the tertiary structures of the protein complexes remain unsolved. However, if the interaction interface of a candidate target PPI has not been well studied and the tertiary structure of the protein complex is unknown, computational methods to predict the PPI interface [84-88] are required in order to determine whether a detected SDC-binding pocket is located at the interface. Cheng and colleagues [89] recently proposed that interaction interface regions in proteins tend to have disordered tertiary structures and that information regarding these disordered regions is useful for drug target discovery. As for gene expression patterns, two proteins could presumably interact in living cells, if the expression patterns of their corresponding genes were similar to each other. We focused on discovering drug targets for SDCs based on the idea of the structure-based in silico drug design, although there are various other types of drugs, including peptides, antisense RNAs or DNAs, aptamers, and antibodies. Candidate target PPIs for each type of drugs, as well as small chemical drugs, will be selected by adopting distinct criteria based on the three (or more) independent in silico investigations in our system. 
For example, to select candidate target PPIs for antibodies, one can adopt criteria so that i) at least one tertiary structure of the interacting domains is known, ii) the interacting domain has an interaction interface predicted to be recognized by antibodies, and iii) the interacting proteins share identical GO terms such as 'extracellular' in the cellular component category and have expression patterns similar to each other. Conclusion In this paper, we propose a novel and integrative in silico approach for discovering candidates for drug-targetable PPIs in interactome data. The system excludes false positive interactions and selects more reliable PPIs as drug targets. The application of our system to original human PPI data demonstrated its effectiveness by discovering the six promising candidates for drug-targetable PPIs. Advances in HTS technologies for detecting PPIs and the accumulation of high fidelity PPI data in the near future will enable our system to facilitate the more comprehensive exploration of drug-targetable PPIs. Methods PPI data The PPI data analysed in the present study consists of 770 binary interactions between human proteins. The data were produced by our HTS-Y2H assays supported by the Genome Network Project from the Ministry of Education, Culture, Sports, Science and Technology of Japan. See Additional file 2 and the website of the Genome Network Platform [90] for all PPI data used in this study. Most of bait proteins used in the HTS-Y2H assays are transcription factors, including members of the nuclear hormone receptor family (NR1D1, NR1D2, PPARA, PPARD, RORB, RXRA, THRA, etc), those of the Signal Transducer and Activator of Transcription (STAT) family (STAT1, STAT3, and STAT4), homeodomain proteins (FOXP2, LHX1, LHX2, PKNOX1, etc), and zinc-finger proteins (RFP, ZNF31, ZNF581, TRIM21, etc). 
Preys used in the assays were prepared from cDNA libraries derived from various cell lines (brain, breast cancer/prostate cancer, liver, and macrophage). Our HTS-Y2H method uses sequence fragments as baits, and preys isolated with the baits are also sequence fragments. This enables us to identify protein domains responsible for PPIs because it is highly probable that protein domains included in the bait or prey fragments are involved in the interactions between the two fragments. Full details of our HTS-Y2H method, including experimental materials and conditions, will be reported elsewhere in the near future.

Detection of protein domains responsible for PPIs

All domains in the bait and prey fragments were retrieved from the Pfam (version 20.0) database [38] using the UniProt (release 50.3) or TrEMBL (release 33.3) database [91] accession numbers associated with the fragments. When no domain was detected in a bait or prey fragment, the fragment was further searched against the profile hidden Markov models of the Pfam-A and Pfam-B domains using the program HMMPFAM [92]. The HMMPFAM search was performed with the default program parameters except for '-E 0.1 --domE 0.1' (E-value < 0.1 for each detected domain). If a detected domain in a fragment was < 10 residues long, the domain was excluded from the subsequent analyses. To check whether a domain pair was known or predicted to interact in previous studies, all combinations of domains between bait and prey fragments were searched against the public domain-domain interaction databases, iPfam [48], InterDom version 1.1 [49], and DIMA [50].

Finding SDC-binding pockets on protein surfaces

Using amino acid sequences of the bait and prey fragments as queries, we searched the PDB database [51] (the version of 2006/5/18) for tertiary structures similar to each fragment using the program BLASTP (version 2.2.13) [93].
This similarity search was performed with the default program parameters except for '-F F' (no mask for low complexity regions) and '-e 0.001' (E-value < 0.001). We considered the fragment to have a tertiary structure nearly identical to the chain, when a bait or prey fragment had sequence identity of ≥ 90% and query coverage rate (length of query sequence showing the identity/full length of the query sequence) of ≥ 90% to a chain in a PDB entry, and if the sequence length showing the identity was ≥ 50 residues. If no nearly-identical tertiary structure was detected for a fragment, the fragment was further searched in the PDB database using the program PSI-BLAST (version 2.2.13) [93]. The default program parameters were used for the PSI-BLAST search except for '-j 10' (10 times the iteration search). The search for pockets on protein surfaces was performed for the bait and prey fragments showing high sequence identity (≥ 90%) to a chain in a PDB entry. We used two programs, CASTp [39] and MOE Alpha Site Finder [40], which implement different pocket-search algorithms. Coordinate data for the chains in the PDB showing high sequence identity to the bait and prey fragments were used as input to the programs. We counted the number of pockets satisfying the following empirically-determined criteria in order to detect potential SDC-binding pockets: in the case of CASTp, i) the volume (v) of a detected pocket was within the range of 150Å3 <v ≤ 2000Å3; ii) in that of MOE Alpha Site Finder, a) the number of atoms comprising the side chains of the amino acids inside the pocket was ≥ 37 or b) the number of hydrophobic atoms inside the pocket was ≥ 22. Evaluating similarities in the assignment of GO terms between specific partner proteins Based on GO terms assigned to two proteins from which the bait and prey fragments were derived, we evaluated similarities between fragments by counting the number of shared identical GO terms. 
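Counting shared GO terms can be sketched as follows, assuming each protein's assigned terms are first expanded with all of their ancestor terms along every path to the top of the hierarchy. The `parents` mapping and the toy term labels are hypothetical; a real analysis would load the parent relations from a GO release. The sketch stops at the set of shared terms and does not attempt the authors' similarity score.

```python
def with_ancestors(terms, parents):
    """Expand GO terms with every ancestor reachable through `parents`.

    parents: dict mapping a term to an iterable of its direct parent terms
    (a term may have several parents, so all paths are followed).
    """
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen

def shared_terms(terms_a, terms_b, parents):
    """Terms assigned, directly or through ancestors, to both proteins."""
    return with_ancestors(terms_a, parents) & with_ancestors(terms_b, parents)

# Toy hierarchy with invented labels: root <- A <- {A1, A2}, root <- B.
toy_parents = {"A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
siblings = shared_terms(["A1"], ["A2"], toy_parents)
distant = shared_terms(["A1"], ["B"], toy_parents)
```

In the toy hierarchy, the sibling terms share both `A` and `root`, while the distant pair shares only `root`, so deeper shared terms indicate more specific functional agreement.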
GO terms assigned to the proteins were retrieved from the QuickGO database [94] using the UniProt/TrEMBL accession numbers. GO organizes a wide variety of biological terms as a hierarchy. If a specific term is assigned to a gene product, then all 'parent' terms on all paths ascending from that specific term to the top-level terms ('cellular component', 'biological process', and 'molecular function') of the hierarchy are also assigned to that gene product [96]. Thus, we collected all parent terms of the specific terms assigned to each protein. A similarity score (Si) for a protein pair i is calculated as [equation not recovered], where Lj is the jth level of the GO hierarchy (in the present study, Lj = 1, 2, 3, ..., 13, from the top-level term (Lj = 1) to a specific term (Lj > 1)) and nij is the number of shared identical GO terms at the jth level for protein pair i. We calculated the scores for the three GO categories: cellular component (SiC), molecular function (SiF), and biological process (SiP). The statistical significance of the similarity scores was evaluated on the basis of frequency distributions of scores calculated for PPI data composed of 10,000 random pairs of human proteins (see Additional file 3). The random pairs were constructed from proteins in the UniProt and TrEMBL databases that have GO terms assigned. The frequency distributions of random scores were calculated for all three GO categories, and the probabilities of the real scores were estimated from these distributions.

Abbreviations

PPI, protein-protein interaction; HTS, high-throughput screening; SDC, small drug-like chemical; GO, Gene Ontology; HTS-Y2H, high-throughput screening yeast two-hybrid.

Authors' contributions

NS conceived of the study, carried out the studies on domain detection and gene ontology, and drafted the manuscript. KI and TTashiro carried out the protein structure and pocket studies. ST, JO, YI, AS, AT, HN, TTakeda, and TI designed and carried out the HTS-Y2H assays. SK and YS conceived and supervised this study.
All authors read and approved the final manuscript.

Additional file 1

Full results of our analyses of the previously-investigated target PPIs. This file lists the previously-investigated target PPIs and summarizes the full results of domain detection, finding SDC-binding pockets, and evaluating similarities in GO-term assignment. (XLS, 29K)

Additional file 2

Full results of our analyses of original human PPI data. This XLS-format file lists the original human PPIs analysed in the present study and summarizes the full results of domain detection, the search for nearly identical tertiary structures, finding SDC-binding pockets, and evaluating similarities in GO-term assignment. (XLS, 447K)

Additional file 3

Frequency distributions of similarity scores for GO-term assignment calculated for random protein pairs. This file contains a figure illustrating the frequency distributions of similarity scores for GO-term assignment calculated for PPI data composed of 10,000 random pairs of human proteins. (PDF, 24K)

Additional file 4

PPI network of original human PPI data. This file is an original version of the PPI network in Figure 3. (PDF, 560K)

Acknowledgements

We would like to thank Yoshinori Harada for helpful comments on the manuscript. This work was supported by a research grant for the Genome Network Project from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

References