![]() | ![]() |
Formats:
|
||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2008 Chu and Chen; licensee BioMed Central Ltd. Construction of a cancer-perturbed protein-protein interaction network for discovery of apoptosis drug targets 1Lab of Control and Systems Biology, National Tsing Hua University, Hsinchu, 300, Taiwan Corresponding author.Liang-Hui Chu: chulianghui/at/gmail.com; Bor-Sen Chen: bschen/at/ee.nthu.edu.tw Received February 11, 2008; Accepted June 30, 2008. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background Cancer is caused by genetic abnormalities, such as mutations of oncogenes or tumor suppressor genes, which alter downstream signal transduction pathways and protein-protein interactions. Comparisons of the interactions of proteins in cancerous and normal cells can shed light on the mechanisms of carcinogenesis. Results We constructed initial networks of protein-protein interactions involved in the apoptosis of cancerous and normal cells by use of two human yeast two-hybrid data sets and four online databases. Next, we applied a nonlinear stochastic model, maximum likelihood parameter estimation, and Akaike Information Criteria (AIC) to eliminate false-positive protein-protein interactions in our initial protein interaction networks by use of microarray data. Comparisons of the networks of apoptosis in HeLa (human cervical carcinoma) cells and in normal primary lung fibroblasts provided insight into the mechanism of apoptosis and allowed identification of potential drug targets. The potential targets include BCL2, caspase-3 and TP53. Our comparison of cancerous and normal cells also allowed derivation of several party hubs and date hubs in the human protein-protein interaction networks involved in caspase activation. Conclusion Our method allows identification of cancer-perturbed protein-protein interactions involved in apoptosis and identification of potential molecular targets for development of anti-cancer drugs. Background Study of interactome, the entire set of molecular interactions within cells, has provided many insights into the etiology and regulation of cancer [1,2]. Tumorigenesis is a multi-step process caused by genetic alterations that drive the progressive transformation of normal cells into malignant cells. At the molecular level, genetic mutations, translocations, amplifications, deletions, and viral gene insertions can alter translated proteins and thereby disrupt signal transduction pathways and protein-protein interactions that are essential for apoptosis and other important cellular processes [3]. Inactivation of pro-apoptotic proteins or up-regulation of anti-apoptotic proteins results in unchecked growth of cells and ultimately to cancer [4]. From a systems biology perspective, cancer is mainly caused by malfunctions of perturbed protein interaction networks in the cell [5,6]. Apoptosis is necessary for normal human development and survival, in that cells must die in order to prevent uncontrolled growth [7]. Apoptosis requires activation of multiple pathways via regulated protein-protein interactions [8]. Apoptosis is mediated by an intrinsic pathway, which is triggered by "death stimuli" (e.g., DNA damage, oncogene activation, among others) within a cell, or by an extrinsic pathway, which is initiated by binding of an extracellular "death ligand". The extrinsic pathway can link to the intrinsic pathway, which then triggers the release of mitochondria proteins via protein-protein interactions [4,8,9]. During apoptosis, several proteins are released from the intermembrane space of the mitochondria into the cytoplasm and these proteins activate initiator caspases and trigger a series of protein-protein interactions in the caspase cascade. Evading apoptosis is one of the six acquired capabilities of cancer cells [3], and anticancer treatment using cytotoxic drugs is considered to mediate cell death by activating key elements of the apoptosis program and the cellular stress response [10]. Comprehensive knowledge of protein-protein interactions provides a framework for understanding the biology of cancer as an integrated system [11]. Most gene products mediate their functions within complex networks of interconnected macromolecules, forming a dynamic topological interactome [11,12]. High throughput two-hybrid experiments [13,14] and several online interactome databases, such as BIND [15], HPRD [16], Intact [17], and Himap [18], allow analysis of the global topologies of human protein-protein interactions. BIND [15] is a database designed to store full descriptions of interactions, molecular complexes and pathways. HPRD [16] provides detailed data including protein sequences, localization, domains, and motifs, and thousands of protein-protein interactions, with other data. Intact [17] contains an enrichment of protein-protein interactions, related literature, and experimental detail. Himap [18] combines two datasets of yeast-two-hybrid experiments [13,14] to form a human protein reference database [16], with references to functions and predictions. However, experimental and database approaches often yield "false-positives" [19]. For example, yeast two-hybrid experiments based on transactivation of reporter genes require the presence of auto-activators, where the bait activates gene expression in the absence of any prey [11]. The yeast two-hybrid technique can yield false-positives (spurious interactions detected because of the high-throughput nature of the screening process), and false-negatives (undetected interactions) [19,20]. Computational methods can refine protein-protein interaction networks and result in fewer false-positives [21,22]. Because of the complex nature of interactomes, such as that observed in the apoptosome complex during caspase formation [7,8,23], a nonlinear mathematical model provides better characterization than a linear model [24,25]. In addition, a stochastic model allows consideration of intrinsic and extrinsic molecular "noise" that causes stochastic variations in transcription and translation [24]. In this paper, we describe a nonlinear stochastic model that characterizes dynamic protein-protein interaction networks of apoptosis in cancerous and normal cells. In this study, we built an initial protein-protein interaction network based on two human yeast two-hybrid data sets [13,14] and four online interactome databases such as BIND [15], HPRD [16], Intact [17], and Himap [18]. Next, we constructed a nonlinear stochastic model of dynamic protein-protein interactions to eliminate false-positives from the network by applying a statistical method (Akaike Information Criterion, AIC) to the high-throughput protein interaction data. We regard all proteins in an organism as a large dynamic interaction system. Protein-protein interactions are considered as nonlinear stochastic processes with several expression profiles of interactive protein partners as input, and the expression profile of a target protein as output. Because of random noise and uncertainties during experiments, we describe protein-protein interactions with stochastic discrete nonlinear dynamic equations. We considered linear individual (or binary) protein interactions and nonlinear cooperative protein complex interactions, but not DNA-protein or metabolite interactions. First, we constructed protein-protein interaction networks of apoptosis in HeLa (human cervical carcinoma) cells and normal primary human lung fibroblasts based on microarray data [26]. Next, we obtained the cancer-perturbed protein-protein interaction network by comparison of apoptosis in normal cells via gain-of-function and loss-of-function networks. Because current drugs designed to induce apoptosis kill cancer cells as well as normal cells, these cancer-perturbed protein-protein interaction networks allow identification of potential selective targets of apoptosis-promoting drugs [5]. Results and Discussion Construction of the cancer-perturbed protein-protein interaction network of apoptosis Initially, we selected proteins that are known to have roles in apoptosis and considered them as the "core nodes" of our network. These included BAX (BCL2-associated X protein), BCL2 (B-cell CLL/lymphoma 2), BID (BH3 interacting domain death agonist), CASP3 (caspase-3), BIRC4 (baculoviral IAP repeat-containing 4), CASP9 (caspase-9), CYCS (cytochrome c, somatic), and DIABLO (diablo homolog, Drosophila). Networks, such as ours, that are developed from initially selected genes or proteins as the core nodes are referred to as "BRAC-centered networks" [27]. Our initial apoptosis network contained 207 protein nodes and 841 protein-protein interaction edges. From equations (1) to (13) (see "Methods"), we calculated each protein interaction twice, with each partner considered as the target protein. The final results are shown in Figure Figure11
Supplementary Table 1 compares the networks of apoptosis in HeLa cells and normal primary human lung fibroblasts. These data show, for example, that BAX and PEG3 interact in both cell types, that BAX and CCND1 interact in neither cell type, that BAX and RARG interact in normal cells but not in cancerous cells, and that BAX and BCL2L1 interact in cancerous cells but not in normal cells. In order to identify drug targets for anti-cancer drugs, it is important to identify cancer-perturbed protein-protein interaction networks to identify drug targets to kill cancer cells [3]. If an interaction is absent in normal cells, but present in cancer cells, we call it "gain-of-function"; if an interaction is present in normal cells but not in cancerous cells, we term it "loss-of-function". For the 841 interactions that we identified, we classified 157 (18.7%) as "gain-of-function" and 162 (19.3%) as "loss-of-function" (Figs (Figs2A2A
Figure Figure22 In order to confirm the topology of our networks (Fig. (Fig.2),2 Cancer-perturbed apoptosis mechanism at the systems level In many cancers, pro-apoptotic proteins are inactivated or anti-apoptotic proteins are upregulated, leading to unchecked growth and an inability to respond to cellular stresses [4]. These gain- and loss-of-function mutations lead to aberrations in protein-protein interaction networks. An integration of interactome data and genomic data can provide a clearer understanding of the functional relationships that underlie apoptosis and other biological processes [11,21]. The results of our investigation of the apoptosis mechanism at the systems level and the elucidation of cancer-perturbed protein-protein interaction network topology are depicted in Figs. Figs.2A2A Extrinsic pathway, intrinsic pathway and crosstalk Members of the death receptor superfamily trigger the extrinsic apoptosis pathway upon recruit of caspase-8 through the adaptor protein FAS (TNFRSF6)-associated death domain (FADD) [7]. Binding of a "death ligand" to a receptor triggers formation of a signaling complex that activates caspase-8 or caspase-10, which then activates caspase-3, and finally promotes cell death [23]. Extracellular and intracellular stress triggers the intrinsic apoptosis pathway (mitochondria pathway), which involves activation of pro-apoptotic members of the Bcl-2 family [8]. Bid, a pro-apoptotic member of the Bcl-2 family, allows crosstalk between the extrinsic and intrinsic pathways. The three subfamilies of Bcl-2-related proteins are the anti-apoptotic proteins (e.g., BCL2 and BCL2L1), the pro-apoptotic multi-domain proteins (e.g., BAX and BAK), and pro-apoptotic BH3-only proteins (e.g., BID and BIM) [9,29]. Of the 19 Bcl-2 proteins or regulators (see "goProcess" in Supplementary Tables 2 and 3), 11 are present in Figs. Figs.2A2A Caspases family and caspases regulators The apoptosis death signal is activated via a series of protease caspases (initiator caspases-2, -8, -9, and -10, and effector caspases-3, -6, and -7) which require activation by proteolysis [8,9]. With the exception of CASP6, which is present only in the gain-of-function network, the caspases (including CASP1, CASP2, CASP4, CASP7, CASP9 and CASP10) occur in the gain-of-function and loss-of-function networks. Seven caspase activators, inhibitors, and regulators (including BAX, TP53, CFLAR, CYCS, CARD4, BIRC4, and DIABLO) are present in both networks (Figs. (Figs.2A2A Regulation of apoptosis at the systems level Besides Bcl-2 and caspases, proteins involved in apoptosis regulation include BIRC3, PTEN, CARD12, MAP3K7, DEDD2, MITF, MALT1, BCL6, NALP1, CRADD, RTN4, PSEN1, IGFBP3, BNIP3L, RARG, CFLAR, TRAF3, TRAF1, MCL1, CARD4, TRAF6, VEGF, BIRC2, FGFR1, PEA15, DEDD, MMP9, HRK, and TP53 (Figs. (Figs.2A2A Apoptosis and cell cycle regulation Proteins generally function as components of complexes that contain other macromolecules, to carry out specific biological processes. Networks of interactions connect different cellular processes [11]. We have identified 30 proteins in the protein-protein interaction network of apoptosis that also participate in cell-cycle regulation. These are PTEN, CDC6, MFN2, PKMYT1, DCC, and E2F1 in Fig. Fig.2A,2A Although Figs. Figs.2A2A Prediction of apoptosis drug targets using cancer-perturbed networks of apoptosis Systems-based drug design is a major application of systems biology [5,25,31]. This method constructs disease-perturbed protein-protein interaction networks and identifies potential drug targets by comparison of the networks of normal and abnormal cells. This contrasts with the traditional approach, which reduces cellular processes to their individual components or signal transduction pathways and targets a specific molecule or signaling pathway. A limitation of this traditional approach is that a single molecule or pathway does not adequately describe most biological systems, including those affected in cancer [25]. By comparisons of the protein-protein interaction networks of normal and cancerous cells derived from microarray data, we can identify potential drug targets through a systems-based approach (Fig. (Fig.33
Thus, we built initial protein-protein interaction networks from large-scale experiments and databases, and then employed each microarray data set of HeLa cells and primary lung fibroblasts to modify these networks. For network modification, we used a nonlinear stochastic model and the Akaike Information Criterion (AIC). We next compared the networks of cancerous and normal cells, derived gain-of-function and loss-of-function networks, and identified protein hubs with high degree of perturbation as potential drug targets. Scale-free networks are extremely sensitive to removal of targeted hubs (i.e., attack vulnerability [11,12,32]), so we summed the degree of perturbation (i.e., connectivity) of each node in the cancer-perturbed network (Supplementary Tables 2 and 3) to obtain these perturbed hubs. Proteins with sum of degree of perturbation ≥ 8 (Table 1) differentiate the cancerous and normal interactomes and are potential drug targets [6,25]. We classified the 17 potential drug targets (Table 1) into six categories: (i) Intrinsic pathway: BCL2, BAX, BCL2L1, BID, and CYCS; (ii) Extrinsic pathway: TNF and TNFRSF6; (iii) Common pathway: CASP3 and CASP9; (iv) Apoptosis regulators: TP53, MYC, CFLAR, and EGFR; (v) Stress-induced signaling: MAPK1 and MAPK3; and (vi) Others: CDKN1A and CCND1. Our results indicate that most proteins interact with few partners, whereas hubs interact with many partners, consistent with current views on interactome networks with a scale-free or power law degree distribution [11].
Intrinsic pathway: BCL2, BAX, BCL2L1, and CYCS Defective apoptosis in human cancers often results from over-expression or inhibition of BCL2 proteins. These proteins regulate mitochondrial permeability by inhibiting (e.g., BCL2 and BCL2L1) or promoting (e.g., BAX and BID) release of cytochrome c (CYCS) [33]. BCL2 and several anti-apoptotic relatives, such as BCL2L1, associate with the mitochondrial outer membrane and the endoplasmic reticulum nuclear membrane and maintain the integrity of these membranes. Initiation of apoptosis requires pro-apoptotic family members that closely resemble BCL2 and distantly related proteins that are related only by the small BH3 protein-interaction domain [29]. In our results, proteins with gain-of-function interactions with BCL2 include CCND1, BAD, MCL1, MAPK3, ADM, KITLG, EGFR, BAG2, PKMYT1, TP53, PCNA, MITF, ABCB1, BCL6, ZNF384, HRK, PPP2R5A, and VEGF (Supplementary Table 2). Proteins with loss-of-function interactions with BCL2 include CDKN1A, TNF, WT1, BAG4, BCL2L14, DEK, GRN, RAF1, BLK, BAG5, CAPN2, GHR, CDKN1B, RTN4, BNIP3L, MAP3K1, and CLC (Supplementary Table 3). We predict BCL2 to be the best potential drug target because this protein best differentiates protein-protein interaction networks of HeLa and normal cells [5,25]. Our analysis agrees with the conclusions of previous studies which showed that BCL2 protein family members are good targets for cancer therapy. Drugs that target BCL2 include Genasense [4,34] and ABT-737 [35]. The activation of Bax can be induced by gene therapy through delivery of Bax vectors, and this approach has been successful in inducing apoptosis in cancer cell lines [4]. Antisense BCL2L1 (BCL-xL) downregulates the expression of BCL2 and BCL2L1, induces apoptosis, and inhibits growth of several tumor types in vitro and in vivo [34]. Unfortunately, targeting of BCL-2 also causes adverse effects, presumably because many normal cells depend on proteins in the BCL-2 family to maintain normal mitochondrial function. Difficulty in using BCL-2 antisense DNA or RNA as a delivery system is a problem with Genasense [36]. Cytochrome c, once released into the cytosol, interacts with Apaf-1 and this leads to activation of caspase-9 proenzymes [34]. The chief function of BCL-2 proteins is to regulate the release of cytochrome c from mitochondria. Thus, targeting CYCS or Apaf-1 would also be expected to cause severe adverse effects [37,38]. Extrinsic pathway and crosstalk: TNF, TNFRSF6, and BID The extrinsic pathway activated by death receptors, such as Fas (TNFRSF6/APO-1/CD95) and other TNF receptor family members, allows apoptosis to maintain normal tissue homeostasis [7]. Although death receptors of the TNF superfamily members are potential targets for anti-cancer drugs, toxic side effects have been observed that place limits on their therapeutic use [4]. TNF and Fas (TNFRSF6) were also found to activate nonspecific TNF receptors resulting in extensive ischemic and hemorrhagic lesions in several tissues leading to septic shock and fulminating hepatic failure in animal models [34]. A more promising approach involves targeting the TRAIL (TNF-Related Apoptosis Inducing Ligand) receptors [4,7,34]. Activation of the TRAIL death receptor can kill cancerous cells but not normal cells, whereas monoclonal antibodies against TRAIL and recombinant TRAIL ligand can cause TRAIL resistance [36]. Thus, administration of such a drug might cause tumors to develop resistance, or cause the death of normal cells. BID, a pro-apoptotic Bcl-2 family member, provides crosstalk and integration between the death-receptor and mitochondrial pathways [8,9]. Targeting BID, however, is rarely discussed, although such work has potential for use in combined therapy [34]. Common pathway: CASP3 and CASP9 Caspases are the central components of the apoptotic response network. An effector caspase (e.g., caspase-3) is activated by an initiator caspase (e.g., caspase-9) and the initiator caspase is activated via other protein-protein interactions [8,9]. Targeting inhibitors of caspases could potentially cause apoptosis of cancerous cells. Synthetic activators of caspases include Apoptin and IAP [33,34]. Caspase-3 and -9 are subject to inhibition by IAPs such as Livin [9]. Like BCL-2 inhibitors, XIAP inhibitors must block protein-protein interactions. When released from mitochondria, Smac binds XIAP and inactivates it, triggering apoptosis [36]. Apoptosis regulators: TP53, MYC, CFLAR, and EGFR One of the most dramatic responses to p53 is induction of apoptosis and regulation via the intrinsic pathway [39]. Drug trials that target p53 include gene therapy involving ONYX-015 and INGN201 and antisense therapy that targets a protein controlling p53 activity by Nutlins which blocks p53/MDM2 interaction [4,34]. The proto-oncogene c-MYC encodes a transcription factor that is implicated in various cellular processes, including cell growth, proliferation, loss of differentiation, and apoptosis. The induction of cell-cycle entry sensitizes the cell to apoptosis, so that cell-proliferation and apoptotic pathways are coupled [40]. CFLAR (c-FLIP) regulates caspase-8 and FADD-like apoptosis. Whereas CFLAR blocks the activation of the initiator caspase-8, XIAP can block the initiation phase (by inhibition of caspase-9) and the execution phase (by blocking caspase-3 and caspase-7) [23]. Some agents, for example, agent ZD1839 as an EGFR (epidermal growth factor receptor) inhibitor, do not primarily target apoptosis, but indirectly modulate apoptosis [34]. Stress-induced signaling and others: MAPK1, MAPK3, CDKN1A, and CCND1 Proteins of the MAPK (mitogen-activated protein kinase) family are crucial in many signaling pathways [41]. Three MEK (MAPK kinase) inhibitors, CI-1040, PD0325901, and ARRY-142886, are currently in clinical trials for treatment of various cancers [42]. Although some drugs target MAPKs, MAPK1 (ERK or p38), and MAPK3 (ERK1) are not the main drug targets in the apoptotic pathway [34]. CDKN1A (p21) (cyclin-dependent kinase inhibitor-1) plays a role in cell cycle arrest and induction of apoptosis. The activities of cyclin D- and cyclin E-dependent kinases are linked through the Cip/Kip family of Cdk inhibitors, including p27 and p21 [43]. CCND1 is cyclin D1 in the G1/S transition of the cell cycle, and is controlled by the tumor suppressor gene RB through cdk-cyclin D complexes [44]. Prediction of additional drug targets by decreasing the degree of perturbation In addition to identifying several apoptosis drug targets that have already been identified by other studies, we applied our method to predict additional drug targets by decreasing the perturbation threshold. Thus, if we reduce the sum of degree of perturbation to 7, we predict the following additional targets: BAK1, CASP2, BCL2A1, IGF1, PRKCD, NFKB1, and PCNA. NFKB1 has both anti- and pro-apoptotic functions that are determined by the nature of the death stimulus. The drug PS11445 targets the NFKB1 inhibitor IKKβ [34] but the other proteins have not previously been considered as drug targets. Prediction of new GO annotations of the four proteins: CDKN1A, CCND, PCNA, and PRKCD If we consider all 24 proteins with sum of degree of perturbation ≥ 7, four proteins (CDKN1A, CCND1, PRKCD and PCNA) are also considered to have a role in apoptosis. If the function of any one protein in a network is known, identification of its interacting partners allows prediction of their functions [11]. However, two of these proteins (PCNA and PRKCD) are known to have a role in DNA damage-induced apoptosis. PCNA (Proliferating Cell Nuclear Antigen) is ubiquitinated and involved in RAD6-dependent DNA repair in response to DNA damage. PRKCD (protein kinase C, delta) is also associated with DNA damage-induced apoptosis [45], whereas gene ontology annotations of PRKCD do not include apoptosis (Supplementary Tables 2 and 3). Although other methods (such as identification of protein domains) allow prediction of protein-protein interactions in different organisms [46-48], these methods do not allow identification of potential drug targets. Our method provides efficient and precise prediction of anti-cancer drug targets and also specifies these target proteins with detailed Gene Ontology annotations. This will help researchers identify additional drug targets by examination of other cellular mechanisms involved in cancer. Network modeling has been used to identify genes potentially associated with breast cancer in a BRAC-centered network [27]. However, there are very few time-series microarray databases for cancerous and normal cells, so it is difficult to use our method to compare protein hubs for different types of cancer. Moreover, our method does not address the potential adverse effects of targeting a specific protein and the problem of drug delivery. We expect that more genomic time-series microarray experiments and clinical research will address these limitations in the future. Caspase activation through static and dynamic hubs Static network topology is not sufficient to define function, and incorporating time-dependent expression data is important for understanding pathway function [20]. A number of computational approaches have been proposed for prediction of protein-protein interactions, such as domain-domain interactions [46], the confidence score resulting from sequence similarity and number of edges [2], and integration of genomic data sets [49]. A nonlinear stochastic model can depict protein-protein interaction networks at different times by using linear binary interactions and nonlinear protein complex relationships. Previous studies have used linear stochastic models to describe the multiple feedback loops of p53 [50], but nonlinear effects cannot be depicted by this method. Our method also illustrates the dynamic behavior of protein-protein interaction networks, which cannot be examined by probabilistic methods of data integration [49]. Networks consist of party hubs (static hubs) and date hubs (dynamic hubs). Party hubs are found in static complexes with most of their partners present at the same time, whereas date hubs bind their interaction partners at different times or locations [12,51]. To further investigate dynamic apoptotic properties in human cells, we considered caspases as protein hubs in our dynamic nonlinear stochastic protein-protein models (Figs. 4A–D
Figures 4A–D Conclusion The cancer-perturbed protein-protein interaction networks of apoptosis that we developed here shed light on the mechanism of cancer at the systems level and allow identification of potential drug targets. In this study, we applied nonlinear stochastic modeling to describe individual and cooperative protein interactions. Our method is more precise than the linear models used in previous research. We successfully integrated microarray and proteome databases to identify cancer-perturbed protein-protein interaction networks. Our predictions of potential drug targets agreed with potential targets identified by other studies and also identified additional targets which may guide the development of new anti-cancer drugs in future. We also identified static and dynamic hubs in human protein-protein interaction networks, which have heretofore been identified only in yeast. Methods Construction of initial protein-protein interaction networks A comprehensive understanding of protein-protein interactions in an organism provides a framework for understanding biology as an integrated system, and human perturbed protein-protein interaction networks offer insight into disease mechanisms such as cancer at the systems level [5,11,25]. Before construction of cancer-perturbed protein-protein interaction networks to explore their roles in the mechanism of cancer, protein-protein interaction networks for both cancer and normal cells must be constructed for comparison. The systematic experimental mappings of the human interactome include yeast two-hybrid systems [13,14], which reveal thousands of preys and baits in the protein matrix. Several online databases such as BIND [15], HPRD [16], Intact [17], and Himap [18] provide fundamental global topologies of human protein-protein interactions. The combined use of these databases [15-18] assists precise estimation of parameters as the basis of construction of protein-protein interaction networks, because incomplete rough networks can lead to biased or possibly erroneous conclusions [11]. We first constructed an initial protein-protein interaction network using two yeast-two-hybrid experiments and four online databases. Studies of large-scale protein-protein interactions allow development of protein interaction networks, but all large-scale experiments and databases contain high rates of false-positives [52]. Previous studies have demonstrated that use of multiple functional databases allows better identification of protein-protein interactions and leads to better prediction of the function of unknown proteins [49]. Therefore, we integrated different databases and experiments to refine our network [19]. After development of our initial network, we used microarray data to remove false-positive interactions. Selecting and processing experimental data We used microarray data [26] to compare gene expression in HeLa cells and normal primary human lung fibroblasts subjected to several stresses (e.g., heat shock from 37°C to 42°C, oxidative stress with menadione or hydrogen peroxide, and endoplasmic reticulum stress with DTT (dithiothreitol) or tunicamycin). Gene expression in HeLa cells and normal fibroblasts, both of which were treated with 2.5 mM DTT, were our microarray data sources. Gene expression was recorded at 0–30 h after stress in HeLa cells and 0–36 h in normal cells [26] (Supplementary Tables 7 and 8; see 'Additional file 7' and 'Additional file 8'). Nonlinear stochastic interaction model of initial networks We used a nonlinear stochastic model to mimic the dynamic interactions among proteins to remove false-positives and refine the network. In recent years, some systems and computational biologists employ a dynamic perspective to describe biological functions, because of their dynamic nature [53-55]. We considered two types of interactions, binary protein interactions and protein complex interactions, but not DNA-protein or metabolite interactions. We denote these basal interactions as k in our model, whereas ε[t] represents stochastic molecular events, such as molecular fluctuations of the target protein. In this study, we define 'individual protein-protein interactions' as binary protein-protein interactions, and 'cooperative interactions' as the interaction of a protein complex with the target protein. x [t] denotes the expression profile of target protein x at time point t; a denotes the influence of a target protein at one time point to the target protein at the next time point; bi denotes individual or binary interaction of protein i with target protein x [t]; and cij denotes the cooperative interaction ability of protein i and protein j on the target protein x. Thus, Fig. Fig.55
Therefore, dynamic interactions between upstream proteins and their targets can be written as:
Remark: The discrete-time dynamic model in equation (1) is based on the discrete sampling of the following continuous differential model: where x[t + 1] - x[t], we get the following discrete model [59]:or where a denotes the influence of x [t] on x [t +1] and is dependent of the decay rate, λ. By iteration of equation (1), we can construct the whole protein-protein interaction network, which is interconnected through the interactions of
In equation (2), r denotes the transition rate of the sigmoid function, and M denotes the mean level of mRNA expression of the corresponding protein. With the nonlinear stochastic interaction model (equation 1), we can identify interaction parameters a, bi, cij and k using microarray data. After having identified all the interactions with the target protein, upstream proteins can be considered as target proteins, and by repeating this procedure we can identify all interactions of the initial network. If the estimated interaction parameters Identification of interactions in the initial protein-protein interaction network Before modification of the initial network, we estimated interaction parameters of the initial network by use of maximum likelihood estimation. This represents interactions of all possible protein candidates in the initial network. After further rearrangement, equation (1) can be rewritten as:
where [t] denotes the regression vector composed of many elements that represent expression levels of protein candidates in the initial network at time point t.Employing the cubic spline method [54,58] to interpolate microarray data allows us to obtain as many data points as needed. In general, the number of data points should be at least 5 times the number of parameters to be estimated. The cubic spline method allows us to obtain these values: {x [t] xi [tl] xj [tl]} for l {1 2 ... M} and i {1 2 ... N}, j {1 2 ... S}, where M denotes the number of microarray data points, N denotes the number of possible protein interactions for the target protein, and S denotes the number of protein complexes in the network. These data points are used as the basis in our regression vector [t]. Computation of equation (3) at different time points allows us to construct the following vector-form equation:
For simplicity, we represent this as:
In equation (4), random noise ε[tk] is regarded as a white Gaussian noise with zero mean and unknown variance σ2, that is, E{ν} = 0, and Σν = E{ννT} = σ2I. Next, we employed a maximum likelihood estimation method [59] to estimate θ and σ2 using regression data obtained from the microarray data of the target protein and proteins with which it interacts. Under the assumption that ν is a Gaussian noise vector with M – 1 elements, its probability density function is given as follows.
Since ν = X - Φ · θ (equation 5), we can rewrite equation (6) as:
Maximum likelihood parameter estimation involves finding θ and σ2 to maximize the likelihood function in equation (7). In order to simplify the computation, it is practical to take the logarithm of equation (7), which yields the following log-likelihood function:
In equation (8), x [tk] and [tk] are the kth elements of X and Φ respectively.Here, we expect the log-likelihood function to have the maxima at θ =
After some computational arrangements from equation (9), the estimated parameters
After obtaining estimate
We quantified the interactions of all candidate proteins by the process described above. In equation (12), the estimated parameter Modification of initial protein-protein interaction networks Although our maximum likelihood estimation method allows quantification of all possible interactions of a target protein, we still do not know the significance of the interaction and whether it can be regarded as a true interaction. Thus, we used a statistical approach that involves model validation to evaluate significance levels and refine the network. In this procedure, we employed the Akaike Information Criterion (AIC) to validate the model order or the number of model parameters of the network [59]. The Akaike Information Criterion (AIC) considers estimated residual variance and model complexity as one statistic and provides a measure of the information lost when a model is used. In other words, the AIC is an operational way of trading off the complexity of an estimated model against how well the model fits the data. AIC decreases as residual variance
where · In our results, '1' represents protein i interacting with protein j if the interaction is within P significant interactions, and '0' represents protein i not interacting with protein j if the interaction has less than P significant interactions. It must be considered that protein-protein interaction networks represent mutual binding relationships; if protein i binds to protein j, then protein j also binds to protein i. To determine mutual relationships in the high false positive yeast-two-hybrid experiments, we use the Boolean logical 'AND' to determine the interaction maps. Two proteins were considered to interact only if each protein binds to the other protein (see Supplementary Table 1). If an interaction is detected only in HeLa cells but not in normal cells, we call this a 'gain-of-function'; if the interaction is detected in normal cells but not HeLa cells, we call it a 'loss-of-function'. Table 2 shows the truth table of all 16 events in the sample space of comparisons of individual protein-protein interactions between HeLa cells and normal cells, as determined with our AIC detection algorithm. Protein complexes are only considered once, because of incomplete information from online databases. We iteratively refined interactions in the initial network using a similar procedure. Finally, this yielded a refined protein-protein interaction network for HeLa and normal cells.
Authors' contributions LHC performed simulations, evaluated the results and wrote the paper. BSC provided the topic, direction and some suggestions. Additional file 1 Supplementary Table 1. Global protein-protein interactions of apoptosis in HeLa cells and normal primary human lung fibroblasts. Click here for file(128K, xls) Additional file 2 Supplementary Table 2. Gain-of-function proteins in Figure Figure2A,2A Click here for file(111K, xls) Additional file 3 Supplementary Table 3. Loss-of-function proteins in Figure Figure2B,2B Click here for file(105K, xls) Additional file 4 Supplementary Table 4. Eighty-six BCL2-interacting proteins identified from our results, the HPRD (Human Protein Reference Database), and literature review. Click here for file(70K, xls) Additional file 5 Supplementary Table 5. Dynamic caspase protein-protein interactions in HeLa cells. Click here for file(41K, xls) Additional file 6 Supplementary Table 6. Dynamic caspase protein-protein interactions in normal human primary lung fibroblasts. Click here for file(41K, xls) Additional file 7 Supplementary Table 7. Microarray data [26] containing genomic expression profiles of HeLa cells under ER stress. Click here for file(98K, xls) Additional file 8 Supplementary Table 8. Microarray data [26] containing genomic expression profiles of normal human primary lung fibroblasts under ER stress. Click here for file(101K, xls) Additional file 9 Matlab Programs. Five Matlab programs were used in this study. These include the AIC method, protein-protein interaction networks of apoptosis under ER stress in HeLa cells and normal primary human lung fibroblasts, and dynamic protein-protein interaction networks of caspase activation under ER stress in HeLa and normal cells. Click here for file(31K, rar) Additional file 10 Codes of Figures. Eight Osprey codes used to drawing the Figures showing protein-protein interactions in this study (Figs. (Figs.1,1 Click here for file(144K, rar) Acknowledgements The study is supported by the National Science Council of Taiwan under grant No. NSC 95-2627-B-007-011. References
|
PubMed related articles
Your browsing activity is empty. Activity recording is turned off. |
|||||||||||||||||||||||||||||||||||||||||||||||||
Nat Genet. 2005 Jun; 37(6):579-83.
[Nat Genet. 2005]Bioinformatics. 2006 Sep 15; 22(18):2291-7.
[Bioinformatics. 2006]Cell. 2000 Jan 7; 100(1):57-70.
[Cell. 2000]Nat Rev Cancer. 2005 Nov; 5(11):876-85.
[Nat Rev Cancer. 2005]Science. 2004 Oct 22; 306(5696):640-3.
[Science. 2004]Cell. 2004 Jan 23; 116(2):205-19.
[Cell. 2004]Nature. 2000 Oct 12; 407(6805):770-6.
[Nature. 2000]Nat Rev Cancer. 2005 Nov; 5(11):876-85.
[Nat Rev Cancer. 2005]Nat Rev Mol Cell Biol. 2004 Nov; 5(11):897-907.
[Nat Rev Mol Cell Biol. 2004]Cell. 2000 Jan 7; 100(1):57-70.
[Cell. 2000]Hum Mol Genet. 2005 Oct 15; 14 Spec No. 2():R171-81.
[Hum Mol Genet. 2005]Nature. 2004 Jul 1; 430(6995):88-93.
[Nature. 2004]Nature. 2005 Oct 20; 437(7062):1173-8.
[Nature. 2005]Cell. 2005 Sep 23; 122(6):957-68.
[Cell. 2005]Nucleic Acids Res. 2003 Jan 1; 31(1):248-50.
[Nucleic Acids Res. 2003]Brief Bioinform. 2005 Dec; 6(4):380-9.
[Brief Bioinform. 2005]Hum Mol Genet. 2005 Oct 15; 14 Spec No. 2():R171-81.
[Hum Mol Genet. 2005]Nat Biotechnol. 2004 Jan; 22(1):78-85.
[Nat Biotechnol. 2004]Biosystems. 2006 Feb-Mar; 83(2-3):81-90.
[Biosystems. 2006]BMC Bioinformatics. 2007 Mar 14; 8():91.
[BMC Bioinformatics. 2007]Nature. 2005 Oct 20; 437(7062):1173-8.
[Nature. 2005]Cell. 2005 Sep 23; 122(6):957-68.
[Cell. 2005]Nucleic Acids Res. 2003 Jan 1; 31(1):248-50.
[Nucleic Acids Res. 2003]Genome Res. 2003 Oct; 13(10):2363-71.
[Genome Res. 2003]Nucleic Acids Res. 2004 Jan 1; 32(Database issue):D452-5.
[Nucleic Acids Res. 2004]Nat Genet. 2007 Nov; 39(11):1338-49.
[Nat Genet. 2007]Brief Bioinform. 2006 Sep; 7(3):243-55.
[Brief Bioinform. 2006]Cell. 2000 Jan 7; 100(1):57-70.
[Cell. 2000]Genome Res. 2003 Oct; 13(10):2363-71.
[Genome Res. 2003]Nat Rev Cancer. 2005 Nov; 5(11):876-85.
[Nat Rev Cancer. 2005]Hum Mol Genet. 2005 Oct 15; 14 Spec No. 2():R171-81.
[Hum Mol Genet. 2005]Biosystems. 2006 Feb-Mar; 83(2-3):81-90.
[Biosystems. 2006]Cell. 2004 Jan 23; 116(2):205-19.
[Cell. 2004]Nat Rev Mol Cell Biol. 2007 May; 8(5):405-13.
[Nat Rev Mol Cell Biol. 2007]Nature. 2000 Oct 12; 407(6805):770-6.
[Nature. 2000]Nat Rev Mol Cell Biol. 2004 Nov; 5(11):897-907.
[Nat Rev Mol Cell Biol. 2004]Nat Rev Cancer. 2002 Sep; 2(9):647-56.
[Nat Rev Cancer. 2002]Nature. 2000 Oct 12; 407(6805):770-6.
[Nature. 2000]Nat Rev Mol Cell Biol. 2004 Nov; 5(11):897-907.
[Nat Rev Mol Cell Biol. 2004]Hum Mol Genet. 2005 Oct 15; 14 Spec No. 2():R171-81.
[Hum Mol Genet. 2005]Nat Rev Drug Discov. 2007 Nov; 6(11):871-80.
[Nat Rev Drug Discov. 2007]Science. 2004 Oct 22; 306(5696):640-3.
[Science. 2004]Nat Biotechnol. 2004 Oct; 22(10):1215-7.
[Nat Biotechnol. 2004]Hum Mol Genet. 2005 Oct 15; 14 Spec No. 2():R171-81.
[Hum Mol Genet. 2005]Nature. 2004 Jul 1; 430(6995):88-93.
[Nature. 2004]Nat Rev Genet. 2004 Feb; 5(2):101-13.
[Nat Rev Genet. 2004]Nat Rev Drug Discov. 2007 Mar; 6(3):202-10.
[Nat Rev Drug Discov. 2007]Nat Rev Drug Discov. 2005 May; 4(5):399-409.
[Nat Rev Drug Discov. 2005]Nat Rev Cancer. 2002 Sep; 2(9):647-56.
[Nat Rev Cancer. 2002]Science. 2004 Oct 22; 306(5696):640-3.
[Science. 2004]Nat Rev Cancer. 2005 Nov; 5(11):876-85.
[Nat Rev Cancer. 2005]CA Cancer J Clin. 2005 May-Jun; 55(3):178-94.
[CA Cancer J Clin. 2005]Nature. 2005 Jun 2; 435(7042):677-81.
[Nature. 2005]Nat Biotechnol. 2005 Apr; 23(4):409-11.
[Nat Biotechnol. 2005]Nat Rev Drug Discov. 2002 Feb; 1(2):111-21.
[Nat Rev Drug Discov. 2002]Cell. 2004 Jan 23; 116(2):205-19.
[Cell. 2004]Nat Rev Cancer. 2005 Nov; 5(11):876-85.
[Nat Rev Cancer. 2005]CA Cancer J Clin. 2005 May-Jun; 55(3):178-94.
[CA Cancer J Clin. 2005]Nat Rev Cancer. 2005 Nov; 5(11):876-85.
[Nat Rev Cancer. 2005]Cell. 2004 Jan 23; 116(2):205-19.
[Cell. 2004]CA Cancer J Clin. 2005 May-Jun; 55(3):178-94.
[CA Cancer J Clin. 2005]Nat Biotechnol. 2005 Apr; 23(4):409-11.
[Nat Biotechnol. 2005]Nature. 2000 Oct 12; 407(6805):770-6.
[Nature. 2000]Nature. 2000 Oct 12; 407(6805):770-6.
[Nature. 2000]Nat Rev Mol Cell Biol. 2004 Nov; 5(11):897-907.
[Nat Rev Mol Cell Biol. 2004]Nat Rev Drug Discov. 2005 May; 4(5):399-409.
[Nat Rev Drug Discov. 2005]CA Cancer J Clin. 2005 May-Jun; 55(3):178-94.
[CA Cancer J Clin. 2005]Nat Biotechnol. 2005 Apr; 23(4):409-11.
[Nat Biotechnol. 2005]Nat Rev Mol Cell Biol. 2007 Apr; 8(4):275-83.
[Nat Rev Mol Cell Biol. 2007]Nat Rev Cancer. 2005 Nov; 5(11):876-85.
[Nat Rev Cancer. 2005]CA Cancer J Clin. 2005 May-Jun; 55(3):178-94.
[CA Cancer J Clin. 2005]Nat Rev Cancer. 2002 Oct; 2(10):764-76.
[Nat Rev Cancer. 2002]Nat Rev Mol Cell Biol. 2007 May; 8(5):405-13.
[Nat Rev Mol Cell Biol. 2007]Oncogene. 2004 Apr 12; 23(16):2838-49.
[Oncogene. 2004]Nat Rev Cancer. 2004 Dec; 4(12):937-47.
[Nat Rev Cancer. 2004]CA Cancer J Clin. 2005 May-Jun; 55(3):178-94.
[CA Cancer J Clin. 2005]Cancer Cell. 2002 Aug; 2(2):103-12.
[Cancer Cell. 2002]CA Cancer J Clin. 2005 May-Jun; 55(3):178-94.
[CA Cancer J Clin. 2005]Hum Mol Genet. 2005 Oct 15; 14 Spec No. 2():R171-81.
[Hum Mol Genet. 2005]J Cell Mol Med. 2003 Oct-Dec; 7(4):341-50.
[J Cell Mol Med. 2003]Bioinformatics. 2005 Aug 1; 21(15):3279-85.
[Bioinformatics. 2005]Bioinformatics. 2005 Jan 15; 21(2):218-26.
[Bioinformatics. 2005]Nat Genet. 2007 Nov; 39(11):1338-49.
[Nat Genet. 2007]Nat Biotechnol. 2004 Jan; 22(1):78-85.
[Nat Biotechnol. 2004]Bioinformatics. 2005 Aug 1; 21(15):3279-85.
[Bioinformatics. 2005]Bioinformatics. 2006 Sep 15; 22(18):2291-7.
[Bioinformatics. 2006]Brief Bioinform. 2005 Mar; 6(1):34-43.
[Brief Bioinform. 2005]Nature. 2004 Jul 1; 430(6995):88-93.
[Nature. 2004]Genome Biol. 2006; 7(6):R45.
[Genome Biol. 2006]Genome Biol. 2006; 7(6):R45.
[Genome Biol. 2006]Science. 2004 Oct 22; 306(5696):640-3.
[Science. 2004]Hum Mol Genet. 2005 Oct 15; 14 Spec No. 2():R171-81.
[Hum Mol Genet. 2005]Nature. 2005 Oct 20; 437(7062):1173-8.
[Nature. 2005]Cell. 2005 Sep 23; 122(6):957-68.
[Cell. 2005]Nucleic Acids Res. 2003 Jan 1; 31(1):248-50.
[Nucleic Acids Res. 2003]Nat Genet. 2006 Mar; 38(3):285-93.
[Nat Genet. 2006]Brief Bioinform. 2005 Mar; 6(1):34-43.
[Brief Bioinform. 2005]Brief Bioinform. 2005 Dec; 6(4):380-9.
[Brief Bioinform. 2005]Mol Biol Cell. 2004 May; 15(5):2361-74.
[Mol Biol Cell. 2004]Mech Ageing Dev. 2003 Jan; 124(1):9-16.
[Mech Ageing Dev. 2003]BMC Bioinformatics. 2005 Oct 18; 6():258.
[BMC Bioinformatics. 2005]Bioinformatics. 2006 Sep 15; 22(18):2276-82.
[Bioinformatics. 2006]Bioinformatics. 2006 Sep 15; 22(18):2276-82.
[Bioinformatics. 2006]BMC Bioinformatics. 2006 Feb 2; 7():52.
[BMC Bioinformatics. 2006]Bioinformatics. 2004 Aug 12; 20(12):1914-27.
[Bioinformatics. 2004]Bioinformatics. 2006 Sep 15; 22(18):2276-82.
[Bioinformatics. 2006]Genome Biol. 2003; 4(3):R22.
[Genome Biol. 2003]Mol Biol Cell. 2004 May; 15(5):2361-74.
[Mol Biol Cell. 2004]Mol Biol Cell. 2004 May; 15(5):2361-74.
[Mol Biol Cell. 2004]