Copyright © 2007, EMBO and Nature Publishing Group

Reconstructing dynamic regulatory maps

¹ Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
² Department of Molecular Biology, Hebrew University Medical School, Jerusalem, Israel
³ Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, MA, USA
⁴ Department of Computer Science, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
ᵃ Machine Learning Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA. Tel.: +1 412 268 8595; Fax: +1 412 268 3431; Email: zivbj/at/cs.cmu.edu
* Present address: Bristol-Myers Squibb Pharmaceutical Research Institute, Princeton, NJ 08543, USA

Received August 17, 2006; Accepted November 15, 2006.

Abstract

Even simple organisms have the ability to respond to internal and external stimuli. This response is carried out by a dynamic network of protein–DNA interactions that allows the specific regulation of genes needed for the response. We have developed a novel computational method that uses an input–output hidden Markov model to model these regulatory networks while taking into account their dynamic nature. Our method works by identifying bifurcation points, places in the time series where the expression of a subset of genes diverges from the rest of the genes. These points are annotated with the transcription factors regulating these transitions, resulting in a unified temporal map. Applying our method to study yeast response to stress, we derive dynamic models that are able to recover many of the known aspects of these responses. Predictions made by our method have been experimentally validated, leading to new roles for Ino4 and Gcn4 in controlling yeast response to stress.
The temporal cascade of factors reveals common pathways and highlights differences between master and secondary factors in the utilization of network motifs and in condition-specific regulation.

Keywords: dynamics, hidden Markov models, regulatory networks

Introduction

Understanding the dynamic programs that a cell utilizes in response to internal or external stimuli is an important challenge. These programs activate regulatory networks controlled by several transcription factors (TFs) (Harbison et al, 2004) and can involve a large number of genes (Natarajan et al, 2001). Direct information about this process has been obtained from genome-wide chromatin immunoprecipitation (ChIP-chip) experiments and comparative motif studies that have been carried out to identify some of the regulators involved (Hahn et al, 2004; Harbison et al, 2004; Xie et al, 2005; Workman et al, 2006). Time-series microarray expression experiments are a complementary source of data, providing dynamic information about the expression of thousands of genes that are activated or repressed in response to stimuli such as environmental stress (Gasch et al, 2000).

An extensive literature has accumulated on methods to analyze and model time-series gene expression data. Some of these methods have focused on determining a continuous representation of the time-series expression data using splines (D'haeseleer et al, 1999; Bar-Joseph et al, 2003a). Other methods have focused on clustering genes while taking into account gene expression dynamics using methods such as auto-regressive equations (Ramoni et al, 2002), hidden Markov models (HMMs) (Schliep et al, 2003), and template-based methods (Ernst et al, 2005). Others have modeled time-series gene expression data using techniques such as differential equations (Chen et al, 1999), dynamic Bayesian networks (Kim et al, 2003), and singular value decomposition (Holter et al, 2001).
These methods, although useful, only provide a partial view of the transcriptional regulation process as they do not explicitly integrate information about TF–gene interactions such as ChIP-chip and motif data. Most methods that integrate gene expression data with motif or ChIP-chip data do so without explicitly taking into account the dynamic nature of biological systems. A number of these methods combined a large number of expression data sets and motif data to infer transcription modules (Pilpel et al, 2001; Ihmels et al, 2002; Segal et al, 2003). Transcriptional modules are subsets of TFs and genes, such that genes in the same module tend to be similarly expressed and regulated by the same TFs across a number of experimental conditions. Bar-Joseph et al (2003b) integrated ChIP-chip data with expression data with a similar objective. Das et al (2006) presented a method that combined human expression data and motif information to identify active motifs, combinations of motifs and target genes under certain conditions. Although these prior methods provided important insights and often used time-series expression data sets, they did not take advantage of the sequential ordering of time points in an expression experiment, essentially treating time-series and static expression data in the same way. A few recent methods have been proposed to integrate time-series expression data with ChIP-chip or motif data while taking into account the ordering of experiments in time-series data sets. For instance, time-series expression data were used to determine which genes were active at certain phases and then combined with ChIP-chip data using a trace-back algorithm to identify active TFs at these phases (Luscombe et al, 2004). This method in effect identified an ordered series of static regulatory graphs, but its direct connection with the dynamics of observed gene expression patterns is less clear. 
Other methods have relied more heavily on individual gene expression profile dynamics. For instance, Kundaje et al (2005) formed independent clusters of genes by using a joint probabilistic model for the dynamics of time-series expression profiles of genes and the motifs in their promoter regions. Others have integrated time-series expression data with ChIP-chip data to model the expression of individual genes (Lin et al, 2005) and interactions among TFs (Cokus et al, 2006), applying their techniques to model the cell cycle. Another method (Bonneau et al, 2006) used kinetic equations based on the time-series expression data to associate TFs with subsets of genes across a subset of experimental conditions.

Our objective is different from that of these prior works. We present a computational method that integrates the time-series expression data and ChIP-chip or motif information to infer an annotated global temporal map. This map describes the main transcriptional regulatory events leading to the observed time-series expression patterns and the factors controlling these events during a cell's response to stimuli. Our method focuses on bifurcation events. Bifurcation events occur when sets of genes that have roughly the same expression level up until some time point diverge (see Figure 1).
We applied our method to study several stress responses in yeast. Our method was able to automatically infer many aspects of the temporal responses, some of which were previously known whereas others were new predictions. These new predictions range from low-level predictions regarding the timing of specific interactions, to mechanistic predictions about the set of TFs controlling recovery from stress, to predictions related to phenotypic changes. We have experimentally validated all types of these predictions, leading to new roles for TFs in controlling yeast response to stress. We also used our temporal maps to compare different stress experiments and to identify a number of common control mechanisms. By using the time of activation that our method assigned to TFs, we were able to identify cascades of activators. Analysis of these cascades provides insights into the utilization of network motifs and condition-specific regulation in response to stress.

Results

The Dynamic Regulatory Events Miner (DREM)

To combine time-series gene expression data and ChIP-chip or motif data, we extended an algorithm for learning input–output hidden Markov models (IOHMMs) (Bengio and Frasconi, 1995). IOHMMs are an extension of HMMs. HMMs have been used in the past to model sequential data including DNA and protein sequence data (Durbin et al, 1998) and to cluster gene expression data (Schliep et al, 2003). In HMMs as well as IOHMMs, hidden states are used to group genes by associating a cluster with each path through the hidden states over time. In our application, each hidden state is associated with one time point and represents a Gaussian distribution of the expression values for genes associated with it (Figure 1). We constrain the transitions to enforce a tree structure among the hidden states. This allows us to model bifurcation events, places in the time series where subsets of genes that had similar expression values at prior time points diverge from each other.
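As a rough illustration of the model structure described above, the following Python sketch represents hidden states as nodes of a tree, each carrying a Gaussian over expression values at one time point, and assigns a gene to the root-to-leaf path with the highest emission log-likelihood. The names `State`, `enumerate_paths`, and `best_path`, the toy parameters, and the decision to ignore transition probabilities are all assumptions made for illustration; this is not the DREM implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class State:
    """One hidden state: a Gaussian over expression values at one time point."""
    mean: float
    std: float
    children: list["State"] = field(default_factory=list)


def log_gaussian(x: float, mean: float, std: float) -> float:
    """Log density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def enumerate_paths(root: State) -> list[list[State]]:
    """List every root-to-leaf path; each path corresponds to one gene cluster."""
    if not root.children:
        return [[root]]
    return [[root] + tail
            for child in root.children
            for tail in enumerate_paths(child)]


def best_path(root: State, profile: list[float]) -> tuple[int, float]:
    """Return (path index, log-likelihood) of the path that best fits a profile.

    Only the Gaussian emissions are scored; transition probabilities are
    ignored, which is a simplification of the full model (assumption).
    """
    scored = []
    for i, path in enumerate(enumerate_paths(root)):
        ll = sum(log_gaussian(x, s.mean, s.std) for x, s in zip(profile, path))
        scored.append((ll, i))
    ll, i = max(scored)
    return i, ll


# Toy tree (hypothetical values): one root at time 0, a binary split at
# time 1, and one child per branch at time 2.
up = State(1.0, 0.5, [State(2.0, 0.5)])
down = State(-1.0, 0.5, [State(-2.0, 0.5)])
root = State(0.0, 0.5, [up, down])
```

In this toy tree, a profile such as `[0.0, 0.9, 2.1]` is assigned to the activated path (index 0) and `[0.0, -1.2, -1.8]` to the repressed path (index 1); the split at time 1 plays the role of a bifurcation event.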
The identification of these events using an IOHMM is biased to those splits for which different sets of TFs regulate the divergent sets of genes. The algorithm searches over many possible tree structures, training the parameters associated with the structures, and then scoring these structures using cross-validation to select the best one. The resulting set of hidden states and transitions between them leads to a global dynamic map. Each gene is then assigned to a specific path in the map based on its time-series expression data and ChIP-chip or motif data. Following this assignment, we compute association scores for TFs and splits using the hypergeometric distribution enrichment calculation. In the supplement, we discuss how to relax the tree constraints to allow paths to converge during recovery periods, making the model more realistic (Supplementary Figure 14). For the results in the main text, we only used the sampled time points; however, as we show in Supplementary information, one can also use our model with interpolated values for time points that were not sampled (Supplementary Figure 15). Finally, we note that for the results in this paper, we limited the analysis to binary splits, although the method also generalizes to higher order splits. Complete details on the algorithm are given in the Methods section in Supplementary information.

A temporal map for amino-acid starvation response

To test the ability of the dynamic regulatory events miner (DREM) to learn dynamic maps from time-series and ChIP-chip data, we initially focused on the amino-acid (AA) starvation response pathway in yeast. As AAs are the basic structural components of proteins, yeast response to this stress by increasing AA synthesis and decreasing AA utilization is critical for its survival. For this condition, we have detailed time-series expression data (Gasch et al, 2000) as well as ChIP-chip experiments for 34 TFs (Harbison et al, 2004) (Supplementary Table 1).
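The hypergeometric enrichment calculation used for the TF association scores can be sketched as follows. The function below computes the probability of seeing at least the observed number of TF-bound genes on a path when the path's genes are drawn at random from a base set, so a lower value means a stronger association. The function and parameter names are hypothetical, and the exact tail definition is an assumption; the DREM software may compute the score differently.

```python
from math import comb


def hypergeom_enrichment(n_base: int, n_bound: int,
                         n_path: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_base    -- genes in the base set (e.g. genes entering the previous
                 split, or all genes on the array)
    n_bound   -- genes in the base set bound by the TF
    n_path    -- genes assigned to the path being scored
    n_overlap -- genes on the path that are bound by the TF
    """
    total = comb(n_base, n_path)
    upper = min(n_bound, n_path)
    # math.comb returns 0 when k > n, so impossible terms drop out.
    tail = sum(comb(n_bound, k) * comb(n_base - n_bound, n_path - k)
               for k in range(n_overlap, upper + 1))
    return tail / total
```

Choosing the base set changes the question being asked: restricting it to the genes entering the previous split scores the TF against that split, whereas using all genes on the array scores the TF against the path overall.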
The time-series expression data were filtered to remove genes that did not change substantially at any time point (see Results section in Supplementary information). Figure 2A
The set of genes assigned to the repressed path out of the first split is highly enriched in categories such as ribosome biogenesis and assembly (P-value <10⁻⁹¹) and ribosome (P-value <10⁻⁸³), consistent with what has been observed before for AA starvation response (Gasch et al, 2000). A majority of the ribosomal genes were determined to be bound by the ribosomal TFs Rap1, Fhl1, or Sfp1. It has been previously noted that Rap1 and Fhl1 remain bound to the promoter regions of ribosomal genes under environmental stress (Wade et al, 2004; Rudra et al, 2005). An additional TF, Ifh1, which has recently been implicated in having an important role in controlling the expression of ribosomal genes under stress (Wade et al, 2004; Rudra et al, 2005), was not part of the set of TFs for which a ChIP-chip experiment was performed in AA starvation conditions.

Extending condition-specific dynamic maps using general binding data

Although the map derived by DREM provided explanations for several bifurcation points, others could not be explained using the limited set of TFs with ChIP-chip data under the AA starvation condition. We hypothesized that some of these points could be explained by TFs that were previously not known to be involved in this process and so were not originally profiled with ChIP-chip experiments in AA starvation conditions. To test this, we employed DREM again, this time augmenting the static input data with an additional 75 TFs profiled with ChIP-chip experiments in other conditions in yeast, primarily yeast complete growing media (YPD) (Harbison et al, 2004). To reduce false positives in these ChIP-chip data, we used a post-processed version of the data that also requires the presence of an evolutionarily conserved motif (see Materials and methods).
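The encoding rule behind this post-processed binding data, as stated in Materials and methods, can be sketched in a few lines of Python. The thresholds (P-value <0.005, motif conserved in at least two other yeast species) come from the text; the function names and the input representation are hypothetical and chosen only for illustration.

```python
P_CUTOFF = 0.005            # binding P-value threshold stated in the text
MIN_CONSERVED_SPECIES = 2   # "at least two other yeast species"


def encode_condition_specific(p_value):
    """Entry for a TF with condition-specific ChIP-chip data.

    p_value is the binding P-value for one TF-gene pair, or None when no
    ChIP-chip data are available (encoded as 0).
    """
    return 1 if p_value is not None and p_value < P_CUTOFF else 0


def encode_general(p_values, n_species_with_conserved_motif):
    """Entry for a TF without condition-specific data.

    p_values lists the binding P-values of the TF-gene pair across all
    available ChIP-chip experiments; an empty list encodes as 0.
    """
    bound = any(p < P_CUTOFF for p in p_values)
    conserved = n_species_with_conserved_motif >= MIN_CONSERVED_SPECIES
    return 1 if bound and conserved else 0
```

Requiring both binding evidence and motif conservation trades sensitivity for specificity: a pair bound in one experiment but lacking a conserved motif is encoded as 0.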
The map derived from this additional data (Figure 2B Validating interactions and mechanistic predictions The temporal map derived by DREM suggests that Ino4 is activating genes as part of a recovery mechanism several hours after AA starvation. However, Ino4 had previously only been profiled with a ChIP-chip experiment in YPD media (Harbison et al, 2004). To validate the prediction of Ino4's role in AA starvation conditions, we first carried out ChIP–PCR experiments in which we checked the in vivo association of the Ino4 protein to several of its targets. We selected four of the genes bound by Ino4 in the ChIP-chip experiment in YPD media with a P-value <0.005 that were assigned to the path most significantly controlled by Ino4 (brown path in Figure 2B
Temporal maps for the regulation of stress response in yeast We next combined condition- and non-condition-specific binding data and used DREM to construct temporal regulatory maps for a number of other stress conditions (Figure 4A
For example, for heat shock, Hsf1 was identified as a ‘master’ regulator controlling the initial activation response, which is consistent with previous studies (Bonner et al, 1992). Msn2- and Msn4-regulated genes were also over-represented on the highest expressed paths of the heat-shock model. The set of genes assigned to the path that was still increasing at 10 min was overenriched for GO categories such as carbohydrate metabolism (P-value <10⁻¹⁴), response to stress (P-value <3 × 10⁻⁹), and protein folding (P-value <6 × 10⁻⁷). All 15 of the protein folding genes assigned to this path were also bound by Hsf1. For peroxide, Yap1 and Skn7 were two of the TFs correctly assigned to activated paths in the first bifurcation point, consistent with a previous report (Lee et al, 1999). Other TFs associated with regulating initially activated genes include Yap7, Rpn4, Msn2, Msn4, Gcn4, Aft2, and Put3. The genes on the initially activated path were enriched for GO categories such as aldehyde metabolism (P-value <2 × 10⁻⁹) and response to oxidative stress (P-value <6 × 10⁻⁹). Along the repressed paths are cell-cycle TFs Mcm1, Sum1, Swi5, and Swi6 and ribosomal TFs Fhl1 and Rap1, all of which were detected without hydrogen peroxide binding data (Supplementary Tables 6 and 7). Using these temporal stress response maps, we looked for common mechanisms employed by yeast in response to stress.

Repressed pathways

While the identity of many of the activators varied depending on the stress condition, we observed two pathways that were generally repressed under the stress conditions we looked at. The first pathway showed a similar pattern of repression and recovery in all of the reconstructed temporal regulatory maps except for the map for cold shock. This pathway included the ribosomal genes and their primary TFs (Rap1, Sfp1, and Fhl1). These genes are repressed steeply and quickly. However, these genes also recovered quickly, approaching their pre-treatment levels.
In cold shock, the ribosomal genes were assigned to an activated path, consistent with a previously made observation (Gasch et al, 2000; Supplementary Figures 6 and 7). Another common repressed pathway that was observed in AA starvation, heat shock, and cold shock was a pathway controlled by Swi4, Swi6, and Mbp1. This pathway primarily contained cell-cycle genes. For example, in the heat-shock pathway, there was a particularly strong enrichment for G1 cell-cycle genes (Spellman et al, 1998) (P-value <4 × 10⁻²⁰). In comparison to the ribosomal genes, cell-cycle genes were repressed at a slower rate and to a less significant level (Supplementary Figure 4). However, when they recovered, they were expressed at a higher level than their initial (time point 0) value (Figure 4B).

Master regulators and condition-specific regulation in response to stress

Although the identity of the activators varied, a few activators were identified to have a much more significant control of the initial change in expression levels in response to AA starvation (Gcn4, Cbf1, Rap1, Fhl1, and Sfp1) and heat shock (Hsf1, Msn2/4, and Skn7). This can be seen most clearly in the AA map where the majority of the activators in these conditions were activating genes in later time points. This type of response can result from cells that constitutively express a small number of master regulators so that they can react quickly to stress, whereas the later TFs are only expressed as part of the response process. This is consistent with previous studies. For example, Msn2 and Msn4 are regulated at the level of nuclear exclusion (Gorner et al, 1998), Cbf1 also already exists in the cell before starvation and its transcript level is not affected by methionine starvation (Mellor et al, 1990), and Gcn4 and Hsf1 are found in association with a subset of their target gene promoters even before stress (Hahn et al, 2004; Harbison et al, 2004).
These differences are also apparent in the initial expression levels of the regulators. As Figure 5C
To further study this point, we looked at the condition-specific regulation activities of TFs in AA starvation and heat shock by dividing the TFs into two groups: the first contained TFs that were determined to control the initial response and the second contained the rest of the TFs. We compared the binding targets of these TFs under YPD media and in the condition that they regulate. As part of the response to stress, several yeast TFs begin to regulate genes that were not regulated by them in YPD media (Bar-Joseph et al, 2003b; Luscombe et al, 2004). As can be seen in Figure 5A In addition to differences in the condition-specific activity between master and secondary regulators, we have also observed differences in the utilization of different network motifs. As Supplementary Figures 9 and 10 show, genes bound by master regulators in a feed forward loop (FFL) displayed consistently higher expression levels when compared to genes regulated by the same TFs in a multiple input (MI) or single input (SI) motifs. In contrast, for many secondary TFs, we have not observed a large difference between the expression levels of FFL-controlled genes and genes controlled by the other two network motifs. The ability of master regulators to utilize FFLs by consistently expressing some of the genes at a higher level during a response may help cells fine-tune their response to various stresses. Indeed, whereas 45% of the genes bound by Gcn4 in an FFL are known AA biosynthesis genes (based on GO), only 21 and 13% of the genes bound in an MI or an SI, respectively, are assigned to this category. Thus, although many genes are activated initially, only a few of them will remain expressed at a high level in a later point as their expression requires the additional binding of a secondary factor. In this way, an FFL serves as a filtering motif, removing signals that were erroneously activated and maintaining those that are required for the actual response pathway (Mangan and Alon, 2003). 
Determining the activation time of regulators As the results above indicate, most TFs that are activated during stress either change or expand the set of genes they regulate. To identify these new sets, ChIP-chip experiments are often used to determine the roles of several TFs in various response pathways (Bar-Joseph et al, 2003b; Harbison et al, 2004; Workman et al, 2006). However, even when TFs are known, or suspected, to be involved in such pathway, the actual time in which they are activated may vary. Master regulators are activated early on, whereas secondary regulators are activated later. The ability of DREM to determine a time point for carrying out such experiment can help in accurately recovering the role a factor plays in a response pathway. To study this, we have looked at the activation of Gcn4 as part of the response to methyl-methanesulfonate (MMS) stress in yeast. In a previous study, it was determined, using an experiment 60 min after the induction of stress, that Gcn4 did not expand the set of genes it regulates when compared to the set it regulates in YPD media (Workman et al, 2006). We used DREM to reconstruct a dynamic map for this system (Supplementary Figure 8). The DREM map inferred for this condition made two predictions about Gcn4: (1) that Gcn4 was expanding the set of genes it regulates at the 15 min time point when compared to YPD media and (2) that Gcn4 binding would likely be less intense at 60 min as the expression of the main Gcn4-controlled paths decreased at that time point. To test whether the temporal predictions of DREM were correct, we carried out genome-wide binding experiments at three time points: 0 (YPD media), 15, and 60 min. As predicted by DREM, we found a large expansion in the set of genes regulated by Gcn4 at the 15 min time point. 
Whereas 45 genes were bound by Gcn4 in YPD media (using a 0.005 P-value cutoff), 235 genes were bound at the 15 min time point and a smaller number (212 genes) at the 60 min time point (Supplementary Table 8). In addition, for the vast majority of Gcn4-bound genes, the binding P-value was more significant at the 15 min time point when compared to the 60 min point, indicating that Gcn4 is indeed more active earlier in the response as predicted by DREM (Figure 5D Verifying the advantage of integrating time-series expression and ChIP-chip data To verify the advantage gained from integrating time-series expression data and ChIP-chip data, we tested whether either data alone could have reproduced results similar to those obtained when combining them. First, we generated a randomized version of the AA ChIP-chip data by randomizing the genes each TF was bound to while holding the number of genes bound by each TF fixed. We then applied DREM to this randomized ChIP-chip data and the original AA gene expression data. This procedure resulted in maps that had no, or very few, TF labels (Supplementary Figure 11). Specifically, we found that activators that were determined to be ‘master' regulators using the real binding data were not assigned to the first split using the randomized values and most of the AA biosynthesis regulators were not assigned to any of the splits. Second, we applied an HMM model to the time-series data without using ChIP-chip while still enforcing the same tree structure requirements on the hidden states. To compare the HMM model with the IOHMM model of Figure 2B Discussion The recent availability of expression and ChIP-chip and motif data has led to a number of efforts aimed at reconstructing regulatory networks. To date, these efforts primarily focused on determining a static graph representation of the underlying network. 
These efforts have led to many insights regarding the overall organization of networks (Barabasi and Oltvai, 2004), network motifs (Milo et al, 2002), and the set of interactions in various biological systems (Pilpel et al, 2001; Ihmels et al, 2002; Bar-Joseph et al, 2003b; Segal et al, 2003; Workman et al, 2006). The computational method we presented in this paper, DREM, takes a different approach providing a global dynamic view of the gene regulation. This approach has a number of advantages when compared to methods that derive static graphs. First, biological systems are dynamic. TFs may bind different genes at different time points. Thus, the ability of DREM to derive dynamic maps that associate TFs with the genes they regulate and their activation time points may lead to better insights regarding the system being studied. These insights may include the identification of master regulators that control the initial response and secondary regulators that are responsible for more specific pathways. It may also help explain several aspects of the observed response including the condition-specific activity of factors and the activation of certain network motifs. As timing is available in these maps, some of the paths and the factors regulating them may be linked to predictions regarding the timing of specific phenotypes. Second, many TFs are post-transcriptionally regulated. For these TFs, it is hard to determine an activation time when using only their expression data. When studying biological systems using ChIP-chip methods, researchers rely on previous knowledge and other data sources to determine which factors to profile under the condition of interest (Bar-Joseph et al, 2003b; Hahn et al, 2004; Harbison et al, 2004; Workman et al, 2006). 
DREM's ability to use general motif data or ChIP-chip data from other experimental conditions for deriving temporal regulatory maps presents a useful complementary approach for determining which TFs to study with a ChIP-chip experiment. As we have shown for Ino4, these predictions may lead to new regulatory roles for some of the factors. Importantly, for these factors, DREM also indicates a time point at which the ChIP-chip experiment should be carried out. Determining the right point leads to a more accurate set of regulators, as we have shown with Gcn4 in MMS. Finally, we note that although we presented DREM in the context of analyzing a large number of stress-response data sets, it can be applied equally well to study a single condition of interest. The accuracy of the models generated by DREM and their predictive power is another indicator of the importance of data integration. As was noted in the past (Jansen et al, 2003), each data source provides only a partial view of the activity in the cell. By integrating diverse data sets, we can improve over the results obtained from each data source on its own. For example, in the context of clustering expression data, a key question is the number of clusters to use. Another problem relates to noise and the small number of time points measured. DREM addresses these problems by integrating ChIP-chip or motif data with the time-series data. This leads to more natural derivations of clusters based on bifurcation points and improves the resulting clusters, as we showed using GO (Supplementary Figure 13). Like any other computational method, DREM is highly dependent on the input data. DREM relies on the availability of high-quality time-series expression data. Here, the sampling rate may play an important role in the ability of DREM to derive accurate regulatory maps.
For example, although we observed initial activation by a few master regulators in a number of different conditions, the initial response to peroxide was determined to be controlled by nine TFs. This may be the result of the sampling rate. If this rate is too low, regulatory effects may be aggregated in some of the time points, preventing DREM from associating TFs with their correct activation time. Determining the appropriate sampling rate is an important problem (Simon et al, 2005). In some cases, DREM can be used to identify a problem with the sampling rate that has been used and to suggest places in which more samples are needed. However, even in cases in which an experiment is well sampled, TF labels can aggregate at earlier time points, as assignments of genes to paths is based on all time points. The models derived by DREM are currently limited to tree structures with the option to also model convergence of paths from a common split. Although this is motivated by previous biological observations (Balázsi et al, 2005), in some cases it may be more natural to allow other types of path merges and resplits. DREM also does not explicitly model regulation through other mechanisms, such as chromosome remodeling and mRNA degradation. Transition probabilities in DREM are computed using logistic regression, which does not capture all types of combinatorial interactions. Another limitation of DREM is that the output dynamic map model is sometimes chosen from a number of possible dynamic maps with similar scores. However, when this happens, these different maps usually share most of the important splits. Although DREM was applied to learn networks in yeast, the growing availability and diversity of ChIP-chip (Cam et al, 2004), motif (Xie et al, 2005), and time-series expression data make it possible to use DREM for many different species such as human, mouse, and flies, among others. 
The ability to derive dynamic networks from such data may lead to new insights, predictions, and ultimately better understanding of many biological systems.

Materials and methods

Pre-processing input data

We first generated a binary matrix of predictions of TF–gene regulatory interactions. For predictions based on condition-specific ChIP-chip data, a ‘1’ was encoded if the binding P-value of the TF to the gene's promoter region was <0.005, otherwise a ‘0’ was encoded. For TFs without condition-specific ChIP-chip data, we followed a version of the regulatory code of Harbison et al (2004). For these TFs, a TF–gene pair had a ‘1’ encoded in the matrix if the TF bound the promoter of the gene with a P-value <0.005 in at least one ChIP-chip experiment and there was a motif in its promoter that was evolutionarily conserved in at least two other yeast species. If no ChIP-chip data were available for a gene, then a ‘0’ was encoded for all entries. All time-series data were transformed to start at ‘0’ so that the value at each time point represents the log ratio change from an unstressed control. Genes were removed if there was more than one missing value or if the gene did not change sufficiently at any time point (see Results section in Supplementary information).

Dynamic regulatory events miner algorithm

Each state of the probabilistic model is associated with a Gaussian distribution. A tree structure was used among the states and their transitions. At time point 0, there is one state, which is the root of the tree. Every state except those associated with the last time point has at least one child, and for the results in this paper we allowed not more than two children. Any state having more than one child has a logistic regression classifier with an L1 loss penalty (Krishnapuram et al, 2005) associated with it. This classifier maps the set of predicted TF interactions for a gene to a probability distribution of transitions to each of the child states.
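For a binary split, the classifier just described can be illustrated with a minimal logistic model: the gene's row of the binary TF matrix is combined with per-TF weights, and a sigmoid turns the score into probabilities for the two child states. The function name, weights, and bias below are hypothetical illustration values; fitting them with the L1 penalty (which pushes many weights to exactly zero) is not shown, and this is not the DREM implementation.

```python
import math


def transition_probs(tf_vector, weights, bias):
    """Return (P(child 0), P(child 1)) for one gene at a binary split.

    tf_vector -- 0/1 entries, one per TF, taken from the binary
                 TF-gene interaction matrix
    weights   -- one coefficient per TF (hypothetical values here)
    bias      -- intercept of the logistic model
    """
    score = bias + sum(w * x for w, x in zip(weights, tf_vector))
    p_child1 = 1.0 / (1.0 + math.exp(-score))
    return (1.0 - p_child1, p_child1)


# Hypothetical weights: TF 0 favors child 1, TF 1 favors child 0.
example_weights = [2.0, -1.0]
```

With these example weights, a gene bound by neither TF splits evenly between the two children, whereas a gene bound only by the first TF is routed to child 1 with probability of about 0.88.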
To learn a dynamic regulatory map, the DREM algorithm first performs a search over tree structures. A randomly selected subset of genes is used to train the Gaussian distribution parameters and the classifiers in the tree structure under consideration. The remaining genes are used to score the various tree structures considered. Training is carried out using a version of the Baum–Welch algorithm (Durbin et al, 1998). After the best scoring structure is found using the test set of genes, weakly supported splits are pruned to avoid overfitting the test set of genes. After a final model structure is selected, all genes are used to train the parameters of the final model. See Methods section in Supplementary information for full details. DREM software: The DREM software is available for download at http://www.sb.cs.cmu.edu/drem. Inferring gene assignments and TF scores Genes are assigned to their most likely path through the model using the Viterbi algorithm (Durbin et al, 1998). The assignment of genes to paths through the models is used to determine if certain paths are overenriched for genes regulated by certain TFs. Overenrichment scores are used for the association of TFs with paths. These scores are obtained using the hypergeometric distribution, with a lower score meaning a stronger association. The base set of genes for the hypergeometric distribution can be just the genes going into the previous split giving a TF split association score, or all genes on the microarray giving an overall association score of a TF for a path. GO P-values GO P-values were computed in the DREM software based on the hyper-geometric distribution. All P-values reported are uncorrected, but are still significant at the 0.01 level when correcting for multiple hypothesis testing using a randomization procedure (Ernst and Bar-Joseph, 2006). Saccharomyces cerevisiae strain list For the immunoprecipitation experiments, we used Myc-tagged W303 yeast strain obtained as a gift from Rick Young. 
The genotype of the Ino4 strain is MATa:ade2-1:trp1-1:can1-100:leu2-3,112:his3-11,15:ura3:GAL+:psi+:INO4::9myc:TRP1 and that of the Gcn4 strain is MATa:ade2-1:trp1-1:can1-100:leu2-3,112:his3-11,15:ura3:GAL+:psi+:GCN4::9myc:TRP1.

Growth conditions

For the AA starvation experiments, cells were grown in complete minimal medium (SCD) to early-log phase. Cells were collected by centrifugation, resuspended in an equal volume of minimal medium lacking AAs and adenine (YNB−AA, 2% glucose, 20 mg/l uracil), and allowed to grow. Samples for location analysis were taken before resuspension in AA starvation conditions and 4 h afterwards. For the MMS experiments, cells were grown in YPD medium at 30°C to early-log phase, until the culture reached an OD600 of 0.8–1.0. MMS (Sigma) was added to a final concentration of 0.03%, and the culture was grown for an additional hour. Samples for genome-wide location analysis were taken before adding MMS and at 15 and 60 min after adding MMS.

Chromatin immunoprecipitation–PCR

Bound proteins were formaldehyde-crosslinked to DNA in vivo, followed by cell lysis and sonication to shear DNA. Crosslinked material was immunoprecipitated with an anti-myc antibody, followed by reversal of the crosslinks to separate DNA from protein (Aparicio, 1999; Orlando, 2000). Enrichment for Ino4-binding sites was measured by semiquantitative PCR using primers designed to detect the upstream regions of the genes YDR497C, YNL169C, YGR196C, and YHR123W. Primer sequences are as follows: YDR497C: TAGCGCACCAAACTGAAAGA, AAGCGCATATACTTAGTTCTCTCCA; YNL169C: CGACCAAGAAGGATTTGAGC, CCAGCACCTTTTTGGTGTTT; YGR196C: CGCTTTCCAGAAAAAGGGTA, CGTCGTTTGTTTGTTTGGTG; YHR123W: TGGCAAAATACAGAACACAGG, TATGCTCAGTCCAGCCCTTT. As a negative control, primers for the upstream region of Cts1 were used (AGTGGTTGGTTGGTGGGAATA; TCTTTGACCAATGCCTATGAA).
Enrichment was quantified as the ratio of the IP signal to the input signal for the target gene, divided by the corresponding IP-to-input ratio for the negative control gene (Cts1). After PCR, the fragments were separated on a 1% agarose gel and imaged with a CCD camera. Band intensities were quantified using TINA 2.09d quantification software (Raytest, http://www.raytest.de).

ChIP on chip

Genome-wide location analysis was performed as described previously (Ren et al, 2000). Briefly, following purification of the DNA in the ChIP procedure, immunoprecipitated DNA and DNA from an unenriched sample were amplified and differentially fluorescently labeled by ligation-mediated PCR. These samples were hybridized to a microarray consisting of spotted PCR products representing the intergenic regions of the S. cerevisiae genome. The data have been deposited into ArrayExpress with the accession numbers E-MEXP-905 (Gcn4 experiments) and E-MEXP-906 (Ino4 experiments).

Budding index calculation

Cells grown continuously at 30°C were collected by centrifugation, resuspended in an equal volume of 37°C medium, and returned to 37°C for growth. Samples were collected at time points as described in Figure 4C.

Supplementary Methods (115K, pdf)

Supplementary Results (4.8M, pdf)

Acknowledgments

We thank Zoubin Ghahramani for useful discussions about this work. JE and ZBJ acknowledge funding through NIH grant NO1 AI-5001 and NSF CAREER award 0448453 to ZBJ.

References