Copyright Farina et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Embedding mRNA Stability in Correlation Analysis of Time-Series Gene Expression Data 1Dipartimento di Informatica e Sistemistica “Antonio Ruberti”, Sapienza Università di Roma, Rome, Italy 2Istituto di Biologia e Patologia Molecolari, Consiglio Nazionale delle Ricerche, Rome, Italy 3Istituto Nazionale di Ricerca per gli Alimenti e la Nutrizione, Rome, Italy Manuel Ares, Jr, Editor University of California Santa Cruz, United States of America * E-mail: lorenzo.farina/at/uniroma1.it Analyzed the data: LF ADS SS GM IR. Wrote the paper: LF ADS GM IR. Conceived methodology: LF ADS. Contributed to the development and validation of the method: GM IR. Received March 14, 2008; Accepted June 24, 2008. Abstract Current methods for the identification of putatively co-regulated genes directly from gene expression time profiles are based on the similarity of the time profile. Such association metrics, despite their central role in gene network inference and machine learning, have largely ignored the impact of dynamics or variation in mRNA stability. Here we introduce a simple, but powerful, new similarity metric called lead-lag R2 that successfully accounts for the properties of gene dynamics, including varying mRNA degradation and delays. Using yeast cell-cycle time-series gene expression data, we demonstrate that the predictive power of lead-lag R2 for the identification of co-regulated genes is significantly higher than that of standard similarity measures, thus allowing the selection of a large number of entirely new putatively co-regulated genes. Furthermore, the lead-lag metric can also be used to uncover the relationship between gene expression time-series and the dynamics of formation of multiple protein complexes. 
Remarkably, we found a high lead-lag R2 value among genes coding for a transient complex. Author Summary Microarrays provide snapshots of the transcriptional state of the cell at some point in time. Multiple snapshots can be taken sequentially in time, thus providing insight into the dynamics of change. Since genome-wide expression data report on the abundance of mRNA, not on the underlying activity of genes, we developed a novel method to relate the expression pattern of genes, detected in a time-series experiment, using a similarity measure that incorporates mRNA decay and called lead-lag R2. We used the lead-lag R2 similarity measure to predict the presence of common transcription factors between gene pairs using an integrated dataset consisting of 13 yeast cell-cycles. The method was benchmarked against six well-established similarity measures and obtained the best true positive rate result, around 95%. We believe that the lead-lag analysis can be successfully used also to predict the presence of a common mechanism able to modulate the degradation rate of specific transcripts. Finally, we envisage the possibility to extend our analysis to different experimental conditions and organisms, thus providing a simple off-the-shelf computational tool to support the understanding of the transcriptional and post-transcriptional regulation layer and its role in many diseases, such as cancer. Introduction Gene expression is a highly regulated process composed of two fundamental biological events: synthesis and degradation. Transcription regulation is achieved by modulating the frequency of transcription initiation and, although the most studied, this event represents just the first of the many complex stages leading to a mature mRNA. Recent experimental work is beginning to shed light on the complex architecture underlying mRNA degradation pathways by identifying the factors and enzymes involved. 
Therefore, it is now widely accepted that mRNA decay contribution to the control of gene expression is not simply a biological waste-disposal system, but a key player for the temporal coordination of cellular functions. Moreover, a number of highly complex and sophisticated specific mechanisms have been identified [1]. Such mechanisms include the interaction with mRNA binding proteins [2] and the nonsense-mediated mRNA decay pathway [3], both able to affect the accumulation of hundreds of transcripts. Recent technologies, such as microarrays, are able to provide measurements of mRNA abundance over time under different experimental conditions. In order to decipher the intricate regulatory network underlying the highly coordinate cell behavior, effective computational methods have been developed to take advantage of gene expression data. The basic idea underlying such methods stems from the experimental observation that genes are organized in groups showing similar time profiles [4] (called “clusters”). These groups often share some common biological features, such as the same cellular function or the presence of a common motif at their promoter regions [5] where transcription factors (TFs) can bind and possibly turn them on or off in a coordinated manner, when needed. For this reason, it is now widely accepted that co-expression is a good indication for co-regulation [6]–[8], meaning that whenever two genes display similar time profiles it is likely that they are both targets of the same transcription factor(s). The search for co-regulated genes depends on association metrics used by clustering algorithms [5],[9],[10] and gene network inference algorithms [11]–[13]. Therefore, measuring the degree of co-expression of genes is a fundamental step for data analysis, and in fact, many similarity measures have been proposed in the literature [14]. 
Among those available to quantitatively measure simultaneous expression, we will refer to the usual R2 value obtained from a linear regression model between two given gene expression time profiles denoted by mA(t) and mB(t). Their degree of co-variation is therefore measured as the fraction of the total variance explained by the regression mA(t) = c1mB(t)+c2. This coefficient, indicated in this paper as the simultaneous R2 of the corresponding gene pair, is the square of the Pearson correlation and takes values between 0 and 1.
In order to infer the gene regulatory network, several laboratories have combined microarray data with protein-DNA interaction data, taking advantage of ChIP-on-chip experiments [15]. Such studies have shown that the same transcription factor (or combination of transcription factors) may target genes with very different expression time profiles, even in the same experimental condition. For example, the targets of the yeast cell cycle transcriptional regulators MBF/SBF display expression peak times that span from early G1 to late S. Moreover, delays have been recently observed between putatively co-regulated genes [16],[17]. One fundamental biological mechanism underlying such temporal spread is certainly combinatorial regulation by transcription factors. In fact, various TFs can modulate the target response by cooperating or competing for DNA binding. Consequently, new computational techniques have recently appeared in the literature to tackle this problem [18]–[25]. However, combinatorial regulation is not the only mechanism responsible for peak time delay, as other regulation layers are active throughout the transcript life and impact its abundance over time. One such additional regulation layer is certainly the post-transcriptional one, that is, the stability properties of transcripts, which may specifically contribute to the determination of their timing and amount during the cell response to various internal and/or external signals.
Strikingly, recent genome-wide measurement of the yeast transcripts half-lives [26],[27] has shown functional specificity in mRNA decay. Together, these results pointed to a general relationship between physiological function and mRNA decay rate thus providing strong evidence that precise control of mRNA turnover is a fundamental feature of gene expression programs in yeast [26] and in many other organisms. Here we focus on the development of a novel computational tool aiming to uncover co-regulated genes through transcriptional and post-transcriptional regulatory mechanisms. To this purpose, starting from the computational approach developed by Farina et al. [28], we introduce a new relationship between gene pairs, called lead-lag relationship. The term “lead-lag” has been taken from the field of control systems engineering where the same relationship holds between the input and the output of the so called “lead-lag compensator”, which is the fundamental building block for the design of automatic control systems [29]. In a biological perspective, the lead-lag relationship should be referred to genes under a common regulatory signal (“input”) involved in the same biological function (“output”) as, for example, in the dynamic multi sub-units complex formation [30],[31]. Using yeast cell-cycle time-series gene expression data, we demonstrate that this new similarity metric is able to capture the dynamics of gene expression, including varying mRNA stability and delays. Thus, the predictive power of lead-lag R2 for the identification of co-regulated genes is significantly higher than that of standard similarity measures, allowing the selection of a large number of entirely new putatively co-regulated genes. Furthermore, the lead-lag metric can also be used to uncover the relationship between gene espression time-series and the formation of protein complexes. 
Results/Discussion
Specific Features of Transcript Degradation Regulation Versus Transcription Regulation
To clarify the specific features of gene regulation at the mRNA stability level, it is worth considering the case in which two genes are turned on at the same time by the same transcriptional signal, and the newly synthesized transcripts of both genes are degraded at the same rate. Consequently, differences in their gene expression profiles will be determined only by the response of the two genes to the transcriptional signal (i.e. different affinities of the transcription factor for the promoter regions). A computer simulation of this situation is depicted in Figure 1A. Indeed, the “converse” situation is very different (Figure 1Ca–b).
Such considerations illustrate that the impact of stability regulation on time profiles is quantitatively and – most importantly – qualitatively different from that of transcription regulation. It is therefore not surprising that specific systems biology computational tools have begun to appear in the literature [28],[32]. The different impact of mRNA stability regulation versus transcription regulation results from the fact that the rate of mRNA degradation is proportional to the substrate concentration, whereas the rate of production is not [33]. Such behaviour is reasonably well captured by a first-order rate equation. In fact, messenger half-lives are usually measured experimentally by fitting a single exponential decay function to the time profiles observed after transcriptional shut-off [26]. Another important issue is that the differences between transcription rate regulation and degradation rate regulation cannot be clearly seen by simply looking at the long-term behavior of the response, i.e. at steady-state values. In fact, the final amount of mRNA upon a prolonged regulatory signal equals the ratio transcription rate/degradation rate so that, from this perspective, an N-fold increase in transcription rate is equivalent to an N-fold decrease in degradation rate (and vice versa). An example of such behavior can be seen in Figure 1Ac. Such a “loss of correlation” phenomenon due to differential stability regulation can be further understood by considering time-varying rates, resulting in a transient mRNA time profile, as shown in Figure 1Ba–d and 1Da–d.
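The steady-state equivalence and the transient difference discussed above can be illustrated with a minimal simulation of first-order kinetics, dm/dt = p(t) − k·m. This is only a sketch: the function names, the rate values and the forward-Euler integrator are illustrative assumptions and are not the settings used to produce Figure 1.

```python
def simulate_mrna(production, k_deg, dt=0.01, t_end=60.0, m0=0.0):
    """Integrate dm/dt = production(t) - k_deg * m with forward Euler.

    production : function of time returning the synthesis rate (assumed input)
    k_deg      : first-order degradation rate constant, log(2) / half-life
    Returns the list of mRNA levels sampled every dt, starting from m0.
    """
    n_steps = int(round(t_end / dt))
    m = m0
    profile = [m]
    for i in range(n_steps):
        t = i * dt
        m += dt * (production(t) - k_deg * m)
        profile.append(m)
    return profile


def constant(rate):
    """Constant synthesis rate, i.e. a prolonged regulatory signal."""
    return lambda t: rate


# Hypothetical genes: A is the baseline, B has a 2-fold higher transcription
# rate, C has a 2-fold lower degradation rate.
gene_a = simulate_mrna(constant(1.0), k_deg=0.5)
gene_b = simulate_mrna(constant(2.0), k_deg=0.5)
gene_c = simulate_mrna(constant(1.0), k_deg=0.25)

# B and C reach the same steady state (transcription rate / degradation rate),
# but only B stays proportional to A at every time point.
print(gene_a[-1], gene_b[-1], gene_c[-1])
print(gene_b[200] / gene_a[200], gene_c[200] / gene_a[200])
```

In this sketch gene B is a scaled copy of gene A throughout the transient, whereas gene C approaches the same final level as B more slowly, which is the qualitative difference the text attributes to stability regulation.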
The scenario depicted above naturally leads to the possibility that co-regulation may involve both the transcriptional and post-transcriptional machinery. Therefore, a large variety of temporal profiles can be obtained by combining any of those shown in Figure 2.
The Lead-Lag Relationship
In this paper we consider a novel relationship between gene expression time profiles which also includes the possible presence of mRNA stability variations as a further mechanism to modulate transcript abundance over time. This new coordinated relationship will be called the lead-lag relationship. The terminology is borrowed from the field of systems and control engineering, where it refers to the “lead-lag compensator” [29], the basic building block for the realization of a regulatory device able to provide optimal properties to a given process. In order to identify lead-lag relationships, we propose a quantitative measure between gene expression time profiles, called lead-lag R2, able to incorporate such a relationship in a single parameter, thereby potentially enhancing the predictive power of gene expression analysis for the identification of putatively co-regulated genes. In fact, we aim to study here the possibility that a high lead-lag R2 between the expression time profiles of two given genes is a good indication of the presence of a common regulation mechanism. The lead-lag R2 is quantitatively defined by a linear multiple regression model relating the two given gene expression time profiles mA(t) and mB(t) and their areas under the curve until time t (i.e. their time integrals):
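A form of this regression that is consistent with the coefficients referred to elsewhere in the text (c1 to c5 and the noise term δ) can be sketched as follows. The pairing of c2 and c3 with the two integrals is an assumption made here for illustration; c4 multiplies a linear trend and c5 is a constant term, as implied by the later statement that the signals are devoid of linear trends and noise when c4 = c5 = δ = 0.

```latex
m_A(t) = c_1\, m_B(t)
       + c_2 \int_0^t m_B(\tau)\, d\tau
       + c_3 \int_0^t m_A(\tau)\, d\tau
       + c_4\, t + c_5 + \delta(t)
```

The lead-lag R2 is then the fraction of the variance of mA(t) explained by this fit. Setting c2 = c3 = c4 = 0 leaves mA(t) = c1 mB(t) + c5, which is the simultaneous regression with an intercept.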
It is worth noting that the simultaneous relationship is also a particular lead-lag relationship (just set c2 = c3 = c4 = 0), so that the lead-lag R2 is always greater than or equal to the simultaneous R2. In the following we will show that the magnitude of the increase from simultaneous R2 to lead-lag R2 is specific for each gene pair and that it is statistically correlated both with the presence of a common transcriptional signal and with differences between the half-lives. More details on the lead-lag R2 and its numerical computation are given in the Materials and Methods section.
Predicting Co-Regulation from Lead-Lag Relationships
The mathematical model used to define the lead-lag R2 is based on the assumption that co-regulated genes have the same transcriptional signal (promoter activity) and equal or different transcript stabilities. Consequently, we postulated that two given genes showing a lead-lag relationship (namely, with high lead-lag R2 values) are likely to be regulated by common transcription factors. To test this hypothesis, we selected a list of 1159 genes indicated as cell-cycle regulated in at least one out of six yeast genome-wide studies [38]. We then used a large integrated dataset of yeast cell-cycle data generated by three independent groups using different synchronization methods and composed of 7 datasets (13 cell cycles for each gene, see Materials and Methods for details). We considered as “gold standard” the transcriptional regulatory network recently published by MacIsaac and colleagues [15]. This reconstructed network is very reliable since the authors combined complementary strategies to improve the ability to identify the specificity of transcriptional regulators from genome-wide chromatin immunoprecipitation data. The MacIsaac et al. dataset consists of a list of targets for 203 TFs using different conservative criteria.
For these 203 TFs, we selected a p-value threshold for binding of 0.001, obtaining a list of 3107 genes that contains 660 of the genes in the list of cell-cycle regulated ones. We then chose the 10 TFs widely recognized as having a fundamental role during the cell cycle [39]: SWI4, SWI6, MBP1, NDD1, FKH1, FKH2, MCM1, ACE2, SWI5 and YOX1. Using these data, we could assess the effectiveness of our approach by computing true and false positive rates and ROC curves. To this end, we evaluated the lead-lag R2 for each gene pair in the dataset (N(N−1)/2 pairs, N = 660) and considered as putatively co-regulated those pairs whose R2 values were above a threshold thigh and, as putatively non co-regulated, those pairs whose R2 values were below a threshold tlow. Gene pairs with scores between the thresholds were not considered. In order to construct a ROC curve we used varying thresholds: as an upper threshold thigh for co-regulation we selected the value corresponding to percentiles p ranging from the 50th to the 90th with a step of 10 and, as a lower threshold tlow for non-coregulation, we selected the value corresponding to the “symmetric” percentile 100−p. For each threshold we could compute true positives, true negatives, false positives and false negatives, and therefore construct a ROC curve (Figure 3A).
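The symmetric-percentile thresholding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the nearest-rank percentile rule and the toy data are assumptions, and `labels` stands for a hypothetical gold-standard flag (True when the pair shares at least one transcription factor).

```python
import math


def percentile(sorted_vals, p):
    """Nearest-rank percentile (integer p in 0..100) of a pre-sorted list."""
    rank = max(1, math.ceil(p * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def roc_points(scores, labels, percentiles=(50, 60, 70, 80, 90)):
    """Return one (p, false positive rate, true positive rate) per percentile p.

    scores : R2 value of each gene pair
    labels : True if the pair is co-regulated according to the gold standard
    """
    ordered = sorted(scores)
    points = []
    for p in percentiles:
        t_high = percentile(ordered, p)        # above: predicted co-regulated
        t_low = percentile(ordered, 100 - p)   # below: predicted not co-regulated
        tp = fp = tn = fn = 0
        for score, shared_tf in zip(scores, labels):
            if score > t_high:
                if shared_tf:
                    tp += 1
                else:
                    fp += 1
            elif score < t_low:
                if shared_tf:
                    fn += 1
                else:
                    tn += 1
            # pairs between the two thresholds are not considered
        tpr = tp / (tp + fn) if tp + fn else float("nan")
        fpr = fp / (fp + tn) if fp + tn else float("nan")
        points.append((p, fpr, tpr))
    return points


# Toy example: 20 pairs with scores 0.05 ... 1.00; pairs scoring above 0.5 are
# labelled co-regulated, except the top-scoring pair (a deliberate false positive).
toy_scores = [i / 20 for i in range(1, 21)]
toy_labels = [11 <= i <= 19 for i in range(1, 21)]
points = roc_points(toy_scores, toy_labels)
```

Because the two thresholds move apart as p grows, fewer pairs are classified at high p, so each ROC point is computed on a different subset of pairs.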
To evaluate the performance of predictions obtained with the lead-lag R2 we repeated the same analysis using the simultaneous R2 as a similarity measure between two given genes (Figure 3A).
mRNA Half-Lives and Lead-Lag R2
The peculiarity of the lead-lag relationship between two given genes relies on the presence of a common regulatory signal driving the expression of transcripts with equal or different mRNA half-lives. For this reason, we investigated whether co-regulated gene pairs having high lead-lag R2 values are significantly enriched in differential transcript stabilities. Half-life values are not available during the cell cycle and in the same experimental conditions used for establishing cell synchronization. Nevertheless, genome-wide half-life data for unsynchronized cells were published recently by Wang et al. [26]. Using DNA microarrays, the authors precisely measured the decay of each yeast mRNA in YPD medium, after thermal inactivation of a temperature-sensitive RNA polymerase II. Such half-life measurements were not obtained during the cell cycle, so we do not expect an exact agreement with the actual ones. Nevertheless, by considering a large number of gene pairs (16740) it appears reasonable that, on average, the half-life ratios between gene pairs may not vary significantly. Therefore, we used such available data for a statistical evaluation of the presence of gene pairs with high lead-lag R2 values with respect to the simultaneous R2 among those co-regulated pairs having large half-life ratios. To this end, we considered all possible gene pairs having at least one common transcription factor according to the MacIsaac et al. dataset [15], using a p-value for binding less than 0.001, and considered five half-life ratio bins: less than 2-fold, from 2-fold to 3-fold, from 3-fold to 4-fold, from 4-fold to 5-fold and more than 5-fold.
We computed the simultaneous R2 and also the difference between the lead-lag R2 and the simultaneous R2 for all the gene pairs in each of the half-life bins. This difference is used to isolate the part of the lead-lag R2 value that is not due to the simultaneous expression of the gene pair. We thus obtained a distribution of values for each half-life ratio bin and computed the corresponding mean value and standard deviation (Figure 4).
Comparison to Other Similarity Measures The results presented so far have clearly shown that lead-lag correlation analysis outperforms the usual simultaneous correlation analysis (squared Pearson coefficient) for the prediction of co-regulation, i.e. the presence of a common transcription factor, from gene expression time profiles. As previously discussed, truly co-regulated genes do often display large differences of gene expression time profiles, e.g. peak shifts, delays or other kinds of nonlinear relationships. In this paragraph, we consider other similarity measures relevant to the analysis of gene expression data and compare their performances with those obtained using the lead-lag R2. In particular, we used 5 similarity measures other than the lead-lag: Spearman's rank, Kendall's tau, cosine, dynamic time-warped and time-delayed correlation, all squared to capture inverted relationships also. Spearman's rank, Kendall's tau and cosine correlation are the most common choices for the analysis of gene expression data in the presence of nonlinear relationships between time series, but they do not take into account the time ordering of data. By contrast, time-warped and time-delayed correlation have been specifically developed to analyze gene expression time profiles. The time-delayed correlation analysis has been proposed by Schmitt et al. [37] where, for any genes pair, a R2 value is obtained by selecting the highest simultaneous R2 over all admissible time delays between profiles. The dynamic time-warped correlation has been recently used by Aach and Church [40] and Hermans and Tsiporkova [41] for the alignment of gene expression time series obtained in experiments using different cell synchronization methods. These two works are both based, for gene-to-gene comparisons, on the Dynamic Time Warping (DTW) algorithm developed by Sankoff and Kruskal [42]. 
Accordingly, we defined a time-warped R2 by selecting the highest simultaneous R2 over all the possible time-warped paths. For each similarity measure, we performed the same analysis reported in the previous section using the same data, and the results are shown in Figure 5.
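Of the measures compared here, the time-delayed R2 (the highest simultaneous R2 over all admissible delays) is the simplest to sketch. The code below is a minimal illustration under stated assumptions: delays are whole numbers of samples, `max_lag` is a hypothetical parameter, and a minimum overlap of three points is imposed; none of these choices is taken from Schmitt et al. [37].

```python
import math


def squared_pearson(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy * sxy / (sxx * syy)


def time_delayed_r2(x, y, max_lag=3):
    """Highest squared Pearson correlation over shifts of -max_lag..max_lag samples."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[lag:], y[:len(y) - lag]   # x delayed with respect to y
        else:
            xs, ys = x[:len(x) + lag], y[-lag:]  # y delayed with respect to x
        if len(xs) >= 3:
            best = max(best, squared_pearson(xs, ys))
    return best


# Toy profiles: y is x delayed by two samples.
x = [math.sin(0.5 * i) for i in range(20)]
y = [math.sin(0.5 * (i - 2)) for i in range(20)]
print(squared_pearson(x, y), time_delayed_r2(x, y))
```

A pure delay is recovered exactly by this measure, whereas a difference in mRNA half-life changes the shape of the profile as well as its timing, which a shift alone does not capture.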
First of all, the cosine correlation analysis produces the poorest performance, very close to a random choice, and therefore this similarity measure is not reported in Figure 5.
Examples of Lead-Lag Analysis Using Yeast Cell Cycle Gene Expression Data
In this section we present some examples of “typical” lead-lag relationships using the most recent yeast cell cycle data [43] and discuss their biological relevance. The complete list of gene pairs exceeding the 95th percentile of the distribution for each of the R2 values considered in this paper is provided in the supporting information file Text S1.
Key cell cycle regulators under common transcription factors
The budding yeast cell cycle is characterized by consecutive waves of expression of key regulators such as cyclins and transcription factors [44]. CLB6, a G1/S-phase cyclin, has a lead-lag relationship with GIN4, as shown in Figure 6A.
Cell Division Cycle 6 (CDC6) is a component of the pre-replicative complex essential for the initiation of DNA replication, normally expressed at the end of mitosis. It has a lead-lag relationship with ASH1 (Figure 6B). SWI5 encodes a key transcription factor that activates transcription of genes expressed at the M/G1 boundary and in G1 phase of the cell cycle. NCE102 is a non-classical export protein involved in an alternative clearance/detoxification pathway to eliminate damaged material [49]. They display a lead-lag relationship (Figure 6C). YOX1 is a transcription factor involved in the repression of ECB activity [46], thus contributing to moving the cycle forward. YOX1 shows a lead-lag relationship with MNN1 (Figure 6D).
All the above examples consist of pairs of genes that are under the control of the same transcription factor and that show differential mRNA stability values consistent with their lead-lag relationship (except for the CDC6 transcript, whose experimental half-life is not available). Moreover, it is worth noting that large differences in half-life values (as in the cases shown in Figure 6C and 6D) …
Finally, it is worth noting that the lead-lag relationship is symmetrical and, therefore, it does not provide information about which gene is “lead” and which is “lag”. However, such information can be easily obtained by visual inspection. In fact, from Figure 6A …
Dynamic formation of the replication complex
Many studies have focused on the relationship between gene expression time courses and the formation of protein complexes. Interestingly, Jansen et al. [31] suggested classifying protein complexes as either permanent or transient, with permanent ones being maintained through most cellular conditions. They also found that, generally, permanent complexes tend to have simultaneously correlated gene expression while transient ones do not. Moreover, they also noted that subunits of the same protein complex may show significant simultaneous expression.
In particular, they studied gene expression of the replication complex in yeast and found a very low simultaneous correlation among subunits, not significantly different from a random control [31]. However, they also found two sub-complexes – the MCM complex and the DNA polymerases δ and ε complex – showing much greater simultaneous correlation. Using gene expression time profiles during one cell cycle (alpha_38 time series of the dataset in [43]) for the genes encoding MCM proteins (MCM cluster) and DNA polymerases δ and ε (POL cluster), we computed the simultaneous and lead-lag R2; the scatterplots of the resulting values for gene pairs belonging to the two different sub-complexes are shown in Figure 7.
Figure 7 Conclusions The expression of genes in the cell is to a large extent controlled at the level of mRNA accumulation. One key point in the analysis of gene expression dynamics is that mRNA abundance is determined by two regulated processes: transcription and degradation both specifically affecting transcript levels. Computational analysis of genome-wide expression time series has shown that clusters of co-expressed (i.e. simultaneously correlated) profiles often provide clues for the presence of common transcription factors regulating both genes. Such computational analysis (known as “clustering”) is very useful since it allows the prediction of the underlying regulatory actions based exclusively on the available gene expression data obtained from a given experiment. The rationale behind such belief is a sort of a “guilty by association” approach: genes' products appearing and disappearing at the same time are likely to have some common transcriptional regulation. Nevertheless, it may well be the case that the same transcriptional signal regulating two (or more) genes may yield quite different outcomes on each transcript. In fact, a number of biological events following transcription may selectively affect cytoplasmic mRNA abundance, such as, for example, the activity of the enzymatic machinery involved in mRNA processing and degradation. In order to address this issue, we provided a novel computational methodology that, based exclusively on the available gene expression data, is able to effectively predict co-regulation even with variation in the dynamic response due to mRNA stability differences. Moreover, our approach also captures the relation of simultaneous or time shifted co-expression so that it provides a single integrative general index – the lead-lag R2−able to uncover the presence of a common regulatory signal underlying gene expression time dynamics also at the post-transcriptional level. 
In order to test the validity of our approach on real data, we used yeast genome-wide cell-cycle expression time series obtained by several independent groups using different synchronization methods. In fact, by doing so, we could integrate the available cell cycle data and obtain a much more reliable aggregated dataset. We considered those gene pairs with the highest lead-lag R2 values and found the prediction for the presence of a common transcription factor to be highly consistent with protein-DNA binding data (ChIP experiments). Our results clearly indicate that co-regulation is not generally equivalent to simultaneous expression. We believe that the same analysis can be successfully used to predict post-transcriptional regulation, i.e. the presence of a common mechanism able to stabilize or de-stabilize specific transcripts, as for the members of the PUF proteins family [2]. Moreover, we envisage the possibility that our methodology could be used on different data and organisms and thus providing a computational support to the understanding of transcriptional and post-transcriptional networks, given the recent growing interest in the post-transcriptional regulation layer [1] of gene expression (miRNA) and its role in many diseases, such as cancer. Finally, the characterization of the replication complex in terms of lead-lag relationships among gene expression time profiles of its sub-complexes suggests the possibility that our analysis could be effectively used as a tool for predicting the formation of transient multiple protein complexes. Materials and Methods Computation of the Simultaneous and Lead-Lag R2 Between Gene Expression Time Profiles The mRNA relative abundance time course data obtained from cell populations experiments for gene A and B is denoted by mA and mB, respectively. 
The simultaneous R2 is the usual squared Pearson correlation coefficient, which measures the fraction of the total variance explained by a linear fit between the two variables mA and mB, that is, R2 = cov(mA, mB)^2 / [var(mA) var(mB)].
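As a small worked check of this definition, the sketch below computes the simultaneous R2 directly as the fraction of variance explained by the fit mA = c1 mB + c2, which for a single regressor coincides with the squared Pearson coefficient. The function names and the toy profiles are illustrative assumptions.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = c1 * x + c2; returns (c1, c2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    c1 = sxy / sxx
    return c1, mean_y - c1 * mean_x


def simultaneous_r2(m_a, m_b):
    """Fraction of the variance of m_a explained by the fit m_a = c1 * m_b + c2."""
    c1, c2 = linear_fit(m_b, m_a)
    mean_a = sum(m_a) / len(m_a)
    ss_res = sum((a - (c1 * b + c2)) ** 2 for a, b in zip(m_a, m_b))
    ss_tot = sum((a - mean_a) ** 2 for a in m_a)
    return 1.0 - ss_res / ss_tot


# Toy profiles with Pearson correlation 0.8, hence R2 = 0.64.
profile_a = [1.0, 3.0, 2.0, 5.0, 4.0]
profile_b = [1.0, 2.0, 3.0, 4.0, 5.0]
print(simultaneous_r2(profile_a, profile_b))
```

Because the coefficient is squared, a perfectly inverted profile (mA = −mB) also scores 1, and the value is the same whichever of the two genes is taken as the regressor.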
The rationale behind the lead-lag R2 is the following. We considered two genes, A and B, subject to the same regulatory signal (promoter activity) – possibly of different strength – due to the presence at their promoters of the same TF complex in its active state. Moreover, we assumed that the change in mRNA levels due to degradation could be reasonably well captured by first-order rate kinetics [53], and consequently wrote a dynamic equation for each gene X that includes both synthesis and degradation, in which the degradation rate constant is related to the transcript half-life t1/2 by log(2)/t1/2 and the term ηX accounts for intrinsic and extrinsic noise. In order to remove size effects, the common signal between the promoter activities of the two genes is indicated as p(t) and is such that
The reason for the term “lead-lag” is due to the fact that two signals satisfying model (3) also define the transfer function of a “lead-lag compensator” widely used in control systems engineering. Assuming, for the sake of simplicity, the signals devoid of linear trends and noise (c4 = c5 = δ = 0), model (3) in the Laplace domain is as follows:
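The step into the Laplace domain can be sketched under the assumption that model (3) has the form m_A(t) = c_1 m_B(t) + c_2 ∫ m_B + c_3 ∫ m_A + c_4 t + c_5 + δ(t); the pairing of c2 and c3 with the two integrals is assumed here for illustration and is not taken from the text. With c4 = c5 = δ = 0 and zero initial conditions, integration in time becomes division by s:

```latex
M_A(s) = c_1 M_B(s) + \frac{c_2}{s} M_B(s) + \frac{c_3}{s} M_A(s)
\quad\Longrightarrow\quad
\frac{M_A(s)}{M_B(s)} = \frac{c_1 s + c_2}{s - c_3}
```

The result is a ratio of two first-order polynomials in s, with one zero and one pole, which is the structure of a lead-lag compensator.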
A Direct Formula for Computation of the Lead-Lag R2 from Gene Expression Data Let the available experimental time series of two genes A and B be composed of N>5 samples taken at times t1,…,tN. Model (3)
Numerical computation of time integral Given a gene expression time profile [mRNA]t measured at times t1,…,tN, we computed its time integral in two steps. First, we used a piecewise cubic Hermite interpolation formula to obtain, for each time interval, 4 more samples. Over the interpolated time series we computed the integral by using a 2-points closed Newton-Cotes formula (trapezoidal rule). Datasets Cell cycle regulated genes We considered the extended list of 1159 cell cycle regulated genes reported in reference [38]. Each gene in this list has been considered as cell-cycle regulated in at least one of the six methods reported in reference [38]. We used such an extended list in order to have a sufficiently large dataset for our statistical analysis. Gene expression datasets We considered yeast cell cycle data measured by three independent groups [4],[43],[54]. The data from the Spellman et al. group consist of genome-wide gene expression data during the yeast cell cycle using three different synchronization methods. We denoted as ELU, the elutriation based dataset composed of one cell cycle, as ALPHA, the pheromone α arrest factor based dataset composed of two cell cycles and as CDC15 the temperature sensitive CDC15 mutant based dataset composed of three cell cycles. Only two cell cycles of the CDC15 dataset could be used due to the large number of missing data. The dataset in Cho et al. [54], denoted by CDC28, is composed of two cell cycle and synchronized using a temperature sensistive CDC28 mutant. The last dataset has been downloaded from the authors website [43] and is composed of three genome-wide gene expression measurement during the yeast cell cycle using alpha factor synchronization. We denoted such dataset, composed of two cell cycles each, as ALPHA_28, ALPHA_30 and ALPHA_38. Two data sets, ALPHA_30 and ALPHA_38, are dye swap technical replicates. 
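Returning to the numerical computation described at the start of this section, the lead-lag R2 can be sketched as an ordinary least-squares fit on the two profiles and their running integrals. Several points are assumptions rather than the authors' implementation: the regressors follow an assumed form of model (3) (mB, the integral of mB, the integral of mA, a linear trend and a constant), NumPy is used for the least-squares step, and the piecewise cubic Hermite refinement is omitted for brevity, so the trapezoidal rule is applied directly to the measured samples.

```python
import numpy as np


def cumulative_trapezoid(t, m):
    """Area under m(t) from t[0] up to each sample time (trapezoidal rule)."""
    steps = 0.5 * (m[1:] + m[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(steps)))


def explained_variance(design, target):
    """R2 of the least-squares fit of target on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - float(residual @ residual) / float(ss_tot)


def simultaneous_r2(m_a, m_b):
    """R2 of the fit m_a = c1 * m_b + c2."""
    m_a = np.asarray(m_a, dtype=float)
    m_b = np.asarray(m_b, dtype=float)
    design = np.column_stack([m_b, np.ones_like(m_b)])
    return explained_variance(design, m_a)


def lead_lag_r2(t, m_a, m_b):
    """R2 of the fit of m_a on m_b, both running integrals, a trend and a constant."""
    t, m_a, m_b = (np.asarray(v, dtype=float) for v in (t, m_a, m_b))
    if t.size <= 5:
        raise ValueError("the text requires N > 5 samples")
    design = np.column_stack([
        m_b,
        cumulative_trapezoid(t, m_b),
        cumulative_trapezoid(t, m_a),
        t,
        np.ones_like(t),
    ])
    return explained_variance(design, m_a)


# Toy profiles: gene A follows gene B with a phase lag of one time unit.
times = np.linspace(0.0, 12.0, 25)
gene_b = np.sin(times)
gene_a = np.sin(times - 1.0)
print(simultaneous_r2(gene_a, gene_b), lead_lag_r2(times, gene_a, gene_b))
```

Since the simultaneous regressors are a subset of the lead-lag ones, the lead-lag value returned by this sketch can never be smaller than the simultaneous one, in line with the remark made when the measure is introduced.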
Transcription factors dataset
We considered the main cell cycle TFs (SWI4, SWI6, MBP1, NDD1, FKH1, FKH2, MCM1, ACE2, SWI5, YOX1) according to Bahler [39] and, as targets, those genes included in the MacIsaac et al. dataset [15] with a stringent threshold for DNA binding (p-value<0.001). The MacIsaac et al. dataset contained 660 of the 1159 cell cycle regulated genes. Therefore, we ended up with a list of 660 genes available for the subsequent computational analysis.

Half-lives dataset

Integration of gene expression datasets
For each dataset, we computed the simultaneous and lead-lag R2 for all possible pairs among N = 660 genes, that is, for N(N−1)/2 = 217470 pairs. More precisely, the R2 values were computed for each cell cycle in each dataset, thus obtaining 13 values for each gene pair (ELU: 1 cell cycle, ALPHA: 2 cell cycles, CDC15: 2 cell cycles, CDC28: 2 cell cycles, ALPHA_28: 2 cell cycles, ALPHA_30: 2 cell cycles and ALPHA_38: 2 cell cycles). The average dataset was constructed by taking, for each gene pair, the mean of the 13 available R2 values. In case of missing data in the original dataset, the mean R2 value was computed only when at least 8 out of 13 cycles were available. These data were used to compute the diagram shown in Figure 3B.

Acknowledgments
The authors thank Tim Gardner, Arun Krishnan, Feng He, and Alessandro Giuliani for their critical reading of the manuscript and constructive suggestions.

Footnotes
The authors have declared that no competing interests exist.
This work was partially supported by a grant from ASI, Biotechnology Program.

References
1. Garneau NL, Wilusz J, Wilusz CJ. The highways and byways of mRNA decay. Nat Rev Mol Cell Biol. 2007;8:113–126.
2. Gerber AP, Herschlag D, Brown PO.
Extensive Association of Functionally and Cytotopically Related mRNAs with Puf Family RNA-Binding Proteins in Yeast. PLoS Biol. 2004;2(3):e79. doi:10.1371/journal.pbio.0020079.
3. Guan Q, Zheng W, Tang S, Liu X, Zinkel RA, et al. Impact of nonsense-mediated mRNA decay on the global expression profile of budding yeast. PLoS Genet. 2006;2:1924–1943. doi:10.1371/journal.pgen.0020203.
4. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998;9:3273–3297.
5. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nature Gen. 1999;22:281–285.
6. Brazma A, Jonassen I, Vilo J, Ukkonen E. Predicting gene regulatory elements in silico on a genomic scale. Genome Res. 1998;8:1202–15.
7. Wolfsberg TG, Gabrielian AE, Campbell MJ, Cho RJ, Spouge JL, et al. Candidate regulatory sequence elements for cell cycle-dependent transcription in Saccharomyces cerevisiae. Genome Res. 1999;9:775–92.
8. Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001;292:929–34.
9. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–14868.
10. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA. 1999;96:2907–2912.
11. Brazhnik P, de la Fuente A, Mendes P. Gene networks: how to put the function in genomics. Trends Biotech. 2002;20:467–472.
12. Gardner TS, di Bernardo D, Lorenz D, Collins JJ.
Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling. Science. 2003;301:102–105.
13. Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, et al. Reverse engineering of regulatory networks in human B cells. Nature Gen. 2005;37:382–390.
14. Wit E, McClure J. Statistics for Microarrays: Design, Analysis and Inference. Chichester, UK: John Wiley & Sons Ltd; 2004.
15. MacIsaac KD, Wang T, Gordon DB, Gifford DK, Stormo GD, et al. An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics. 2006;7:113–125.
16. Qian J, Dolled-Filhart M, Lin J, Yu H, Gerstein M. Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. J Mol Biol. 2001;314:1053–1066.
17. Zhu Z, Pilpel Y, Church GM. Computational identification of transcription factor binding sites via a transcription-factor-centric clustering (TFCC) algorithm. J Mol Biol. 2002;318:71–81.
18. Pilpel Y, Sudarsanam P, Church GM. Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Gen. 2001;29:153–159.
19. Banerjee N, Zhang MQ. Identifying cooperativity among transcription factors controlling the cell cycle in yeast. Nucl Acids Res. 2003;31:7024–7031.
20. Yu H, Luscombe NM, Qian J, Gerstein M. Genomic analysis of gene expression relationships in transcriptional regulatory networks. Trends Gen. 2003;19:422–427.
21. Kato M, Hata N, Banerjee N, Futcher B, Zhang MQ. Identifying combinatorial regulation of transcription factors and binding motifs. Genome Biol. 2004;5:R56.
22. Wang W, Cherry M, Nochomovitz Y, Jolly E, Botstein D, et al. Inference of combinatorial regulation in yeast transcriptional networks: A case study of sporulation. Proc Natl Acad Sci USA. 2005;102:1998–2003.
23. Balaji S, Babu M, Iyer LM, Luscombe NM, Aravind L.
Comprehensive Analysis of Combinatorial Regulation using the Transcriptional Regulatory Network of Yeast. J Mol Biol. 2006;360:213–227.
24. He F, Buer J, Zeng AP, Balling R. Dynamic cumulative activity of transcription factors as a mechanism of quantitative gene regulation. Genome Biol. 2007;8:R181.
25. Smith JJ, Ramsey SA, Marelli M, Marzolf B, Hwang D, et al. Transcriptional responses to fatty acid are coordinated by combinatorial control. Mol Syst Biol. 2007;3:115.
26. Wang Y, Liu CL, Storey JD, Tibshirani RJ, Herschlag D, et al. Precision and functional specificity in mRNA decay. Proc Natl Acad Sci USA. 2002;99:5860–5865.
27. Grigull J, Mnaimneh S, Pootoolal J, Robinson MD, Hughes TR. Genome-wide analysis of mRNA stability using transcription inhibitors and microarrays reveals posttranscriptional control of ribosome biogenesis factors. Mol Cell Biol. 2004;24:5534–5547.
28. Farina L, De Santis A, Morelli G, Ruberti I. Dynamic measure of gene co-regulation. IET Syst Biol. 2007;1:10–17.
29. Franklin GF, Powell JD, Emami-Naeini A. Feedback Control of Dynamic Systems. 4th edition. Prentice-Hall; 2002.
30. Ge H, Liu Z, Church GM, Vidal M. Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nature Gen. 2001;29:482–486.
31. Jansen R, Greenbaum D, Gerstein M. Relating whole-genome expression data with protein-protein interactions. Genome Res. 2002;12:37–46.
32. Foat BC, Houshmandi SS, Olivas WM, Bussemaker HJ. Profiling condition-specific, genome-wide regulation of mRNA stability in yeast. Proc Natl Acad Sci USA. 2005;102:17675–17680.
33. Hargrove JL. Microcomputer-assisted kinetic modelling of mammalian gene expression. FASEB J. 1993;7:1163–1170.
34. Simon I, Barnett J, Hannett N, Harbison CT, Rinaldi NJ, et al. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell. 2001;106:697–708.
35.
Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, et al. Network motifs: simple building blocks of complex networks. Science. 2002;298:824–827.
36. Arkin A, Shen PD, Ross J. A Test Case of Correlation Metric Construction of a Reaction Pathway from Measurements. Science. 1997;277:1275–1279.
37. Schmitt WA, Jr, Raab M, Stephanopoulos G. Elucidation of gene interaction networks through time-lagged correlation analysis of transcriptional data. Genome Res. 2004;14:1654–1663.
38. de Lichtenberg U, Juhl Jensen L, Fausboll A, Jensen TS, Bork P, et al. Comparison of computational methods for the identification of cell cycle-regulated genes. Bioinformatics. 2005;21:1164–1171.
39. Bahler J. Cell-cycle control of gene expression in budding and fission yeast. Annu Rev Genet. 2005;39:69–94.
40. Aach J, Church GM. Aligning gene expression time series with time warping algorithms. Bioinformatics. 2001;17:495–508.
41. Hermans F, Tsiporkova S. Merging microarray cell synchronization experiments through curve alignment. Bioinformatics. 2007;23:64–70.
42. Sankoff D, Kruskal J. Time warps, string edits, and macromolecules: the theory and practice of sequence comparison. Reading, MA: Addison Wesley; 1983.
43. Pramila T, Wu W, Noble WS, Breeden LL. The Forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. Genes Dev. 2006;20:2266–2278. Public data available at: http://www.fhcrc.org/science/labs/breeden/cellcycle/. Accessed 30 June 2008.
44. Breeden LL. Cyclin transcription: timing is everything. Curr Biol. 2000;10:586–588.
45. Kuai L, Das B, Sherman F. A nuclear degradation pathway controls the abundance of normal mRNAs in Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2005;102:13962–13967. Available: http://dbb.urmc.rochester.edu/labs/sherman_f/mRNAs.htm. Accessed 30 June 2008.
46. Breeden LL.
Periodic transcription: a cycle within a cycle. Curr Biol. 2003;13:31–38.
47. Piatti S, Lengauer C, Nasmyth K. Cdc6 is an unstable protein whose de novo synthesis in G1 is important for the onset of the S phase and for preventing a ‘reductional’ anaphase in the budding yeast Saccharomyces cerevisiae. EMBO J. 1995;14:3788–3799.
48. McBride HJ, Yu Y, Stillman DJ. Distinct regions of the Swi5 and Ace2 transcription factors are required for specific gene activation. J Biol Chem. 1999;274:21029–21036.
49. Desmyter L, Verstraelen J, Dewaele S, Libert C, Contreras R, et al. Nonclassical export pathway: overexpression of NCE102 reduces protein and DNA damage and prolongs lifespan in an SGS1 deficient Saccharomyces cerevisiae. Biogerontol. 2007;8:527–535.
50. Gertien J, Smits GJ, Schenkman LR, Brul S, Pringle JR, et al. Role of Cell Cycle-regulated Expression in the Localized Incorporation of Cell Wall Proteins in Yeast. Mol Biol Cell. 2006;17:3267–3280.
51. Simonis N, van Helden J, Cohen GN, Wodak SJ. Transcriptional regulation of protein complexes in yeast. Genome Biol. 2004;5:R33.
52. Wade TJ, Hall DB, Struhl K. The transcription factor IFH1 is a key regulator of yeast ribosomal protein genes. Nature. 2004;432:1054–1058.
53. Ross J. mRNA stability in mammalian cells. Microbiol Rev. 1995;59:423–450.
54. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell. 1998;2:65–73.