Copyright © 2007 García-Martínez et al.; licensee BioMed Central Ltd.
Common gene expression strategies revealed by genome-wide analysis in yeast
1Sección de Chips de DNA-SCSIE, Universitat de València, Dr Moliner 50, E-46100, Burjassot, Spain
2Departamento de Bioquímica y Biología Molecular, Universitat de València, Dr Moliner 50, E-46100, Burjassot, Spain
3Instituto Cavanilles de Biodiversidad y Biología Evolutiva and Departamento de Genética, Universitat de València, Dr Moliner 50, E-46100, Burjassot, Spain
Corresponding author. José García-Martínez: jose.garcia-martinez/at/uv.es; Fernando González-Candelas: fernando.gonzalez/at/uv.es; José E Pérez-Ortín: jose.e.perez/at/uv.es
Received March 15, 2007; Revised July 24, 2007; Accepted October 19, 2007.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background
Gene expression is a two-step synthesis process that ends with the necessary amount of each protein required to perform its function. Since the protein is the final product, the main focus of gene regulation should be centered on it. However, because mRNA is an intermediate step and the amounts of both mRNA and protein are controlled by their synthesis and degradation rates, the desired amount of protein can be achieved following different strategies.
Results
In this paper we present the first comprehensive analysis of the relationships among the six variables that characterize gene expression in a living organism: transcription and translation rates, mRNA and protein amounts, and mRNA and protein stabilities. We have used previously published data from exponentially growing Saccharomyces cerevisiae cells.
We show that there is a general tendency to harmonize the levels of mRNA and protein by coordinating their synthesis rates and that functionally related genes tend to have similar values for the six variables. Conclusion We propose that yeast cells use common expression strategies for genes acting in the same physiological pathways. This trend is more evident for genes coding for large and stable protein complexes, such as ribosomes or the proteasome. Hence, each functional group can be defined by a 'six variable profile' that illustrates the common strategy followed by the genes included in it. Genes encoding subunits of protein complexes show a tendency to have relatively unstable mRNAs and a less balanced profile for mRNA than for protein, suggesting a stronger regulation at the transcriptional level. Background The central dogma of molecular biology [1] states that information runs from DNA to protein. In spite of the increasing number of non-protein-coding genes discovered in the past few years, it is still true that a large part of the genetic information follows the central dogma. Therefore, it would be interesting to evaluate the respective contributions and the balance between all the steps in the flow of genetic information from the gene (DNA) to the final product (protein). Because the ready availability of protein is its final goal, the complex process of gene regulation should be addressed to this aspect. However, given that mRNA is an obligate intermediate step and because the amounts of both mRNA (RA) and protein (PA) are controlled by synthesis and degradation rates, the desired PA can be obtained following different strategies that should take into account the energy costs of each step, the appropriate speed of response to potential changes in the environment [2], the optimal biological noise [3-5] and the possibility of post-transcriptional and/or post-translational regulatory mechanisms [4]. 
For instance, a given PA can be obtained by maximizing the transcription rate (TR) with a moderate mRNA stability (RS) to obtain a high RA. Ribosomal proteins are an example of this strategy [6]. In other cases, a high RS compensates for a low TR (reviewed in [7]). Sometimes, a low RA can be compensated for by a high translation rate for each mRNA molecule (individual translation rate, TLRi) or vice versa [8]. Understanding how PA is related to RA and how RA depends on TR and RS is essential for interpreting the different strategies for gene expression. The stability of the protein molecule (PS) is the final variable determining PA [9]. In general, there is a positive correlation between RA and PA [8,10,11], although it has been shown that in many cases the amount of mRNA is not a good predictor of the amount of protein [12]. The correlation depends critically on the functional categories of genes and proteins [8,13]. Mechanisms for regulating expression at each of these levels have been shown in many organisms, including yeast [7,12,14].
The yeast Saccharomyces cerevisiae is probably the most intensively studied organism using functional genomics technologies. In spite of a recent comprehensive study on Schizosaccharomyces pombe [15], S. cerevisiae remains the only organism for which all the six variables in the genetic expression flow (Figure 1),
In this paper we analyze the relationships between all six variables under yeast exponential growth in yeast extract-peptone-dextrose (YPD) culture medium. Our analyses show that functionally related genes tend to have similar values for the six variables, which demonstrates that yeast cells use common expression strategies (CESs) for genes in the same physiological pathways. Accordingly, each functional group can be defined by a 'six variable profile' (6VP) that illustrates the strategy followed by that particular group. It is also shown that synthesis rates and molecule amounts tend to be more highly correlated than stabilities. The unique behavior of RS for many genes involved in stable protein complexes suggests that, for those groups, regulation at the transcriptional level is particularly important.
Results
Variables acting on the genetic information flow
The recent availability of high-throughput data from the yeast S. cerevisiae [8,9,17,20,22,23] opens the possibility of analyzing the relationships between the six variables that control gene expression (TRi, RA, RS, TLRi, PA and PS; Figure 1)
The actual production rates of mRNA and protein, TR and TLR, are, in fact, the product of the individual rates, TRi and TLRi, times the number of gene or mRNA copies, respectively. For transcription, TR and TRi are practically equivalent because almost all yeast genes are single copy. Therefore, we have used TR throughout this paper. However, given that TLR and TLRi are essentially different, in this study we have used TLR, TLRi or both, depending on the specific goal of each analysis.
Correlation between variables
An essential question in molecular biology is to determine which strategy the cells adopt to obtain a given amount of mRNA and protein from each gene and whether the strategies are similar or different for both molecules.
Since the amount of each molecule depends on the corresponding synthesis and degradation rates, the use of similar or different strategies for mRNA and protein will affect the correlations between TR and TLRi, and between RS and PS. Moreover, cross-correlations between synthesis rates or stabilities and the amounts of the respective products, mRNA or protein, will reveal the contributions of TR and RS to RA, and of TLRi and PS to PA. Pair-wise correlations between the seven variables considered were obtained using Spearman rank coefficients (Figure 2a).
To better understand the processes underlying the detected correlations, we looked for Gene Ontology (GO) categories enriched in some specific correlations. For this, we first analyzed the correlations between variables of the same type (amounts, individual rates and stabilities) by ranking the corresponding values for the 4,215, 5,590 and 2,618 genes, respectively, for which data on mRNA and protein were available (Additional data files 8 and 13), then divided the list into quintiles (1 to 5 from higher to lower values) and finally compared the positions of the two analyzed variables for each gene. The correlations in the three pair-wise comparisons were classified into five categories ('very high', 0; 'high', 1; 'medium', 2; 'low', 3; or 'very low', 4) by considering the absolute difference between the quintile values for the two variables in each comparison, as described in Materials and methods. As can be seen in Figure 2b,
After looking for GO categories statistically enriched in the five levels of correlations, we found that some of them were very significant in the 'high correlation' classes, involving high abundance or synthesis rates (quintiles 1-2), most notably cytosolic ribosome, protein biosynthesis, hydrogen transport, redox activity and proteasome, among others (Table 1). Other GO categories were found only in the abundance, but not in the rate, classes (for example, carboxylic acid metabolism, ribosome biogenesis, and so on), or in rate classes only (such as mitochondrial ribosome). There were also GO categories highly represented in the low abundance and/or rate classes (quintiles 4-5): cell cycle, DNA metabolism, DNA binding, regulation of transcription, response to stimulus, and so on. Many of them were related to regulation or control processes.
The general trend is that amounts of mRNA and protein are correlated mainly through coordination of their synthesis rates, whether the genes encode abundant proteins, such as those belonging to macromolecular complexes, or scarce ones, such as those involved in regulation.
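The quintile-based classification used for these comparisons can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the tie-breaking rule and the toy values are assumptions made for illustration only. The sketch also counts how many of the 25 possible quintile pairs fall in each category, which matches the combination counts listed in Materials and methods.

```python
from collections import Counter
from itertools import product

# Category names indexed by the absolute quintile difference (0 to 4).
CATEGORIES = ("very high", "high", "medium", "low", "very low")


def quintiles(values):
    """Assign quintile 1 (highest values) to 5 (lowest values) to each gene.

    `values` maps gene name -> measurement. Ties are broken by sort order,
    which is an assumption of this sketch.
    """
    ranked = sorted(values, key=values.get, reverse=True)
    n = len(ranked)
    return {gene: (i * 5) // n + 1 for i, gene in enumerate(ranked)}


def classify(var_a, var_b):
    """Label each gene measured in both variables by its quintile difference."""
    shared = var_a.keys() & var_b.keys()
    qa = quintiles({g: var_a[g] for g in shared})
    qb = quintiles({g: var_b[g] for g in shared})
    return {g: CATEGORIES[abs(qa[g] - qb[g])] for g in shared}


# Number of quintile pairs per absolute difference (25 pairs in total).
COMBINATIONS = Counter(abs(a - b) for a, b in product(range(1, 6), repeat=2))

# Toy data (hypothetical values, not taken from the paper).
ra = {"g1": 50, "g2": 40, "g3": 30, "g4": 20, "g5": 10}
pa = {"g1": 1, "g2": 400, "g3": 300, "g4": 200, "g5": 900}
labels = classify(ra, pa)
```

In this toy example the genes in the middle of both rankings are labeled 'very high', whereas g1 and g5, which sit at opposite ends of the two rankings, are labeled 'very low'.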
Some GO categories also appeared significantly over-represented in the 'low correlation' classes, thus involving comparisons between variables from quintiles 4/5 and quintiles 1/2: ribosome biogenesis, spore wall assembly, glycoprotein biosynthesis, and so on, for the high TR/low TLRi; and membrane, transporter, and so on, for the high RA/low PA (Table 1). It is interesting to note that 24 genes from the 'ribosome biogenesis' category (Additional data file 9) appeared in this class as well as in the very high correlation class described above. This means that these genes have very high amounts of mRNA and protein, a high TLR but a low TR. These last results indicate that some genes use opposite strategies for mRNA and protein molecules, revealing the existence of several different expression strategies for yeast genes. Clustering of yeast genes according to the six variables of gene expression The previous results suggest that functionally related genes tend to be grouped according to their gene expression variables. To further explore this possibility, we performed a clustering analysis of the 3,991 genes for which data on at least 5 variables were available (Additional data file 13) as a function of their RA, PA, TR, TLRi, RS and PS values. We could have used TLR instead of TLRi, but we chose to use TLRi here because it is not mathematically linked to RA, thus making the clustering less prone to artifacts. In any case, using different normalization methods, or using TLR instead of TLRi, led to essentially similar results (not shown). Since the value ranges for the six variables were quite different, we used the z-score normalization because it better preserves the original relative dispersion. As a result, each gene was characterized by a profile for the arbitrarily ordered (1 to 6: RA-TR-RS-PA-TLRi-PS) variables, which allowed comparing all the genes for common profiles using standard clustering methods. 
For this we chose the Self-organizing Tree Algorithm (SOTA) [25] from the GEPAS package [26]. This is a self-organizing neural network that expands depending on the relationships among the units being analyzed. The growth nature of this procedure allows it to be stopped at the desired level of similarity resolution, which is reflected in a higher or lower number of clusters. Figure 3
The finding that many groups of genes that are functionally related, or whose proteins form macromolecular complexes, cluster together suggests that the yeast S. cerevisiae uses CESs to coordinate its physiological functions.
Detailed analysis of functional groups
Since many clusters in Figure 3
Comparison of mRNA and protein patterns
The plots in Figures 3
First, given that RS seemed to be lower than TR for many groups, we analyzed the whole gene set (Table 2). Although genes with TR > RS were slightly more abundant than expected, the difference was not statistically significant. However, genes with a lower TR than RS were less common than expected and those for which TR = RS were more frequent than expected. This trend was more marked when using only genes from the MIPS set of protein complexes. The analyses for protein profiles showed that they tended to be less unbalanced than those of mRNA, with a highly significant excess of genes with TLR = PS. This prompted us to analyze the whole profiles, including amounts of both products (RA and PA). It can be seen in Table 3 that both mRNA and protein had a significant excess of flat profiles, although this effect was much more important for protein. Similar results were obtained classifying genes into ten instead of five categories (results not shown).
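The expected frequency of flat profiles can be illustrated by enumerating the quintile combinations, following the definition given in Materials and methods (a profile is flat when its most extreme quintiles differ by less than three). The Python sketch below is only an illustration under the assumption that all combinations are equally likely; the function names are hypothetical and the observed counts in the usage lines are invented, not taken from Table 3.

```python
from itertools import product


def is_flat(profile, max_spread=3):
    """A profile is flat when its extreme quintiles differ by less than max_spread."""
    return max(profile) - min(profile) < max_spread


def expected_flat_fraction(n_variables=3, n_quintiles=5):
    """Count flat profiles among all equally weighted quintile combinations."""
    combos = list(product(range(1, n_quintiles + 1), repeat=n_variables))
    flat = sum(1 for combo in combos if is_flat(combo))
    return flat, len(combos)


def chi_square(observed, expected):
    """Pearson chi-square statistic for matching lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


flat, total = expected_flat_fraction()  # 65 flat profiles out of 125

# Hypothetical example: 700 flat profiles observed among 1,000 genes.
n_genes = 1000
observed = [700, 300]
expected = [n_genes * flat / total, n_genes * (total - flat) / total]
statistic = chi_square(observed, expected)
```

Under this assumption, 65 of the 125 possible three-variable combinations (52%) count as flat, so an excess of flat profiles means an observed proportion clearly above that baseline.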
The fact that mRNA profiles were more unbalanced than protein ones could be a consequence of strategies favoring regulation at the transcription level. To test this hypothesis, we calculated the average fold-change of yeast genes in the study of Gasch et al. [14], in which cells were analyzed under many different conditions that favored changes in gene expression. It can be seen in Figure 5
Discussion The yeast S. cerevisiae is considered to be the first organism for which a comprehensive description of most gene products and their functional integration will be obtained [27]. The reason for this is that functional genomics methods are providing systematic information about many steps in the pathways of gene expression flow. In this organism, for the first time in biology, there are estimates of the amounts of protein and mRNA as well as their synthesis rates and stabilities at a genomic scale. We have used data previously published by our [19] and other groups [8,9,17,18,20,22] for TR, RA, RS, PA, TLRi and PS together with our computations from previous experimental data [20] of TLR. As a result, we have obtained comprehensive information about the genetic expression flow for 5,968 yeast genes (Additional data files 8 and 13), with at least two of the above variables being compared. As indicated previously, the quality of the data used in this analysis was variable. For instance, RA data calculated from DNA microarrays are thought not to be reliable below approximately 1 molecule/cell [28]. PA data are probably even less accurate [8]. As discussed by Jansen and Gerstein [29], functional genomics data sets contain a high degree of experimental uncertainty because they have a high amount of error and noise. The use of these data sets can also be hampered because the results were obtained by different laboratories under non-identical growth conditions. We decided to use normalized data to avoid problems related to the uncertainty of absolute values and the comparison of data measured in different scales. Since experimental error and noise should randomize the data, then no statistically significant results should be expected after analyses such as ours. However, our results demonstrate that, even using data from diverse sources, global analyses can benefit from the integration of many data, leading to biologically meaningful conclusions. 
To our knowledge, no previous studies have performed exhaustive comparisons among these variables as described here. Single comparisons between RA and PA in yeast have been done previously [4,8,9,11-13,17,18,30]. Correlation coefficients were significant but not very high. For some groups of genes the correlation is low, which has been interpreted as an indication of post-transcriptional regulation [11]. Nevertheless, there are important differences between different functional groups. The general conclusion of these simple comparisons was that there is a significant positive correlation between the amount of a protein and that of the mRNA encoding it. We postulate here that it is mainly due to the coordination between their synthesis rates (see below). We previously made a simple comparison between TR and RA [19]. The positive correlation found was not unexpected because it is commonly accepted that mRNA amounts depend directly on their synthesis rates. Beyer et al. [17] performed a different kind of analysis, centered on functional categories, of the TLR-PA comparison. TLR can change depending on the RA but also independently of it in some genes [10]. Belle et al. [9] also made a comparison between PS, TLR and PA. They found positive correlations between PA and the other two variables. Lu et al. [11] made comparisons between PA and TR, TLR and TLRi. They found positive correlations in all cases.
We have explored several ways to normalize the data before comparing them. For correlation analysis we chose to rank every variable because, in this way, the relative position within the cell physiology of each gene allows an easier analysis of the positions of specific GO classes. We have found that, apart from confirming the positive correlations cited above, there is a significant, high positive correlation between TLRi and TR. Since RS and PS are not correlated (Figure 2a),
The negative correlation between RA and RS is interesting. Wang et al.
[22] did not find any correlation using similar data. This could be due to their use of Pearson correlation whereas we have used Spearman rank correlation, which is less sensitive to noise in individual data sets. A negative correlation like this one has been observed for Escherichia coli [30] and for the archaeon Sulfolobus [31]. The low mRNA stability of highly transcribed genes in these organisms was partially interpreted as a feature for noise minimization and a way for rapid adaptation to environmental changes. Here, we have found a negative correlation between RS and TR in S. cerevisiae. Thus, it seems likely that free-living organisms use similar strategies with regard to mRNA stability. A negative correlation between TLR and RS was also found. Because TLR is the product of TLRi and RA, this can be the result of the negative correlation of RA and RS and the lack of correlation between TLRi and RS. However, no correlation between RS and TR and a positive correlation between ribosome density and ribosome occupancy (both components of TLRi) and RS [15] have been found in S. pombe. We do not know whether this reflects a truly different behavior between these two yeast species or it is due to the small and biased number of mRNAs (only the 868 least stable ones) for which RS was calculated in that study. To further verify the consistency of the groupings obtained with these analyses, we tried different clustering methods. For clustering analysis we assayed several normalization procedures, including ranking and a range of normalizing transformations, and different clustering methods: PCA, k-means, and hierarchical unsupervised growing neural networks. We found that z-score normalization and SOTA hierarchical clustering [25,26] produced the best results in terms of recovery of significant GO categories. This reasoning is considered to be the best method to evaluate the quality of clustering protocols [32]. 
In any case, the general conclusions obtained after clustering were the same regardless of the algorithm used. We are aware that our method has an unavoidable bias due to the identical weight assigned to the six variables, but this affects all categories similarly and, in consequence, cannot produce biases in the recovery of GO categories. The SOTA clustering of z-score vectors for the 3,991 genes considered (Additional data file 13) yielded a tree with two main subgroups (Figure 3). Clusters in the upper part of the tree in Figure 3
Using the MIPS classification, we found an enrichment of genes belonging to protein complexes in the profiles with a predominance of TR over RS (Table 2). It is accepted that proteins belonging to the same complex must be present in similar amounts because the excess of any subunit would be wasteful (see [33]). Therefore, coordination of the corresponding PAs is to be expected. However, many (perhaps all) protein complexes in the cell are formed by subunits that are not exclusive to only one complex, being included in other complex(es) as well. Some studies on yeast complexes have shown that a core or protein sub-complex of highly co-expressed and functionally related subunits exists and that this core is surrounded by less cross-related, 'halo' proteins [33,34]. Additionally, some complexes are transient while others are permanent [33]. Our results show that the large and permanent complexes correspond to the best-clustered groups and that they tend to have higher TR than RS. Fraser et al. [3] found that genes belonging to protein complexes have less biological noise than average because of a high TR and low number of 'transcriptions per mRNA', which implies low RS. Thus, it seems that one reason for common 6VPs in members of some complexes could be the need for low noise.
Previously, it has been found in some studies that genes for the cytosolic and mitochondrial ribosomes and the proteasome subunits behave coordinately with respect to TR, RA and/or RS [19,23,33]. On the other hand, Wang et al. [22] found that subunits of the main cellular complexes, including both kinds of ribosomes, the nucleosome and the proteasome, have similar RS. We have found that other variables, such as PA and TLR, are also conserved for such complexes. We can conclude that, in general, the whole 6VP is very uniform for the members of these permanent complexes. This result is also observed for other smaller complexes (Figure 4)
The predominance of rates over stabilities (especially TR over RS) shown by the groups in the upper part of the tree (Figures 3
The lower part of the tree in Figure 3
It is interesting to analyze in more detail the group 'Energy pathways' in Figure 4.
To obtain the desired RA or PA, the most important factor seems to be the synthesis rate. This is reflected in the positive correlations observed between RA, PA, TR and TLR (Figure 2).
It seems that whereas PS works in the same direction as TLR to control PA, which is, therefore, positively correlated with amounts and rates, RS works in the opposite direction for most genes. Among the possible expression strategies, those with less stable molecules are more costly but allow faster tracking of environmental changes [24]. In this way, strategies with relatively low RS or PS are only appropriate for genes expected to need rapid expression changes. The costs for low RS and for low PS are, however, very different. Translation requires much more energy than transcription. For a standard yeast gene, transcription consumes six ATP molecules per triplet for an mRNA molecule, while translation consumes four ATP molecules per amino acid.
However, on average, mRNAs are six times less stable than proteins (26 minutes versus 154 minutes) and the mean number of protein molecules per mRNA molecule for a yeast gene ranges from 4,000 [17] (our data) to 5,600 [11]. This means that costly strategies for mRNA may be economical and efficient if they allow for a fast change in the amounts of mRNA to minimize translation costs. In Figure 4
Protein variables show less unbalanced profiles than their mRNA counterparts. The average standard deviation (SD) for PA-TLR-PS, expressed as percentile values, is 0.196 while for mRNA it is 0.235 (Additional data file 13). The smoothness of the protein profile is even more pronounced in the group of very highly correlated genes (Figure 2b),
Conclusion
We propose that the analysis of all the variables that affect the flow of gene expression is a useful strategy to investigate the regulatory strategies used by a cell. We conclude from our study that the synthesis rates for both mRNA and protein are the main determinants of the amount of the respective molecules and that yeast cells use CESs for genes acting in the same physiological pathways. This feature is more clearly shown for genes coding for large and stable protein complexes, such as the ribosome or the proteasome. Hence, each functional group can be defined by a 6VP that illustrates the common strategy followed by its members. For many groups whose genes encode subunits of protein complexes, there is a tendency to have relatively unstable mRNAs and a more unbalanced profile for mRNA than for protein, which suggests a stronger regulation at the mRNA level. Current knowledge from other model organisms, such as S. pombe [15], indicates that the CES can be different for specific gene groups in different organisms. We anticipate that differences in CES will be even stronger for the different cell types of higher eukaryotes, a result of the large differences in their living environments.
Materials and methods Selection and features of the original data Many studies have produced RA data from S288c-type yeast strains growing in YPD medium. For our analyses we chose the reference set constructed by Beyer et al. [17], who used 36 microarray experiments normalized and corrected for saturation effects using SAGE data [16]. This data set comprises 6,297 protein-coding genes, with 6,117 genes remaining after filtering dubious open reading frames (classified by the Saccharomyces Genome Database; Additional data file 8). In the case of RA data, as in others described later, we also made several tests using other less refined data sets [19,22]. No major variations in the results obtained were found (not shown). For TR/TRi, the only experimental data set available was obtained using the Genomic Run-On methodology [19]. This data set comprised 5,828 genes (5,669 after filtering). For mRNA stability, several genomic calculations using either drug inhibition of RNA polymerase II or the rpb1-1 thermo-sensitive mutant and temperature shift were available. We used the overall RNA data set of [22] but other data sets [19,23] were tested and, again, no relevant differences were found. This data set comprised 4,677 genes (4,544 after filtering). For PA, we used the reference set constructed by Beyer et al. [17] using data from several sources. This set included 4,243 genes (4,239 after filtering). For TLRi calculation, we used ribosome density data [17] assuming a constant ribosome speed. To derive TLR values, we multiplied TLRi by the RA data described above. This data set comprised 6,154 genes (5,968 after filtering). Finally, for PS we used the recent set of 3,370 proteins (3,367 after filtering) [9]. The whole data set comprised 6,173 genes, for 3,991 of which there were data on at least 5 of the 6 variables considered (Additional data files 8 and 13). The quality of the different data sets was variable. 
RA data are quite robust because they were obtained by averaging results from many different sources and, moreover, they were normalized and corrected [17]. TR, RS and PS data were obtained from a single measurement; however, they were verified by comparison with previously determined individual data for some genes [9,19,22]. TLRi, and consequently TLR, data were obtained by averaging two data sets [17]. TLR data have the problem that they were calculated indirectly by multiplying experimentally determined data (the RA and TLRi data sets). This adds the mathematical error associated with these operations and the disadvantage that TLR and RA are not independent. PA data are the average of data obtained using very different techniques (epitope tagging, multidimensional protein identification technology, and two-dimensional electrophoresis [8,17,18]). In spite of this, PA data are less robust than RA data because they are based on fewer measurements and because the techniques used are less accurate than SAGE and DNA microarrays. For the analyses, we have used a z-score or a percentile normalization to avoid the high dispersion in the unit ranges among the different variables. In this way data retained their relative magnitude within each variable and were directly comparable across variables, thus reducing computation artifacts and enabling easier comparisons and interpretations. Cluster analyses We have used a range of statistical methods for identifying sets of genes with similar expression patterns. The two main approaches correspond to grouping or classifying genes according to their expression patterns and to represent them in a reduced dimension space. Characteristic global profiles were established by means of cluster analysis using the data set of z-score normalized values for the six variables (as mentioned in the Results section) for a total of 3,991 yeast genes for which data for at least 5 variables were available (Additional data file 8). 
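The construction of the z-score profiles used for clustering can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the data layout (one dictionary per variable), the use of the population standard deviation, the handling of missing values as None and all names are assumptions. The variable order follows the RA-TR-RS-PA-TLRi-PS ordering given in the Results section.

```python
from statistics import mean, pstdev

# Profile order used in the Results section.
VARIABLES = ("RA", "TR", "RS", "PA", "TLRi", "PS")


def z_scores(values):
    """Convert one variable (gene -> value or None) to z-scores.

    Uses the population standard deviation, which is an assumption here.
    """
    present = [v for v in values.values() if v is not None]
    mu = mean(present)
    sd = pstdev(present)
    return {g: None if v is None else (v - mu) / sd for g, v in values.items()}


def profiles(data, min_variables=5):
    """Build six-variable z-score vectors for genes with enough data.

    `data` maps each variable name to a gene -> value dictionary.
    Genes with fewer than `min_variables` measured values are dropped.
    """
    normalized = {name: z_scores(data[name]) for name in VARIABLES}
    genes = set().union(*(data[name] for name in VARIABLES))
    result = {}
    for gene in genes:
        vector = [normalized[name].get(gene) for name in VARIABLES]
        if sum(v is not None for v in vector) >= min_variables:
            result[gene] = vector
    return result


# Toy data (hypothetical values): gene "c" lacks RS and PS and is dropped.
toy = {name: {"a": 1.0, "b": 2.0, "c": 3.0} for name in VARIABLES}
toy["RS"] = {"a": 2.0, "b": 4.0}
toy["PS"] = {"a": 1.0, "b": 3.0}
toy_profiles = profiles(toy)
```

Because every variable is centered and scaled separately, values measured in very different units (molecules per cell, minutes, rates) become comparable within one profile.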
For cluster analysis we used the SOTArray tool (included in the Gene Expression Pattern Analysis Suite v 3.0 (GEPAS) [26] from the worldwide web server of the CIPF Bioinformatics Unit) using the linear correlation coefficient among the six-variables vectors as distance between genes. The tree was allowed to grow until producing 20, 25 or 30 clusters. Alternative clustering methods were also applied to the same data set. We used k-means clustering [38] with a variable number of clusters from 2 to 25. In order to validate the quality of the previous clustering procedure, we used the Cluster Accuracy Analysis Tool (CAAT 1.0), also included in the GEPAS package. We calculated a 'silhouette width' for each internal node. This index represents how well each cluster is separated from its direct sister groups; that is, how close are items contained in this cluster (intracluster distance), and how far they are from the sister clusters (intercluster distance). Values for silhouettes range from -1.0 (very bad split) up to 1.0 (excellent split). Values near 0.0 indicate indifferent split. Cluster subdivision was stopped when the silhouette value was not improved in two consecutive divisions. Gene Ontology category searches To test the potential enrichment in GO categories in the different groupings obtained in this study (clusters from SOTA/CAAT trees, correlation groups, and so on), we used the FuncAssociate server [39], which uses a Monte Carlo simulation approach and accepts only significant GO categories according to their adjusted p value (computed from the fraction of 1,000 simulations under the null-hypothesis with the same or smaller p value and after correction for multiple simultaneous tests). Only GO categories with an adjusted p value below 0.05 were considered to be significant. 
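The silhouette width described under Cluster analyses can be illustrated with a short Python sketch. It is not the CAAT implementation: it assumes the common definition s = (b - a) / max(a, b), where a is the mean distance of an item to the other items in its cluster and b its mean distance to the items of the sister cluster, and it assumes a distance of one minus the linear (Pearson) correlation between profiles. All names and the toy profiles are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Linear correlation between two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def distance(x, y):
    """Correlation distance (assumed here): 0 for identical shapes, 2 for opposite."""
    return 1.0 - pearson(x, y)


def silhouette(cluster, sister):
    """Mean silhouette width of `cluster` relative to its sister cluster."""
    widths = []
    for i, item in enumerate(cluster):
        others = cluster[:i] + cluster[i + 1:]
        if not others:
            widths.append(0.0)  # convention for single-item clusters
            continue
        a = mean(distance(item, o) for o in others)   # intracluster distance
        b = mean(distance(item, s) for s in sister)   # intercluster distance
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


# Toy six-variable profiles (hypothetical values).
rising = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7.5]]
falling = [[6, 5, 4, 3, 2, 1], [7, 5, 4, 3, 2, 1]]
width = silhouette(rising, falling)
```

With these two well-separated toy clusters the width is close to 1.0, the 'excellent split' end of the scale described above.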
Correlation analyses
In order to test for genes having similar values for a given pair of variables, we ranked and ordered the values for each variable, and divided the distributions into quintiles (note that for each pair-wise comparison, the maximum number of gene pairs was considered; thus, the number of genes in each partition depended on the number of genes present in each comparison). Genes belonging to the upper quintile were numbered as 1, genes from the second quintile were numbered as 2, and so on, down to the lowest variable values, included in the quintile numbered as 5. When comparing two variables we classified genes into five correlation categories depending on their quintile difference. Thus, we established five correlation quality categories: 'very high', for genes having the same quintile value in both variables (five possible combinations); 'high', for genes differing in one quintile unit (eight possible combinations); 'medium', when the quintile difference was 2 (six combinations); 'low', for three unit differences (four combinations); and 'very low' for the cases of quintile differences of four units (two combinations). Searches for enrichment in specific GO categories were performed as described above. To test the global correlation between all pair-wise combinations of the six variables, Spearman rank correlation coefficients were calculated.
Six variable profiles
We also investigated whether different functionally related gene groups (MIPS complexes, GO categories, and the processosome complex as defined by Staub et al. [40]) tended to have similar values in the six variables considered in this study. Thus, we used the rank (percentile) ordered values for the six variables for different related genes. We calculated the average rank value (percentile) and represented these values for the six variables ordered as RA, TR, RS, PA, TLR, PS, yielding a 'profile' for each group studied.
We also calculated the standard error associated with each average and represented it in the profile as error bars. These values were obtained by random sampling (1,000 replicates) among the genes having data for the six variables: each resampled group had the same number of genes as the group under consideration, and the average and standard deviation were then computed for each variable. An estimation of the average standard deviations (aSE) for the six variables was calculated for each group (Additional data file 12).

Comparison of mRNA and protein profiles
For comparing mRNA and protein profiles, we used the quintile classification of genes as for the correlation analyses (Figure 2b). The prevalence of flat patterns in mRNA and protein was studied separately, considering a pattern to be flat when the difference in quintile value between the most extreme variables for each molecule was less than three. Similarly, expected values were established by considering all the possible quintile combinations between the three variables for each molecule, and the statistical significance of the differences was assessed by means of a Chi-square test (Table 3).

Test for transcriptional regulation
To test the level of transcriptional regulation among the genes with a prevalence of TR over RS, we selected the genes for which that premise held (1,050 genes with TR > RS from Table 2). Using a 200-gene-wide sliding window, we plotted the average fold-change across the many stress conditions in the comprehensive study by Gasch et al. [14] against the percentile difference between TR and RS (TR - RS). The statistical significance of the slope was assessed by means of a t-test.
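Both the correlation categories and the expected frequency of flat patterns rest on enumerating quintile combinations, which can be sketched in a few lines. The code assumes that all quintile combinations are equally likely under the null expectation, which is one reading of "all the possible quintile combinations". The function names are hypothetical, and the Chi-square helper is a plain two-class goodness-of-fit statistic rather than a reproduction of the test reported in Table 3.

```python
from collections import Counter
from itertools import product

# Correlation quality category for each absolute quintile difference.
CATEGORY_BY_DIFF = {
    0: "very high", 1: "high", 2: "medium", 3: "low", 4: "very low",
}


def correlation_category(q1, q2):
    """Category for a gene with quintile q1 in one variable and q2 in another."""
    return CATEGORY_BY_DIFF[abs(q1 - q2)]


def category_combinations():
    """Count how many of the 25 quintile pairs fall into each category."""
    pairs = product(range(1, 6), repeat=2)
    return Counter(correlation_category(a, b) for a, b in pairs)


def is_flat(quintile_values, max_span=3):
    """A pattern is flat when the extreme quintiles differ by less than max_span."""
    return max(quintile_values) - min(quintile_values) < max_span


def expected_flat_combinations(n_variables=3):
    """Return (flat, total) over all quintile combinations of n_variables."""
    combos = list(product(range(1, 6), repeat=n_variables))
    return sum(1 for c in combos if is_flat(c)), len(combos)


def chi_square_flat(observed_flat, n_genes, expected_fraction):
    """Chi-square statistic (1 degree of freedom) for flat vs non-flat counts."""
    expected = (n_genes * expected_fraction, n_genes * (1 - expected_fraction))
    observed = (observed_flat, n_genes - observed_flat)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Enumerating the pairs reproduces the combination counts quoted in the correlation analyses (five, eight, six, four and two), and under the equal-likelihood assumption 65 of the 125 possible three-variable combinations are flat.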
Abbreviations
6VP, six variable profile; CAAT, Cluster Accuracy Analysis Tool; CES, common expression strategy; GO, gene ontology; MIPS, Munich Information Center for Protein Sequences; PA, protein amount; PS, protein stability; RA, mRNA amount; RS, mRNA stability; SOTA, Self-organizing Tree Algorithm; TLR, translation rate; TLRi, individual translation rate; TR, transcription rate; TRi, individual transcription rate; YPD, yeast extract-peptone-dextrose culture medium.

Authors' contributions
JP-O conceived the original idea and designed the experiments. JG-M collected and curated the data sets and performed most of the analyses. FG-C performed some of the statistical analyses and supervised the computer methods. JP-O wrote most of the paper, and JG-M wrote the experimental section and FG-C corrected it. All three authors extensively discussed the results and their interpretation and approved the final version.

Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a figure showing a plot of abundance and stability for mRNA and protein molecules. Additional data file 2 is a figure showing clustering similar to that shown in Figure 3.

Additional data file 1: Plot of abundance and stability for mRNA and protein molecules (PDF, 174K).
Additional data file 2: Clustering similar to that shown in Figure 3 (PDF, 7.5K).
Additional data file 3: Clustering similar to that shown in Figure 3 (PDF, 9.6K).
Additional data file 4: Further analysis of cluster 3 from Figure 3 (PDF, 12K).
Additional data file 5: Further analysis of cluster 7 from Figure 3 (PDF, 18K).
Additional data file 6: Further analysis of cluster 11 from Figure 3 (PDF, 16K).
Additional data file 7: 6VP for some other functional categories not shown in Figure 4 (PDF, 8.0K).
Additional data file 9: Ribosome biogenesis genes that appear within the low correlation class in Figure 2b (PDF, 6.0K).
Additional data file 10: Complete list of significant GO categories found in clusters from Figure 3 (PDF, 40K).
Additional data file 11: Complete list of significant GO categories found in clusters from Additional data files 2 and 3 and not present in Figure 3 (PDF, 32K).
Additional data file 12: Standard error averages calculated for experimental (aSEe) and random sampling (aSEr) estimations for the functionally related groups from Figure 4 (PDF, 8.3K).
Additional data file 13: Values for the six variables of the 6,173 genes analyzed (XLS, 3.2M).
Additional data file 14: Description and comments of the data shown in Additional data file 1 (PDF, 12K).

Acknowledgements
We are grateful to Drs Enrique Herrero, Albert Sorribas and Vicente Tordera for critical reading of the manuscript, to all the members of the lab for helpful comments and discussion and to Drs J Dopazo, J Huerta and F Al-Shahrour for helping with the GEPAS software package. This work was supported by research grants from the Ministerio de Educación y Ciencia (GEN2001-4707-C08-07, BMC2003-07072-C03-02 and BFU2006-15446-C03-02) to JP-O.

References