![]() | ![]() |
Formats:
|
||||||||||||||||||
Copyright © 2008, EMBO and Nature Publishing Group On the relation between promoter divergence and gene expression evolution 1Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel 2Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel aDepartment of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel. Tel.: +972 8 934 4429; Fax: +972 8 934 4108; Email: naama.barkai/at/weizmann.ac.il Received August 31, 2007; Accepted November 6, 2007. This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits distribution and reproduction in any medium, provided the original author and source are credited. Creation of derivative works is permitted but the resulting work may be distributed only under the same or similar licence to this one. This licence does not permit commercial exploitation without specific permission. This article has been cited by other articles in PMC.Abstract Recent studies have characterized significant differences in the cis-regulatory sequences of related organisms, but the impact of these differences on gene expression remains largely unexplored. Here, we show that most previously identified differences in transcription factor (TF)-binding sequences of yeasts and mammals have no detectable effect on gene expression, suggesting that compensatory mechanisms allow promoters to rapidly evolve while maintaining a stabilized expression pattern. To examine the impact of changes in cis-regulatory elements in a more controlled setting, we compared the genes induced during mating of three yeast species. This response is governed by a single TF (STE12), and variations in its predicted binding sites can indeed account for about half of the observed expression differences. The remaining unexplained differences are correlated with the increased divergence of the sequences that flank the binding sites and an apparent modulation of chromatin structure. Our analysis emphasizes the flexibility of promoter structure, and highlights the interplay between specific binding sites and general chromatin structure in the control of gene expression. Keywords: chromatin structure, evolution, gene expression Introduction The unique phenotype of each organism is defined by a combination of its gene content and the regulation of these genes. Evolution of protein sequence and its contribution to phenotypic adaptation has been studied extensively (Pal et al, 2006), while most of what we know about regulatory evolution comes from the study of individual genes (Wray, 2007). Regulatory evolution reflects changes in genomic sequences that influence (either directly or indirectly) gene expression. Among the multiple mechanisms that control gene expression, the binding of transcription factors (TFs) to sequence-specific binding sites within the upstream promoters is arguably the best-characterized regulatory scheme. The short lengths of TF-binding sites, and their sensitivity to even a small number of mutations, make them ideal candidates for driving gene expression divergence (ED) in cis. Recent studies have begun to compare the promoters of related organisms in search for such regulatory differences. These analyses are hindered by the difficulty to distinguish differences in functional (e.g. TF-binding sites) from non-functional promoter elements. Nonetheless, leveraging on prior knowledge of TF-binding motifs, several studies predicted the gain and loss of thousands of TF-binding sites both in yeast and mammalian species (Donaldson and Gottgens, 2006; Doniger and Fay, 2007). Furthermore, two very recent studies have used chromatin immunoprecipitation to directly identify differences in TF binding among related species (Borneman et al, 2007; Odom et al, 2007). While these studies rely on the premise that observed differences in TF-binding sequences represent gene ED and ultimately phenotypic evolution, this was not directly examined. In fact, various evidences have indicated that extensive promoter divergence, including differences in TF-binding sequences, may evolve neutrally, with no influence on gene expression. First, several studies found that promoters from different species which have extensively diverged still drive the same reporter expression patterns (Ludwig et al, 1998; Takahashi et al, 1999; Romano and Wray, 2003; Ruvinsky and Ruvkun, 2003; Oda-Ishii et al, 2005; Fisher et al, 2006; Wang et al, 2007). Second, it was shown that changes in TF-binding sequences are poorly correlated with divergence of gene expression among yeast paralogs (Zhang et al, 2004). Third, Doniger and Fay (2007) introduced mutations in binding sequences that were found in Saccharomyces paradoxus and S. mikatae to the orthologous promoters in S. cerevisiae and examined their effect on gene expression using reporter assays. The expected effects on gene expression were found only in 3 out of the 11 cases that were examined. Finally, a comprehensive analysis of ~1% of the human genome (The ENCODE Project Consortium, 2007) has shown that a large percentage of the functional regulatory elements are not conserved among mammals, suggesting the neutral evolution of these sequences. Thus, it appears that promoters are highly flexible, and are capable of maintaining stabilized gene expression pattern through many realizations of sequences, even when binding motifs are concerned. Here, we examine the impact of changes in TF-binding sequences on the associated gene expression on a genome-wide scale using comparative expression data sets of related organisms (Ranz et al, 2003; Rifkin et al, 2003; Su et al, 2004; Khaitovich et al, 2005; Gilad et al, 2006; Tirosh et al, 2006) combined with comparative data sets of predicted TF-binding sites. We find that most predicted changes in TF-binding sequences, in both yeast and mammals, have only little effect on gene expression. To examine the connection between changes in TF-binding sequences and ED in a more controlled setting, we measured the transcription response of three closely related yeast species to mating pheromone. Analysis of this response allows us to assess the relative contribution of specific cis-regulatory elements and of general promoter structure to the divergence of gene expression. Results Reported changes in TF-binding sequences have only little detectable impact on gene expression A recent study has characterized thousands of matches to binding site motifs that were conserved in the promoters of both chimpanzee and mouse (and are thus likely to be functional in both organisms) but are mutated, and do not match the binding site motif, in human (Donaldson and Gottgens, 2006). To examine the impact of these mutations on gene expression, we assembled two data sets comparing gene expression between human and either chimpanzee or mouse. The first data set compares the expression levels of human and chimpanzee genes across five tissues (Khaitovich et al, 2005) and the second data set compares the expression patterns of human and mouse genes across 30 orthologous tissues (Su et al, 2004). In both cases, we found that the ED of genes with diverged sequence motifs in their proximal promoters (1 kb) was indistinguishable from the ED of genes with conserved motifs (Supplementary Figure 1). Analysis of mammalian regulatory divergence is hindered by several limitations, including the inherent complexity of mammalian promoters, the presence of multiple (often far away) enhancers, and the poor knowledge of TFs and their binding specificities. To try and circumvent these problems, we turned to yeast, whose promoters are significantly shorter (~600 bp) and well defined. Doniger and Fay (2007) have recently analyzed the conservation of sequence motifs in promoters of closely related yeast species and predicted the loss of TF-binding sites. We examined the impact of predicted changes in TF-binding sites on ED using a comparative data set that we have recently reported, where we examined the genome-wide expression programs of the same species to several environmental stresses (Tirosh et al, 2006). Also here, genes predicted to loose TF-binding sites (i.e. whose promoters contained diverged sequence motifs) had the same level of ED as genes with conserved sequence motifs (Figure 1A
One possible explanation for the lack of correlation between the divergence of sequence motifs and that of gene expression is that the reported sequence variations do not affect the binding of the respective TFs to promoters. To explore this possibility, we examined the binding of multiple TFs, with conserved or diverged motifs, to S. cerevisiae promoters (Harbison et al, 2004) (Figure 1C Since divergence of sequence motifs corresponds only partially to divergence of TF binding, interspecies differences in TF binding should be experimentally determined. The binding of four TFs (FOXA2, HNF1A, HNF4A and HNF6) to 4000 orthologous gene pairs in human and mouse liver cells was recently analyzed by chromatin immunoprecipitation (Odom et al, 2007) and extensive differences were identified between the binding of these TFs to orthologous genes. To examine the impact of these differences, we compared the expression levels of human and mouse liver cells (Xing et al, 2007). However, we found no connection between differences in TF binding and ED (Figure 1D Genome-wide analysis of the mating response in three yeast species The difficulty to predict expression changes from analysis of sequence motifs could stem from the coordinated activity of multiple TFs that affect gene expression through combinatorial regulation. The binding of multiple TFs to multiple promoter elements may conceal the influence of specific differences, but at the same time provide more raw material for regulatory changes and is, in general, correlated with an increased ED (Tirosh et al, 2006; Landry et al, 2007). We thus sought to analyze a simpler situation where gene expression is defined primarily by a single TF. The yeast mating response appears suitable. This response is relatively isolated and is induced by the activation of the STE12 TF whose consensus binding motif, as well as target promoters, is well defined (Roberts et al, 2000; Zeitlinger et al, 2003; Bardwell, 2005). Mating in yeast occurs when two haploid cells of the opposite mating types (a and α) fuse and form a single diploid cell. Each cell secrets a unique pheromone (a-factor or α-factor, respectively) that is sensed by the other cell, triggering it to initiate the mating response. Mating involves extensive changes in the gene expression program (Roberts et al, 2000), with more than 200 upregulated genes and additional genes downregulated. Many of the upregulated genes have specific roles in mating and are regulated by STE12, while the downregulated ones are associated with cell-cycle arrest. The α-factor peptide secreted by S. cerevisiae has been isolated and synthesized in vitro. Currently, most studies of the mating response are performed by subjecting haploid yeast cells of the a-type to this synthetic α-factor (e.g. Roberts et al, 2000). Since the gene coding for the α-factor peptide is highly conserved within the sensu–stricto complex, we used the synthetic α-factor from S. cerevisiae to elicit the mating response in three closely related species: S. cerevisiae, S. paradoxus and S. mikatae. As expected, all species responded to this α-factor by arresting their cell cycle and growing a visible shmoo (Supplementary Figure 3). We measured the gene expression response to α-factor using microarrays containing complete-ORF probes for the ~6000 S. cerevisiae genes. To control for technical variations, we performed biological repeats (three in S. cerevisiae and S. paradoxus and four in S. mikatae), with all experiments (for all species and repeats) executed in parallel. The sequence of S. paradoxus and S. mikatae genes is highly similar to those of S. cerevisiae (~90 and ~85% on average, respectively), and accordingly produced significant and reproducible hybridization. Notably, while absolute hybridization intensities are affected by sequence mismatches, our analysis is based solely on the ratios of hybridization intensities in samples taken with and without pheromone. Indeed, this cross-species hybridization platform was validated in both yeast and other organisms by us and others (Sartor et al, 2006; Tirosh et al, 2006; Oshlack et al, 2007). In each of the three species, α-factor induced significant (P<0.05) expression changes of more than 1000 genes. As expected, about 100 genes were upregulated by at least two-fold, and these genes were enriched with previously known mating-related genes (P<10−5 in each species). The response of each species was highly reproducible, with a genome-wide correlation of ~0.9 among biological repeats (Figure 2
We identified 408 genes that are differentially expressed between at least one pair of yeast species (see Materials and methods and Supplementary Table 1). Interestingly, these diverged genes had high ED also in the stress-related comparative data (Tirosh et al, 2006) (P=5 × 10−26), and were enriched with TATA-containing genes (35% compared with 22%, P=1.5 × 10−8). This suggests that genes vary in their tendency for ED, such that the same genes diverge in expression at different processes. Furthermore, this tendency may be related to the presence of TATA boxes, as we and others have previously suggested (Tirosh et al, 2006; Landry et al, 2007). In contrast, these diverged genes do not have higher then average protein sequence divergence (Wall et al, 2005) (P=0.46), suggesting a decoupling between evolution of protein sequence and expression. To better understand the pattern of differential expression, we classified the differentially expressed genes into 12 classes and separately analyzed the genes within each class (Figure 3
We analyzed the enrichment of gene ontology (GO) annotations and the presence of mating-related genes within each class (Figure 3 TF-binding sequences versus ED in the mating response The mating response is orchestrated by a single TF, STE12, whose binding site motif (TGAAACA) is well characterized. This binding consensus, as well as the coding sequence of STE12, and specifically its binding domain are well conserved between the species analyzed (Supplementary Figure 5). As mentioned above, these properties enable a more focused analysis of evolution of TF-binding sequences that minimizes trans-acting and combinatorial effects. We thus searched for genes with a conserved STE12 sequence motif that has diverged in at least one of the three yeast species for which expression was measured (Materials and methods). This analysis identified 64 diverged sequence motifs. In each species, we classified diverged STE12 sequence motifs as a ‘conserved motif' if the sequence motif was conserved in that species but lost in another species, and ‘lost motif' if the sequence motif was conserved in two other species but lost in that species. Divergence of STE12 sequence motifs was highly correlated with ED in the mating response. In each of the three species, genes with ‘conserved motif' were significantly more likely to be upregulated in response to α-factor than genes with ‘lost motif' (Figure 4A
A significant number of exceptions were still observed, however (Figure 4C We next asked how much of the observed interspecies differential expression can be accounted for by differences in STE12 sequence motifs (Figure 5
While the mating response is predominately activated by STE12, other TFs may also impact the observed gene expression. We thus examined whether divergence of sequence motifs for other TFs is correlated with expression differences in the mating response. Divergence of sequence motifs for seven additional TFs were found to be significantly (P<0.05) associated with differential upregulation (MBP1, TEC1, SKO1, RDS1, SWI6, HAP2 and HSP1). For example, out of 16 pairs of orthologs which are differentially upregulated and also have a diverged MBP1 sequence motif, divergence of the motif is correlated with the loss of upregulation in 15 gene pairs (P<0.001), consistent with the predicted cooperative binding of MBP1 and STE12 (Banerjee and Zhang, 2003; Das et al, 2004). Conversely, loss of TEC1 sequence motifs is correlated with the gain of upregulation, consistent with the role of TEC1 in shifting the binding of STE12 from mating-related genes to pseudohyphal-related genes (Zeitlinger et al, 2003). Moreover, in four of these genes, the differential appearance of the TEC1 motif in S. cerevisiae and S. mikatae is also consistent with their differential binding by TEC1 in pseudohyphal conditions (Borneman et al, 2007), while none show the opposite pattern. Altogether, divergence of sequence motifs for STE12 and these seven additional TFs, could account for 32% of the observed differential upregulation and up to 49% of the differential upregulation of genes with an STE12 sequence motif (Figure 5 Flanking promoter sequences and their impact on chromatin structure Taken together, changes in TF sequence motifs can account for approximately half of the ED of STE12-dependent genes. To understand the genetic basis of the remaining expression differences, we examined the divergence of promoter sequences that flank the STE12 sequence motifs (Figure 6A
Flanking sequences could influence gene expression if they contain unrecognized sites that are bound by either STE12 or by related TFs. However, we find no enrichment of matches (or weak matches) to TF sequence motifs at these regions. Alternatively, these sequences could influence gene expression through other mechanisms, most notably by modulating chromatin structure. To examine this possibility, we applied a computational model to predict the nucleosome occupancy in promoters of the different yeast species (Segal et al, 2006). We found that divergence of the sequences flanking STE12 sequence motifs indeed affects the predicted nucleosome occupancy at the location of the sequence motifs, and that this effect is larger among genes with unexplained differential upregulation compared with other genes (Figure 6B For most genes, the predicted differences in nucleosome occupancy are quite small and do not appear to influence gene expression. However, among the three genes with the largest predicted changes in nucleosome occupancy (differences larger than 0.5 at the conserved STE12 sequence motifs), these changes were exactly consistent with the observed differential upregulation (Figure 6C–E Discussion Our goal in this study was to examine systematically the impact of changes in cis-regulatory elements on the divergence of gene expression. To this end, we first analyzed existing data comparing the gene expression profiles of related species for which promoter analyses predicted the gain and loss of TF-binding sites. Both in mammalian and in yeast data sets, we observed only a poor correlation between the divergence of sequence motifs and gene expression. Furthermore, analysis of experimentally mapped differences in human and mouse TF binding led to similar conclusions. Although surprising, these results are consistent with various previous studies, as described above. Several reasons may explain the stabilized expression we and others have observed despite the divergence of TF sequence motifs. First, the identification of diverged motifs may not be optimal and may not correspond to divergence in TF binding. We tried to overcome this difficulty by using different data sets and restricting our analysis to the most significant cases of sequence divergence. Second, it might be that divergence of sequence motifs does impact gene expression, but only under conditions which do not appear in our gene expression data sets. However, also for the stress-related TFs in yeast, divergence of sequence motifs could not predict divergence of gene expression during environmental stress. Third, although we tried to control for binding sites turnover, a compensatory motif may have appeared in more distant regions not considered in our analysis. Again, this is an important limitation concerning the mammalian system where transcription is controlled by multiple enhancers, but is less of a problem in the yeast system where the relevant controlling region is believed to be rather limited. Finally, gene expression might be compensated through other mechanisms, such as chromatin structure or interactions among TFs. For example, loss of binding sites may be buffered by the presence of other binding sites for additional factors that are involved in the same process or that bind cooperatively with the factor whose binding site has diverged. Indeed, both the mammalian tissue-specific expression and the response of yeast to stressful conditions involve a coordinated function of multiple TFs. Thus, the observed variability in TF-binding sequences among related species may reflect the complexity of the underlying regulatory network and the possibility to encode the same expression pattern with many different promoter designs perhaps even more than it reflects the actual divergence of gene expression. To overcome the above limitations, we extended the analysis to a more controlled setting and generated a new comparative data set describing the transcriptional response of three closely related yeast species to mating pheromone. This response is predominantly orchestrated by a single TF, STE12, whose binding specificity is well characterized. While the response was largely conserved, the expression of ~400 genes significantly differed between the species. In this case, divergence of STE12 sequence motifs was correlated with the changes in gene expression, and together with other TF sequence motifs could account for approximately one-third of the observed differential upregulation and up to half of that of STE12-regulated genes. Importantly, predictions based on sequence analysis are only partially consistent with the actual binding of TFs. In fact, a number of studies over the past few years indicate that binding does occur at highly diverged and non-consensus sites, reflecting additional regulatory aspects, which are still poorly understood. For example, approximately half of the genes that are bound by STE12 in S. cerevisiae contain an exact match to the STE12-binding motif, and only one-third of the genes with conserved matches to the STE12-binding motif are bound by STE12 in S. cerevisiae (Harbison et al, 2004). Accordingly, the impact of binding site divergence could be actually larger than that inferred from our analysis of binding site sequences alone. Chromatin structure has a central role in controlling the accessibility of promoters to TFs, and the mechanisms by which it is regulated are widely studied (Li et al, 2007). The influence of DNA sequence on nucleosome positions has been extensively analyzed and several models have been developed to predict nucleosome positioning based on DNA sequences alone (Ioshikhes et al, 2006; Segal et al, 2006). We applied the model developed by Segal et al to examine the effect of sequence divergence in regions surrounding STE12-binding sites and found that sequence divergence may indeed affect nucleosome positions and that, at least in certain cases, this could account for the observed differential expression. Similarly, we found that some of the interspecies differences in the binding of TEC1 and STE12 at pseudohyphal conditions are correlated with changes in nucleosome occupancy. Thus, the location of nucleosomes can evolve among closely related species and influence gene ED. Notably, our analysis may capture only a small proportion of the actual influence of chromatin structure, as it is based on computational predictions of the intrinsic preference of nucleosomes for certain DNA sequences, but ignores additional regulatory mechanisms and particularly the dynamic nature of chromatin structure. It would be interesting, although significantly more complicated, to directly measure the location of nucleosomes among multiple species at similar conditions, such as during mating, and examine their influence on expression differences. Our results provide a detailed account of the impact of promoter sequence variations on gene expression within the mating system. Such a detailed analysis is enabled by the relative simplicity of this system and would be much more difficult in typical regulatory systems, such as the yeast stress response or the mammalian liver. Similar analysis of additional regulatory systems may indicate the generality of the results shown here and begin to uncover the relative importance of different mechanisms in generating interspecies differences in gene expression. Finally, we note that while our study focused on the genetic basis of gene expression evolution, a complementary challenge is to describe the contribution of such changes to phenotypic diversity. In this aspect, the data set generated here might be beneficial, as it provides insights into the modulation of the mating response. First, we have found that several mating-related genes are not upregulated in S. mikatae as in S. cerevisiae and S. paradoxus (Figure 3F Materials and methods Prediction of changes in human TF-binding sequences We downloaded the mammalian promoters multiple alignments from the UCSC genome browser. We focused on proximal promoters (1 kb) of human, chimpanzee, mouse and rat and searched them for the presence of exact matches to the mammalian promoter motifs defined by Xie et al (2005). This initial search was performed after excluding the alignment gaps. We restricted this analysis to the promoter motifs with the highest confidence: those which matched the binding sites of known TFs and the 30 promoter motifs with the highest conservation significance. We searched for promoters that contain an exact match to a specific motif in several species but not in another. Because the human sequences are more accurate than those of the other species, and because our expression data sets compare human with either chimpanzee or mouse, we focused only on matches that are conserved in chimpanzee and mouse (or rat) but mutated in human. Thus, we demanded that the chimpanzee and mouse promoters contain exact matches to the motif and that these matches are aligned (at the same position in the multiple alignments). The entire human proximal promoter was required to have no exact matches. The number of functional mutations was estimated by the number of differences from the motif in the sequence, which was aligned to the matches in the other species. This analysis resulted in prediction of 695 human promoters with at least two mutations in a TF-binding site, which is conserved in chimpanzee and mouse (or rat). Prediction of changes in yeast TF-binding sequences Promoters (600 bp) of S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii and S. bayanus were downloaded from SGD and aligned by clustelw. We searched these promoters for the presence of TF sequence motifs as defined by MacIsaac et al (2006). For each TF, we calculated the position weight matrix scores across all promoters and defined binding sites as matches with scores greater than a predefined threshold (t1). Diverged promoters were defined as those with a conserved and aligned match exceeding t1 in at least three of the five species, but no match in another species that exceeds a second threshold (t2). Thus, promoters with multiple binding sequences for the same TF would be recognized as diverged only if all these binding sequences have diverged in one of the species. The thresholds (t1 and t2) were chosen based on their specificity in predicting the binding of TFs (Harbison et al, 2004); the specificity of a given threshold is the fraction of bound promoters among those with a conserved match exceeding that threshold. We chose t1 as the minimal threshold with specificity greater than 0.5, and t2 as the maximal threshold with specificity lower than 0.1. Other thresholds and parameters were also examined and led to similar conclusions (see Supplementary Figure 2). Expression divergence Yeast ED was calculated from the response of four yeast species to various environmental stresses (Tirosh et al, 2006). Tissue-specific ED between human and chimpanzee was taken from Khaitovich et al (2005), log2-transformed, centered and normalized by its s.d. Expression patterns of human and mouse across 30 orthologous tissues were taken from Su et al (2004); ED was calculated as in Liao and Zhang (2006), centered and normalized by its s.d. ED between human and mouse liver cells was calculated as the log2 of the absolute difference in log-transformed expression levels, as measured with exon arrays (Xing et al, 2007). Diverged genes were defined in each data set as those with ED higher than mean+s.d. Analysis of mating and response to α-factor in three yeast species Haploids isolation S. paradoxus (hoΔ KANr) and S. mikatae (hoΔ KANr) cells were grown in liquid YPD to saturation, then transferred to sporulation plates (1% K-acetate, 0.1% yeast extract, 0.05% glucose and 2% agar) and incubated at 25°C for 6–7 days. Haploid cells were isolated by random spore analysis: Asci were collected from the sporulation plates, resuspended in buffer A (10 mM DTT, 100 mM Tris-SO4 pH 9.4) and incubated for 1 h at 30°C with gentle agitation. Next, the asci were resuspended in buffer B (1.1 M Sorbito, 10 mM K-phosphate buffer pH 7.2), treated with yeast lytic enzyme (ICN) (0.5 mg/5 × 108 cells) for 1 h at 30°C and transferred to 0.5% Triton X-100. Asci were disrupted by sonication (three rounds of 30 s) and then spread on YPD plates. Haploids mating type was determined using PCR (Huxley et al, 1990).α-Factor treatment S. cerevisiae, S. paradoxus and S. mikatea a-type cells were grown to log phase in YPD at 30°C. Cells (3 × 106/ml) were treated with 3 μM α-factor for 90 min. Samples were collected at the indicated time points. Cell counting was performed in a standard hemocytometer. Microarray analysis α-Factor experiments: S. cerevisiae, S. paradoxus and S. mikatae a-type cells (5 × 106/ml of each) were grown on YPD at 30°C to early log phase. Cells were treated with 6 μM α-factor for 60 min. As reference we used the same cells before addition of α-factor. Three to four biological repeats (different colonies) were processed in parallel for each species. Natural mating experiments: S. cerevisiae a- and α-type cells (5 × 106 of each) were mixed and applied on to 0.22-μm filter. The filters were placed on YPD agar plates at 30°C for 90 min and then snap-frozen with liquid nitrogen. For reference, S. cerevisiae a- and α-type cells were placed on separate filters and incubated on YPD agar plates at 30°C for 90 min. The cells were mixed only prior to mRNA extraction. Sample labeling, hybridization and scanning: cDNA of experiment and reference samples was synthesized from total RNA using M-MLV Reverse Transcriptase RNase H Minus (Promega) and labeled with Cy3 and Cy5, respectively, by the indirect amino-allyl method. Hybridization and scanning were carried out as described in Tirosh et al (2006). Expression data were log2-transformed and normalized by intensity-dependent correction (subtracting a lowess regression) and by position-dependent correction (subtracting the median of each subarray from all the genes in it). To account for differences in the overall signal in different arrays, we calculated the linear regression between the first S. cerevisiae array and each subsequent array, and divided the log2 expression ratios of that array by the regression coefficient. Raw and normalized expression data are available at the GEO (Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo/) database with the series accession GSE7525. Differential expression in the mating response To identify genes that are differentially expressed among the three species, we performed a two-sampled t-test for each gene and for each pairwise species comparison. A false discovery rate of 0.05 was used to generate a list of differentially expressed genes. We noted that many of the genes identified by the t-test as differentially expressed are highly regulated and display the same general trend in all species (up- or downregulated). These differences may arise from technical artifacts such as inflated log ratios due to low signals and different sensitivity of the different arrays. Furthermore, these differences are less biologically meaningful than differences between up/downregulation in one species and no regulation (or opposite regulation) in another species. We thus employed another test for differential expression, which takes into account not only the difference in log ratios but also the magnitude of overall up- or downregulation. We used the formula from Tirosh et al (2006) to estimate the ED of each gene, which can be rewritten as
where EDi,j(g) is the expression divergence of gene g between species i and j, and xi(g) is the normalized log2 expression ratio of gene g in species i. The 20% genes with the highest ED were considered to be differentially expressed. These two tests agreed on the differential expression of 408 genes, which we further analyze. We classified the differentially expressed genes into 12 patterns of differential expression. Each pattern corresponds to higher expression in a subset of the species compared with the remaining species. We first defined characteristic expression vectors for six patterns with values of 1 in a subset of the species and −1 in the remaining species, and classified genes according to the vector with which their expression pattern had the highest correlation. Next, we divided each class into two subclasses. The first subclass corresponds to genes that can be estimated as upregulated in one subset of the species and not regulated in the other species, and the second corresponds to genes that are better estimated as downregulated in one subset of the species and not regulated in the other species. This was done by comparing the absolute values of the highest and lowest response of those genes; if the maximum magnitude of upregulation in any of the species was higher than the maximum magnitude of downregulation in any of the species, then we classified the gene as upregulated, and in the opposite case as downregulated. Each class (and subclass) was analyzed by searching for enrichment of GO annotations and presence of mating-annotated genes. Impact of changes in TF sequence motifs on mating ED For each pair of orthologs that are differentially upregulated, we examined whether there are non-conserved matches to TF sequence motifs (MacIsaac et al, 2006). Divergence in sequence motifs for STE12 and five other TFs were significantly (P<0.05) associated with loss of upregulation and divergence in sequence motifs for two other TFs were significantly associated with gain of upregulation. Thus, the differential upregulation of each pair of orthologs with diverged matches to any of these eight motifs in which the difference in expression was consistent with the difference in the sequence motif (e.g. the gene with the STE12-binding site is upregulated and the gene without this binding site is not upregulated) was considered to be accounted by changes in TF sequence motifs. Nucleosome occupancy Nucleosome occupancy was estimated across the promoters of the three yeast species using the working version of the yeast model from the Segal lab (http://genie.weizmann.ac.il/pubs/nucleosomes06/index.html). We examined the differences in predicted nucleosome occupancy at the positions of STE12-binding sites. Few promoters had large differences (0.5 or higher) and the predicted changes in nucleosome occupancy among these promoters were correlated with differential upregulation. Supplementary Figures Click here to view.(146K, pdf) Supplementary Table 1 Click here to view.(62K, xls) Acknowledgments This study was supported by the Helen and Martin Kimmel Award for Innovative Investigation and grants from the Pasteur–Weizmann joint research program, the Kahn fund for Systems Biology at the Weizmann Institute of Science, the Tauber Fund and the Israeli Ministry of Science. References
|
PubMed related articles
Your browsing activity is empty. Activity recording is turned off. |
|||||||||||||||||
Nat Rev Genet. 2006 May; 7(5):337-48.
[Nat Rev Genet. 2006]Nat Rev Genet. 2007 Mar; 8(3):206-16.
[Nat Rev Genet. 2007]Genome Biol. 2006; 7(6):R52.
[Genome Biol. 2006]PLoS Comput Biol. 2007 May; 3(5):e99.
[PLoS Comput Biol. 2007]Science. 2007 Aug 10; 317(5839):815-9.
[Science. 2007]Nat Genet. 2007 Jun; 39(6):730-2.
[Nat Genet. 2007]Development. 1998 Mar; 125(5):949-58.
[Development. 1998]Development. 1999 Sep; 126(17):3725-34.
[Development. 1999]Development. 2003 Sep; 130(17):4187-99.
[Development. 2003]Development. 2003 Nov; 130(21):5133-42.
[Development. 2003]Development. 2005 Apr; 132(7):1663-74.
[Development. 2005]Science. 2003 Jun 13; 300(5626):1742-5.
[Science. 2003]Nat Genet. 2003 Feb; 33(2):138-44.
[Nat Genet. 2003]Proc Natl Acad Sci U S A. 2004 Apr 20; 101(16):6062-7.
[Proc Natl Acad Sci U S A. 2004]Science. 2005 Sep 16; 309(5742):1850-4.
[Science. 2005]Nature. 2006 Mar 9; 440(7081):242-5.
[Nature. 2006]Genome Biol. 2006; 7(6):R52.
[Genome Biol. 2006]Science. 2005 Sep 16; 309(5742):1850-4.
[Science. 2005]Proc Natl Acad Sci U S A. 2004 Apr 20; 101(16):6062-7.
[Proc Natl Acad Sci U S A. 2004]PLoS Comput Biol. 2007 May; 3(5):e99.
[PLoS Comput Biol. 2007]Nat Genet. 2006 Jul; 38(7):830-4.
[Nat Genet. 2006]PLoS Comput Biol. 2007 May; 3(5):e99.
[PLoS Comput Biol. 2007]Mol Biol Evol. 2002 Jul; 19(7):1114-21.
[Mol Biol Evol. 2002]Nature. 2004 Sep 2; 431(7004):99-104.
[Nature. 2004]Nat Genet. 2007 Jun; 39(6):730-2.
[Nat Genet. 2007]Mol Biol Evol. 2007 Jun; 24(6):1283-5.
[Mol Biol Evol. 2007]Nat Genet. 2006 Jul; 38(7):830-4.
[Nat Genet. 2006]Science. 2007 Jul 6; 317(5834):118-21.
[Science. 2007]Science. 2000 Feb 4; 287(5454):873-80.
[Science. 2000]Cell. 2003 May 2; 113(3):395-404.
[Cell. 2003]Peptides. 2005 Feb; 26(2):339-50.
[Peptides. 2005]Science. 2000 Feb 4; 287(5454):873-80.
[Science. 2000]Science. 2000 Feb 4; 287(5454):873-80.
[Science. 2000]Nucleic Acids Res. 2006; 34(1):185-200.
[Nucleic Acids Res. 2006]Nat Genet. 2006 Jul; 38(7):830-4.
[Nat Genet. 2006]Bioinformatics. 2007 May 15; 23(10):1235-42.
[Bioinformatics. 2007]Science. 2000 Feb 4; 287(5454):873-80.
[Science. 2000]Nat Genet. 2006 Jul; 38(7):830-4.
[Nat Genet. 2006]Science. 2007 Jul 6; 317(5834):118-21.
[Science. 2007]Proc Natl Acad Sci U S A. 2005 Apr 12; 102(15):5483-8.
[Proc Natl Acad Sci U S A. 2005]Nucleic Acids Res. 2003 Dec 1; 31(23):7024-31.
[Nucleic Acids Res. 2003]Proc Natl Acad Sci U S A. 2004 Nov 16; 101(46):16234-9.
[Proc Natl Acad Sci U S A. 2004]Cell. 2003 May 2; 113(3):395-404.
[Cell. 2003]Science. 2007 Aug 10; 317(5839):815-9.
[Science. 2007]Nature. 2006 Aug 17; 442(7104):772-8.
[Nature. 2006]Science. 2007 Aug 10; 317(5839):815-9.
[Science. 2007]Nature. 2004 Sep 2; 431(7004):99-104.
[Nature. 2004]Cell. 2007 Feb 23; 128(4):707-19.
[Cell. 2007]Nat Genet. 2006 Oct; 38(10):1210-5.
[Nat Genet. 2006]Nature. 2006 Aug 17; 442(7104):772-8.
[Nature. 2006]Nat Rev Genet. 2002 Feb; 3(2):137-44.
[Nat Rev Genet. 2002]Nature. 2005 Mar 17; 434(7031):338-45.
[Nature. 2005]BMC Bioinformatics. 2006 Mar 7; 7():113.
[BMC Bioinformatics. 2006]Nature. 2004 Sep 2; 431(7004):99-104.
[Nature. 2004]Nat Genet. 2006 Jul; 38(7):830-4.
[Nat Genet. 2006]Science. 2005 Sep 16; 309(5742):1850-4.
[Science. 2005]Proc Natl Acad Sci U S A. 2004 Apr 20; 101(16):6062-7.
[Proc Natl Acad Sci U S A. 2004]Mol Biol Evol. 2006 Mar; 23(3):530-40.
[Mol Biol Evol. 2006]Mol Biol Evol. 2007 Jun; 24(6):1283-5.
[Mol Biol Evol. 2007]Trends Genet. 1990 Aug; 6(8):236.
[Trends Genet. 1990]Nat Genet. 2006 Jul; 38(7):830-4.
[Nat Genet. 2006]Nat Genet. 2006 Jul; 38(7):830-4.
[Nat Genet. 2006]BMC Bioinformatics. 2006 Mar 7; 7():113.
[BMC Bioinformatics. 2006]PLoS Comput Biol. 2007 May; 3(5):e99.
[PLoS Comput Biol. 2007]Nature. 2004 Sep 2; 431(7004):99-104.
[Nature. 2004]