Copyright: © 2005 Oliva et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The Cell Cycle–Regulated Genes of Schizosaccharomyces pombe

1 Department of Molecular Genetics and Microbiology, Stony Brook University, Stony Brook, New York, United States of America, 2 Department of Computer Science, Stony Brook University, Stony Brook, New York, United States of America

Paul T. Spellman, Academic Editor, Lawrence Berkeley Lab, United States of America

Corresponding author. Bruce Futcher: bfutcher/at/ms.cc.sunysb.edu; Janet Leatherwood: janet.leatherwood/at/sunysb.edu

Received December 20, 2004; Accepted April 21, 2005.

See "Transcriptional Waves in the Yeast Cell Cycle", e243.

Abstract

Many genes are regulated as an innate part of the eukaryotic cell cycle, and a complex transcriptional network helps enable the cyclic behavior of dividing cells. This transcriptional network has been studied in Saccharomyces cerevisiae (budding yeast) and elsewhere. To provide more perspective on these regulatory mechanisms, we have used microarrays to measure gene expression through the cell cycle of Schizosaccharomyces pombe (fission yeast). The 750 genes with the most significant oscillations were identified and analyzed. There were two broad waves of cell cycle transcription, one in early/mid G2 phase, and the other near the G2/M transition. The early/mid G2 wave included many genes involved in ribosome biogenesis, possibly explaining the cell cycle oscillation in protein synthesis in S. pombe.
The G2/M wave included at least three distinctly regulated clusters of genes: one large cluster including mitosis, mitotic exit, and cell separation functions, one small cluster dedicated to DNA replication, and another small cluster dedicated to cytokinesis and division. S. pombe cell cycle genes have relatively long, complex promoters containing groups of multiple DNA sequence motifs, often of two, three, or more different kinds. Many of the genes, transcription factors, and regulatory mechanisms are conserved between S. pombe and S. cerevisiae. Finally, we found preliminary evidence for a nearly genome-wide oscillation in gene expression: 2,000 or more genes undergo slight oscillations in expression as a function of the cell cycle, although whether this is adaptive, or incidental to other events in the cell, such as chromatin condensation, we do not know. Introduction The yeasts Schizosaccharomyces pombe and Saccharomyces cerevisiae are excellent organisms for the study of the cell division cycle. Both yeasts have many well-characterized cell division cycle (cdc) mutants [1–5], and both have a long history of genetic and molecular cell cycle studies. However, they diverged more than 1 billion years ago, and have many lifestyle differences. In particular, the two yeasts have different cell cycles. S. pombe divides by fission, a symmetrical process in which a septum grows across the center of a long cylindrical cell, dividing the old cell into two equal new cells. Moreover, the main control point in the S. pombe cell cycle is a size control in G2, not in G1 as in S. cerevisiae and many other organisms. In S. pombe, when cells reach a critical size, the Cdc2 protein kinase is activated both by cyclin binding and also by Cdc25 phosphatase removal of the inhibitory phosphate from tyr15 of Cdc2, and this leads to mitosis. Once nuclear division has occurred, the cell moves quickly into S phase without an appreciable G1. 
Therefore S phase is largely completed by the time cytokinesis/cell separation occurs. Thus, when the cells are growing in good conditions, cells have a long G2, and most cell cycle–specific events are completed in a relatively small portion of the cell cycle encompassing M, G1, and S, with S occurring coincident with cytokinesis. When conditions are poor, a cryptic size control appears in G1 phase; that is, a G1 phase appears and becomes longer as growth rate becomes slower. In contrast, S. cerevisiae divides by “budding,” an inherently asymmetrical process whereby a large mother cell generates a small daughter bud. Once born as a separate cell, the small daughter grows in volume through a long G1, and commits to division at a G1 event called “START.” START involves the activation of a pair of closely related transcription factors, MBF and SBF, and the induction of 100 or more genes. After START, DNA synthesis is initiated, and a bud forms. There is a short G2 phase, followed by mitosis and cytokinesis, and then cells enter the next G1. When cells are growing rapidly in good conditions, G1, S, G2, and M phases are of similar lengths, and so various cell cycle–specific events are distributed somewhat equally around the cycle. However, when cells are growing slowly in poor conditions, almost all the increased length of the cell cycle is accounted for by an increased G1, and most cell cycle–specific events occur over a relatively small percentage of the cell cycle, encompassing “START,” S phase, and mitosis. Microarrays have been used to analyze gene expression in synchronized S. cerevisiae. There are at least 800 genes whose transcripts oscillate as a function of the cell cycle [6]. The cataloging of these transcripts has helped describe what happens in a cell cycle. In addition, because many of the oscillating genes are regulatory, the microarray analysis has helped us understand how the S. cerevisiae cycle is regulated. In view of the fact that S. 
pombe also has a well-studied cell cycle, and because these two yeasts have both differences and similarities in the way they carry out a cell cycle, it is of interest to characterize oscillating transcripts in S. pombe as well, to understand at a deeper level what is preserved and what changes across the cell cycles of these two model eukaryotes. Recently, Rustici et al. [7] and Peng et al. [8] have published microarray analyses of S. pombe cell cycle genes. Our results are broadly similar to theirs, but as described below, each group finds a somewhat different set of genes. There is excellent agreement between the groups with respect to the most strongly regulated genes, but naturally there is less agreement for more weakly regulated genes. Here, we concentrate on the 750 genes that are most strongly regulated, but we believe that there may be a total of 2,000 or more genes that have at least weak cell cycle regulation. A large number of weakly to moderately oscillating genes peak in G2 phase, and these are highly enriched for functions in ribosome biogenesis. Our analysis of the cell cycle–regulated promoters shows them to be surprisingly complex, and shows clusters of multiple regulatory motifs similar to clusters of motifs found in the developmental genes of Drosophila. Although Rustici et al. [7] have pointed out several differences between the cell cycles of S. pombe and S. cerevisiae, we find that there are also striking similarities, suggesting deeply conserved mechanisms. Results Synchronous Cultures and Identification of Cell Cycle–Regulated Transcripts Three synchronous cultures were studied, one generated by cdc25 block release, and two generated by elutriation. Each culture was sampled through three cell cycles, giving nine cell cycles of data. Synchrony and cell cycle position were assayed by scoring initiation of anaphase and septation microscopically (Figure 1
The list of all 5,000 genes ranked by p-value and other associated information such as time of peak expression is given in Table S1. The raw data have been deposited at ArrayExpress (http://www.ebi.ac.uk/arrayexpress/). The raw data, all figures and all tables, are available at: http://publications.redgreengene.com/oliva_plos_2005/. The distribution of genes versus p-values is shown in Figure 2
Because the distribution of genes versus p-values continuously increases after gene 203, one must choose a somewhat arbitrary threshold for discussion of cell cycle–regulated genes. We have chosen to discuss the best 750 genes in our p-value list. This number is similar to the number of genes chosen by Peng et al. [8] and Rustici et al. [7] as being cell cycle regulated (747 and 407, respectively), and similar to the number of genes chosen for the yeast S. cerevisiae (800) [6], thus facilitating comparison of these gene sets. In the vicinity of the 750th gene (and even below), most genes display an oscillatory behavior to the eye, at least in one or two of the three experiments. Finally, the number 750 is obviously somewhat arbitrary, and indeed we have no basis for anything other than an arbitrary cutoff. Because the list of genes is ranked, other investigators may choose their own sets of oscillatory genes from our p-value list (Table S1) by choosing any desired cutoff. For the top 750 genes, the false discovery rate is 0.00022, so on a statistical basis, fewer than one false positive is expected in the list of 750. Although we will discuss primarily these 750 best genes, there are many more genes that appear to oscillate slightly. A total of 2,262 genes (nearly half the genes in the genome!) have a p-value less than 0.05, the usual statistical cutoff. Based on the false discovery rate, we would expect about 53 of these to be false positives, but even so, this leaves well over 2,000 genes with a slight but statistically significant oscillation. Previously, 37 cell cycle–regulated genes have been reported in S. pombe; 29 of these (78%) are in our top 750. Of the eight that are not in our top 750, two are in the top 1,000. The remaining six (cdc19/mcm2, cmk1, dmf1/mid1, ppb1, uvi22/rrg1, and suc22) are also not in the list of 407 of Rustici et al. [7], and three of these (cdc19/mcm2, cmk1, and ppb1) are also not in the list of Peng et al. [8].
Thus, these genes are probably quite weakly regulated (except for suc22, for which there are two transcripts, one regulated and one not [9]). The top 750 genes are shown in Figure 3
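The false discovery figures quoted above can be checked with simple arithmetic. The sketch below assumes that the expected number of false positives is the false discovery rate multiplied by the number of genes called; the function names are hypothetical and the source does not state how its false discovery rate was estimated.

```python
def expected_false_positives(n_genes: int, fdr: float) -> float:
    """Expected false positives among n_genes called at false discovery rate fdr."""
    return n_genes * fdr


def implied_fdr(expected_fp: float, n_genes: int) -> float:
    """False discovery rate implied by an expected false-positive count."""
    return expected_fp / n_genes


# Top 750 genes at the reported false discovery rate of 0.00022.
top_750 = expected_false_positives(750, 0.00022)
print(f"expected false positives in top 750: {top_750:.3f}")

# 2,262 genes with p < 0.05, of which about 53 are expected to be false.
print(f"implied FDR at p < 0.05: {implied_fdr(53, 2262):.4f}")
print(f"genes remaining after removing expected false positives: {2262 - 53}")
```

Under this reading, the top 750 list carries well under one expected false positive, and the p < 0.05 list still leaves more than 2,000 genes after subtracting the expected 53.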
Rustici et al. [7] have recently compiled a list of 407 periodically-expressed S. pombe genes, and while our manuscript was in review, Peng et al. [8] identified 747 similar genes. A comparison of the three studies is shown in Figure 4
Despite the fact that 1,013 genes were found in only one of the three studies, we believe that most of these 1,013 do indeed oscillate to some extent. There are two lines of evidence. First, most of the genes do display a clear oscillatory pattern to the eye, at least in one of the studies. For instance, Figure 3 The second line of evidence is that most of the 1,013 genes unique to one study also display some statistical oscillatory behavior in one or both of the other studies, even though this behavior is not strong enough to surpass the threshold for inclusion on the cell cycle list in those studies. This effect is shown in Figure 5
Before the publication of Peng et al. [8], we had compared our study to that of Rustici et al. [7] to look for discrepancies. We identified a total of 21 genes (11 from Rustici et al., ten from us) that appeared very strongly regulated in one study, but not at all regulated in the other. We have now checked these 21 genes against the results of Peng et al., and find that 17 of the 21 appear regulated in Peng et al., whereas four (three from us and one from Rustici et al.) do not appear regulated. Thus it seems that both we and Rustici et al. have been conservative in our identification of cell cycle–regulated genes and tend to get false negatives rather than false positives. In summary, the three cell cycle lists together implicate about 1,300 genes, and our ranked p-value list does not become worse than a p-value of 0.05 until gene number 2,262. We believe that a very large number of S. pombe genes, 2,000 or more, have at least a weak cell cycle oscillation. Two Genome-Wide Waves of Transcription To examine the distribution of gene expression around the cycle, Fourier analysis was used to determine the time at which each gene's expression peaked (the “phase angle” of peak expression). For genes in the bottom half of the 5,000 gene rank list (i.e., genes that did not cycle appreciably), phase angles were largely determined by noise, but nevertheless would tend toward the peak of any weak cyclic behavior that may have existed. The number of genes peaking at each time in the cycle was plotted (Figure 6
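As a rough illustration of the phase-angle idea described above, the sketch below projects a mean-centred expression profile onto a cosine and a sine at the cell cycle frequency and takes the arctangent of the two sums. The function names, the 150-minute period, and the synthetic profile are hypothetical; the source does not give the exact Fourier procedure, so this is one plausible reading and not the authors' implementation.

```python
import math


def peak_phase_degrees(times, values, period):
    """Phase angle (0-360 degrees) at which the cycle-frequency component peaks."""
    mean = sum(values) / len(values)
    # Project the mean-centred profile onto cosine and sine at one cycle per period.
    a = sum((v - mean) * math.cos(2 * math.pi * t / period)
            for t, v in zip(times, values))
    b = sum((v - mean) * math.sin(2 * math.pi * t / period)
            for t, v in zip(times, values))
    return math.degrees(math.atan2(b, a)) % 360.0


def phase_histogram(phases, bin_width=30):
    """Count how many genes peak in each phase-angle bin."""
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for p in phases:
        # The final modulo guards against a rounding result of exactly 360.0.
        counts[int((p % 360.0) // bin_width) % n_bins] += 1
    return counts


# Synthetic profile (assumed): two cycles of 150 min sampled every 10 min,
# constructed to peak a quarter of the way through each cycle.
times = list(range(0, 300, 10))
values = [math.cos(2 * math.pi * t / 150 - math.pi / 2) for t in times]
print(peak_phase_degrees(times, values, period=150))
```

Running `phase_histogram` over the phase angles of all genes gives the kind of genes-per-phase count that the plot described above summarizes.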
There were two striking findings. First, it appears that there are two broad waves of gene expression, one peaking in early to mid G2, and the second peaking in late G2/M, whereas there are troughs in mid to late G2, and in S. The early/mid G2 peak contains the Ribosome biogenesis cluster (see below) and associated genes, whereas the late G2/M peak contains the genes of the Cdc15, Cdc18, and Eng1 clusters (see below), which are important for M and S. Second, the two waves of gene expression were seen even in the 4,000 least-cyclic genes. As noted above, there is statistical evidence from p-values that 2,000 or more genes may oscillate slightly. The two waves of expression seen for the bottom 4,000 genes confirm that many of these genes do indeed oscillate. If the fluctuations in these 4,000 genes had simply been due to noise, then the peak phase angles would have been uniformly distributed from 0° to 360° (as confirmed by repeating the analysis on shuffled data; Figure 6 Cluster Analysis To study the regulation of the cell cycle, we wished to find clusters of co-regulated genes potentially responding to the same transcription factor. However for this purpose it is not sufficient to find genes expressed at the same time, because such genes might be responding to different mechanisms of regulation. This is an acute problem in S. pombe, because mitosis, DNA synthesis, and cytokinesis all occur in a small window of the cell cycle under standard growth conditions. Therefore our analysis included not only our three time courses of synchronous cells, but also eleven other array experiments that more directly addressed regulatory mechanisms. 
These experiments (see Materials and Methods) included small cells grown in poor nitrogen to induce a G1 phase; a cdc10-M17 block-release experiment, to separate S phase events from cytokinesis and septation events; an arrest at G1 (using cdc10-M17, encoding an MBF transcription factor subunit); an arrest at S (using cdc22-M45, encoding ribonucleotide reductase); an arrest at late G2 (using cdc25-22, encoding the phosphatase that activates Cdc2); an arrest at M (using nuc2-663, encoding a subunit of the anaphase promoting complex); and finally, from the data of Rustici et al. [7], experiments using a constitutively active allele of cdc10 (cdc10-c4), null and overexpressor alleles of the forkhead transcription factor sep1, and null and overexpressor alleles of the transcription factor ace2. Hierarchical clustering was used [10] because the underlying structure of a gene regulatory network is somewhat hierarchical. Thus, a hierarchy found by the clustering algorithm is often interpretable in terms of a hierarchical transcriptional network existing in the cell (see S. J. Gould's essay [11], “Linnaeus's Luck?”, for an illuminating discussion of this issue in a different context: http://www.findarticles.com/p/articles/mi_m1134/is_7_109/ai_65132190). The clustergram of 750 genes is shown in Figure 7
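To make the clustering step more concrete, here is a minimal sketch of agglomerative hierarchical clustering on expression profiles. The choice of average linkage and of 1 minus the Pearson correlation as the distance is an assumption, as are the function names and the toy profiles `g1` to `g4`; the source says only that hierarchical clustering was used [10].

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant profiles only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]
    # Pairwise gene distances: 1 - correlation (0 = identical shape, 2 = opposite).
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Toy profiles (hypothetical): g1/g2 rise and fall together, g3/g4 do the opposite.
profiles = {
    "g1": [1, 2, 3, 2, 1, 2, 3, 2],
    "g2": [1.1, 2.2, 2.9, 2.1, 0.9, 1.8, 3.1, 2.0],
    "g3": [3, 2, 1, 2, 3, 2, 1, 2],
    "g4": [2.9, 2.1, 1.2, 1.9, 3.1, 2.2, 0.8, 2.1],
}
print(average_linkage(profiles, n_clusters=2))
```

Because the distance depends only on the shape of each profile, concatenating the time courses with the arrest and mutant experiments into one profile per gene would let regulatory responses, and not just timing, drive the grouping.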
If the genes in each cluster are truly co-regulated, then the promoters of these genes will be bound by the same transcription factor, and therefore the promoters should share a common DNA sequence motif corresponding to the transcription factor binding site. We searched for such motifs upstream of the genes in each cluster. We used three motif search programs: AlignAce, a Gibbs-sampling algorithm [12]; SPEXS, a word-count algorithm (http://www.egeen.ee/u/vilo/SPEXS/) [13,14], and MEME, an expectation-maximization algorithm (http://meme.sdsc.edu/meme/website/intro.html) [15,16]. In general, all three programs found the same motifs. In the study of Rustici et al. [7], four clusters were found. It is difficult to compare the clusters of Rustici et al. with ours: The genes, experiments and clustering methods were different. However, in general, the clustering of Rustici et al. tended to produce fewer, larger clusters, and focused on time of expression as the main distinction between the clusters, whereas our method produced more, smaller clusters, and focused on regulatory mechanisms (as well as time of expression). Peng et al. [8], like us, used hierarchical clustering and found eight clusters, some of which are quite comparable to ours. However, again, we put more emphasis on regulatory mechanisms as opposed to time of expression, and this generated some different clusters. The M Clusters The wave of expression in the late G2 and M phases includes most of the strongly regulated genes. This wave contains three major clusters, which we call the Cdc15, Cdc18, and Eng1 clusters (Figure 8
The Cdc15 cluster (Figure 8 Cytokinesis/septation functions can be ascribed to at least 13 genes including the key SH3 domain gene cdc15 and its paralog imp2, and a third SH3 domain gene, pob1. Also present are the kinases fin1 and sid2 and phosphatase subunit par2, which regulate the septation initiation network. mob1, which interacts with sid2, is also cell cycle regulated with similar timing, but lies outside the cluster as defined here. Other members likely involved in cytokinesis include genes for the rho family member rho4, the putative rhoGEF rgf3, the septin spn2, and the myosin myo3. Construction of the septum involves synthesis of plasma membrane and deposition of proteins into that membrane. The Cdc15 cluster is rich in proteins involved in these processes. The cluster includes gwt1, likely involved in GPI anchor synthesis, and SPAP27G11.01, SPCC306.05c, and SPBC2F12.05c, linked with sterol functions. SPAC227.06 (a predicted Rab interactor), psy1 and bet1 (SNAREs), and SPBC31F10.16 are likely to function in vesicle transport. The budding yeast homolog of SPBC31F10.16, CHS6, is important for movement of chitin synthase from the trans-Golgi network/endosome to the plasma membrane. Other genes encode cell surface glycoproteins, such as the gene mac1, which is localized at poles and septum and is important for cell separation. Genes for cell wall metabolism include two chitin synthase homologs, a putative chitin synthase regulator, six putative sugar/starch hydrolases, and the MAP kinase pmk1. Finally, diverse other functions are represented. There are at least five genes involved in transcription, most notably the transcription factor fkh2, which may be one of the regulators of the Cdc15 cluster [17] (see below). There are also multiple genes involved in mitochondrial functions and in glycosylation. The three motif search programs all found the consensus motif GTAAACAAA, easily recognizable as a binding site for a forkhead (FKH) transcription factor. Almost every gene in the cluster had such a motif. In S. cerevisiae, the main clusters of mitotic genes are also regulated (in part) by forkhead transcription factors. S. pombe has several forkhead transcription factors, but the two most likely to regulate the Cdc15 cluster are sep1 and/or fkh2. sep1 does not oscillate noticeably in our dataset, but it does have phenotypes that could be due to defects in the expression of genes of the Cdc15 cluster, and Rustici et al. [7] have shown defects in cell cycle expression in sep1 mutants. fkh2 does oscillate, and is a member of the Cdc15 cluster. The fkh2 promoter contains two sites each for Forkhead, Ace2, and Cdc10. Interestingly, peak expression of fkh2 precedes the peak of 94% of the other genes in the cluster, consistent with the idea that it might help regulate these other genes. No direct binding of either Sep1 or Fkh2 to any of these promoters has been demonstrated, and we believe it is still an open question which protein regulates this cluster. It is possible that both proteins contribute. Because forkhead transcription factors can both repress and activate, and because they are regulated both transcriptionally and post-transcriptionally, the regulatory mechanisms could be complex.
The motif search programs also found CCAGCC (Ace2 binding sites) and ACGCG (MBF/Cdc10 binding sites) in a substantial minority of the genes of the Cdc15 cluster. Many genes (e.g., fkh2 and pds5) had all three kinds of sites. MEME (but not the other programs) also found the motif (A/T)TGACAAC. This is probably the same as the motif CATG(A/T)CAAC found by Rustici et al. [7] and named “New 1.” To minimize confusion, we will refer to our version of the motif as “New1v” (“v” for variant).
MEME also found the motif CC(T/A)CG(T/C)TCC, and this may be a variant of the motif (A/T)ACC(T/A)CGC(T/A) (“New 3”) found by Rustici et al. We will refer to our motif as “New 3v.” New 3v was found preferentially in front of genes for cell wall metabolism, such as hydrolases, glycoproteins, chitin synthases, and their regulators. Other functionally related genes are found in the Eng1 cluster (see below), where they appear to be regulated by Ace2. Interestingly, the consensus site for Ace2 (CCAGCC) is reminiscent of the core of New 3v (CCACGC), suggesting that an unknown Ace2-like factor could be involved.
We did not find the “PCB” consensus (GCAAC(G/A)), previously implicated in the control of some of the genes of this cluster [18,19].
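Consensus motifs written with parenthesised alternatives, such as GCAAC(G/A), can be turned into regular expressions and searched on both DNA strands. The sketch below is only an illustration of that notation; the function names and the toy upstream sequence are hypothetical, and it is not the method used by AlignAce, SPEXS, or MEME, which discover motifs instead of matching a known one.

```python
import re


def consensus_to_regex(consensus):
    """Convert notation like 'GCAAC(G/A)' into a regex like 'GCAAC[GA]'."""
    return re.sub(r"\(([ACGT](?:/[ACGT])+)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]",
                  consensus)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq, consensus):
    """Return sorted (start, strand) hits; start is always a forward-strand index."""
    seq = seq.upper()
    # A lookahead with a capture group reports overlapping occurrences too.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Map the reverse-strand match back to forward-strand coordinates.
        hits.append((len(seq) - m.start() - len(m.group(1)), "-"))
    return sorted(hits)


# Toy upstream sequence (hypothetical), with one forkhead-like site on each strand.
upstream = "TTGTAAACAAAGGCCAGCCTTTTGTTTACAA"
print(find_motif(upstream, "GTAAACAAA"))
print(find_motif(upstream, "CCAGCC"))
```

A palindromic motif would be reported once per strand at the same position, so callers that care about unique sites may want to deduplicate on the start index.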
The Cdc18 cluster (Figure 8 The Cdc18 cluster has a very similar cluster in S. cerevisiae, called the CLN2 cluster. Both clusters contain genes involved in DNA replication, and both clusters appear to be regulated by the MBF transcription factor (see below). For the Cdc18 (pombe) and CLN2 (cerevisiae) clusters, many of the genes in the clusters are orthologs; e.g., mik1/SWE1, cig2/CLB5, mrc1/MRC1, cdc22/RNR1, and smc3/SMC3. Thus the cell cycle clusters regulating DNA synthesis are very highly conserved, with the overall function of the clusters, the regulation of the clusters, and the genes in the clusters, all being quite similar from S. cerevisiae to S. pombe. The three motif search programs found two motifs in the Cdc18 cluster: ACGCG, and ACGCG(A/T)CGCG. The first of these is easily recognizable as the binding site for the MBF transcription factor (also known as DSC1) [20–22], whereas the second is a related motif that may be a tandem, double binding site for MBF, or for an MBF-like factor. Consistent with the idea that MBF is a major regulator of this cluster, the genes of the cluster are up-regulated by the cdc10-c4 mutation (see Figure 8 S. cerevisiae has two MBF-like transcription factors. One is itself called MBF and consists of the DNA-binding protein Mbp1 complexed with the modulatory protein Swi6. The second factor is called SBF and consists of a second DNA-binding protein, Swi4, complexed with Swi6. S. cerevisiae MBF and SBF, with their related but distinct DNA-binding proteins, bind to related but distinct motifs, and control the cell cycle expression of partially overlapping sets of genes [23,24]. In S. pombe, there is likewise one modulatory protein, Cdc10 (the ortholog of Swi6) and two DNA-binding proteins, Res1 and Res2 (possible orthologs of Mbp1 and Swi4) [20–22,25–27]. Some investigators believe that in S. pombe, there is a unique MBF transcription factor and that it contains Cdc10, Res1, and Res2 [25,26,28]. However, other investigators believe that the situation is similar to that found in S. cerevisiae and that there may be two MBF-like factors, one containing Cdc10 and Res1, and the other containing Cdc10 and Res2 [27,29]. Although our results do not speak directly to these models, the fact that we find two kinds of motifs is easier to interpret in terms of a model with two different but related forms of MBF. The Eng1 cluster (Figure 8 The Eng1 cluster has a recognizably similar functional cluster in S. cerevisiae, the SIC1 cluster [6]. This cluster also has many genes involved in cell separation (e.g., EGT2, an endoglucanase; CTS1, an endochitinase; YGL028c, a glucanase; DSE2, a glucanase; and CHS1, a chitin synthase), and the genes of the S. cerevisiae cluster are also regulated from Ace2 binding sites of the same consensus sequence (CCAGC). However, there is only one gene that is clearly present in the cluster in both species, the glycosyl hydrolase eng1 in S. pombe, and its ortholog DSE4 in S. cerevisiae. Thus the overall function of the cluster (cell separation), the nature of many of the enzymes in the cluster (carbohydrate hydrolytic), and the mechanism of gene regulation (binding by Ace2) have been conserved, even though the individual genes in the cluster have been largely shuffled. It is easy to understand why the individual genes are different, because the two species have cell walls containing different carbohydrates (and so requiring different hydrolytic enzymes), and because the modes of cell separation are very different (fission vs. budding). In fact, given these differences, it is remarkable that the mode of regulation and the functional cluster seem to have been conserved.
The S/Early G2 Clusters The relatively few genes that peak in late M, S, or early G2 fall into three small clusters: the telomeric cluster, the histone cluster, and the Wos2 cluster (Figure 9
The telomeric cluster (Figure 9 The histone cluster (Figure 9 Motif searches showed that all the histone genes (but not the two telomeric genes) had the motif GGGTTAGGGTT(T/G). A degenerate second copy was sometimes also present. This motif has been noted previously [32]. In addition, six of the histone genes (and both telomeric genes) had a motif similar to an MBF binding site, G(C/G)(T/G)ACGCG.
In S. cerevisiae, the histone genes have at least three semi-redundant regulatory systems: First, they have the HIR gene system that represses histone expression outside of S [33,34]. Second, they have regulated mRNA stability, such that the messages are only stable during S [35]. Third, they have a system for gene induction during S. Recently, it has been suggested that this positive system relies on the SBF transcription factor, possibly in combination with a forkhead transcription factor [36]. The fact that an MBF motif is found in front of most of the S. pombe histone genes is consistent with the SBF motif found in front of most of the S. cerevisiae histones, and suggests that MBF may play a role, along with other mechanisms, in regulating histone expression in S. pombe. The Wos2 cluster (Figure 9 The early to mid G2 genes: The Ribosome biogenesis and Cdc2 clusters Although most of the strongly regulated genes peak near the G2/M transition, another large group of genes, 200 or more, peaks with a moderate amplitude at almost exactly the opposite side of the cell cycle, in early to mid G2 (Fig. 10
Other genes in the ribosome biogenesis cluster are involved in nuclear/cytoplasmic import and export. These genes include: nup61 (nucleoporin with a RanBp-binding domain), kap123 (karyopherin), SPCC550.11 (RanBP7/importin-beta/Cse1p family, RanGTP-binding protein involved in mRNA export), and mep33 (mRNA export protein). It is not clear why such genes would be cell cycle regulated. However, Mitchison and colleagues [37–41] have documented a cell cycle oscillation in the rate of growth and protein synthesis in S. pombe. In these studies, there seems to be an acceleration of protein synthesis, and a corresponding acceleration in cell growth rate, in mid G2. Furthermore, “NETO” (new end take off, the time when the new end begins to grow) occurs at about this time. The peak in expression of ribosome biogenesis genes we observe in early/mid G2 could lead to this slightly later peak of protein synthesis and growth rate. Sveiczer et al. [41] suggest that the acceleration in protein synthesis is the “sizer” that leads to commitment to division; in terms of our findings, the peak in transcription of the ribosome biosynthesis genes would be an important component of the sizer. We have recently found that many S. cerevisiae ribosome biogenesis genes are also cell cycle regulated (Figure 11
Surprisingly, we found no DNA sequence motifs associated with the promoters of the genes in the ribosomal biogenesis cluster. As one moves out from the center of the ribosome biogenesis cluster, one encounters many other genes peaking in G2 phase. These are of diverse function, but one interesting example is the pma1 gene, which encodes a proton pump. This pump is needed to maintain the proton gradient across the plasma membrane, affecting many processes, and so seems an unlikely candidate for a cell cycle–regulated gene. Nevertheless, it is cell cycle regulated both here and in S. cerevisiae [6]. The reason for the oscillation is unclear, but because Pma1 is an integral plasma membrane protein that must be inserted into the membrane at the time of synthesis, one possibility is that its synthesis matches the rate of plasma membrane production; in S. cerevisiae, this may reach a peak in G2, accounting for the peak in PMA1 transcription. A similar explanation could hold true in S. pombe. Adjacent to the Ribosome biogenesis cluster is a cluster of 23 genes we call the Cdc2 cluster (see Figure 10 Characterization of Cell Cycle–Regulated Promoters Each cluster was searched for DNA sequence motifs. The most significant motifs are summarized in Table 1. However, the presence of these motifs in the upstream regions of the genes of a cluster says little about promoter structure. To investigate promoter structure in more detail, we used a program called SpikeChart (S. Pyne, B. Futcher, and S. Skiena, unpublished data) that finds and displays motifs in DNA sequences. SpikeChart uses a weight matrix to define a consensus motif, and it shows each occurrence of a motif as a spike of varying height depending on that motif's match to the consensus. 
For instance, a motif that matches the consensus motif exactly would be given a spike height of ten, whereas a motif with one or more mismatches to the consensus would be given a lower score, depending on the number and nature of the mismatches. (Weight matrices and scoring functions are shown in Table S2). SpikeChart can score many different kinds of motifs simultaneously, and can show the position of all scored motifs, so it is well suited to finding groups of motifs, whether they be of the same kind or different kinds. Initially, because we did not know where regulatory motifs might occur, SpikeChart was used to examine the first 200 base pairs (bp) of the open reading frame in question, and 2,000 bp upstream of the start codon (regardless of whether this region included the next open reading frame or not).
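The scoring idea described above can be sketched as follows: each candidate site is scored against a weight matrix, and the score is rescaled so that a perfect match to the consensus gets a height of ten. This is not SpikeChart itself, whose weight matrices and scoring functions are given in Table S2; the linear rescaling, the function names, the threshold, and the toy 0/1 matrix for ACGCG are all assumptions made for illustration.

```python
def spike_height(site, weight_matrix, max_height=10.0):
    """Score one candidate site; an exact consensus match scores max_height."""
    if len(site) != len(weight_matrix):
        raise ValueError("site and weight matrix must have the same length")
    # Best achievable score: the top weight at every position.
    best = sum(max(column.values()) for column in weight_matrix)
    observed = sum(column.get(base, 0.0)
                   for base, column in zip(site.upper(), weight_matrix))
    return max_height * observed / best


def scan(sequence, weight_matrix, threshold=7.0):
    """Return (position, height) for every window scoring at or above threshold."""
    width = len(weight_matrix)
    seq = sequence.upper()
    spikes = []
    for start in range(len(seq) - width + 1):
        height = spike_height(seq[start:start + width], weight_matrix)
        if height >= threshold:
            spikes.append((start, round(height, 2)))
    return spikes


# Toy matrix (hypothetical weights): one dict per position of the ACGCG consensus.
# Graded weights, e.g. {"G": 1.0, "A": 0.5}, would penalize mismatches by their nature.
ACGCG_MATRIX = [{"A": 1.0}, {"C": 1.0}, {"G": 1.0}, {"C": 1.0}, {"G": 1.0}]

print(scan("TTACGCGTTACGCTAA", ACGCG_MATRIX))
```

Scanning the same region with several matrices and plotting all spikes on one axis is what makes groups of closely spaced motifs, whether of the same or different kinds, easy to see.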
Groups of closely spaced, multiple motifs were usually visible, and these groups usually occurred in the upstream intergenic region (as opposed to within the open reading frame) (Figure 12
We did not notice any cases where the group of regulatory motifs was inside an open reading frame (either the downstream or upstream open reading frame). In long (>1 kb) intergenic regions, the group of motifs usually occurred within 800 bp of the start codon, but this was not always true; a substantial minority of regulatory motif clusters occurred more than 800 bp upstream (but still within the intergenic region). Because the median S. pombe intergenic region is only 900 bp, we wondered whether the cell cycle genes might have unusually long promoters. We measured the length of upstream intergenic regions versus cell cycle rank in our list of all 5,000 genes. The most strongly regulated 200 genes had upstream intergenic regions of about 1,200-bp median length, versus a genome-wide median length of 900 bp. Thus, the more strongly cell cycle–regulated genes have longer than average upstream regions. We have noticed the same phenomenon with the cell cycle regulated genes of S. cerevisiae (S. Pyne, S. Skiena, and B. Futcher, unpublished data). The longer-than-average promoters found for cell cycle–regulated genes suggests that these promoters might be above average in complexity. Discussion How Many Cell Cycle–Regulated Genes Are There? We have ranked S. pombe genes by the statistical significance of their oscillation, and we have discussed the most cyclic 750 genes. However, p-values (see Table S1) and other evidence (see Figure 6 However, at the same time, it seems unlikely that 2,000 genes would be directly involved in the cell cycle. There might be at least two kinds of reasons for the observed oscillations. First, an oscillation might be adaptive; i.e., there might be natural selection in favor of the oscillation. The DNA synthesis genes (e.g., cdc18, pol1, and cdc22) in the Cdc18 cluster are examples of genes in which it is easy to believe that the oscillation is adaptive. But second, some oscillations may be incidental. 
That is, there might be no selective advantage whatsoever to the oscillation, but instead the oscillation is a secondary or indirect effect. For example, chromatin condenses during mitosis. At least in multicellular eukaryotes, mitosis is associated with genome-wide repression of transcription. If there is a similar loss of transcription during mitosis in S. pombe, and if our microarray experiments are sufficiently sensitive, we will detect this decreased transcription as a cell cycle oscillation with a trough in mitosis for essentially all genes (preferentially the genes with a short mRNA half-life). But this cell cycle oscillation, though real, does not imply that the oscillation of any of these genes is beneficial; instead, it is a secondary consequence of mitotic repression and chromatin condensation, which presumably is beneficial. Incidental oscillation might also arise when two genes are adjacent to each other. One of the genes might oscillate for adaptive reasons, but the oscillation of this gene might carry over to adjacent genes, for which natural selection is perhaps indifferent to oscillation. How can we distinguish adaptive from incidental oscillation? First, adaptive oscillations are likely to be large-amplitude oscillations, whereas incidental oscillations are likely to be small-amplitude oscillations. Our cutoff at 750 genes is a crude first screen to enrich for genes with adaptive oscillation. Second, one should consider the total oscillation of the gene's final activity. That is, the oscillation of a gene's transcript might be small. But if one finds that the same gene also has an oscillation in protein stability (e.g., because of regulated proteolysis), and also an oscillation in enzyme activity (e.g., because of phosphorylation), this suggests that the oscillation is adaptively significant. For example, in S. 
pombe, the cyclin transcripts oscillate only modestly, and yet the oscillation of the final product (Cdc2 protein kinase activity) is large. The modest oscillation of the transcript contributes in a significant, multiplicative way to the overall oscillation, and is undoubtedly adaptive. Third, one should consider co-regulated genes and the mode of regulation. If a gene is a member of a small cluster of genes, and the genes have related functions and are regulated by a specific cell cycle transcription factor, then the oscillation is almost certainly adaptive. But if the gene is co-regulated with hundreds of other genes all with very small oscillations, and there is no common function to the genes and no known cell cycle transcription factor, then the oscillation of the whole set of genes may be secondary to some effect such as chromatin condensation. Fourth, one should consider the chromosomal location. Genes adjacent to adaptively regulated genes could oscillate passively. In particular, genes in regions of special chromatin structures (e.g., near telomeres, centromeres, and silenced regions) could oscillate as a secondary consequence of cell cycle changes in the special chromatin structure. In summary, we feel that a very large number of S. pombe genes, 2,000 or more, have at least very small cell cycle oscillations. But it is possible that in many cases this oscillation may be incidental and that only a smaller but unknown number oscillate for adaptive reasons. Sorting adaptive from incidental oscillations will require additional experiments. Two Genome-Wide Waves of Transcription There were two large waves of transcription, one peaking in early/mid G2, and the other peaking in late G2 or M (see Figure 6 One property of these early/mid G2 genes is that they are deeply repressed at the nuc2 block in mitosis (see Figure 10 A related observation is that in the 1970s and 1980s, metabolic labeling studies were done on synchronized cultures of S. pombe. 
These studies found “steps” of incorporation of labeled uridine into RNA (mostly ribosomal RNA) as a function of cell cycle phase. Around mitosis, incorporation was poor, then after mitosis, the rate of incorporation increased, and then flattened out again at the next mitosis, then increased, and so on. The interpretations of this step-like, cell cycle–regulated uridine incorporation were varied, and the subject disappeared from the literature without resolution [53–56]. Putting these observations together, we speculate that S. pombe, too, may have some degree of mitotic repression, perhaps important for chromosome condensation. Pol I accounts for the vast majority of the transcription in the cell. Mitotic repression of Pol I transcription of the ribosomal RNA genes would account for the pause in uridine incorporation seen in mitosis in the metabolic labeling studies. But if ribosomal RNA is not transcribed in M, and given that the components of the ribosome are tightly coordinated in their production, then genes for ribosomal proteins (as seen by Peng et al. [8]), and genes for ribosome biogenesis, might also be repressed in M. Repression in M would account for the oscillation of the ribosome biogenesis cluster and its repression at a nuc2 arrest. Finally, if the ribosome biogenesis genes cluster together because they are subject to mitotic repression, this might explain why the cluster does not contain any characteristic 5′ motifs: Mitotic repression might not work through a particular upstream site-specific transcription factor. Indeed, in S. cerevisiae, ribosome biogenesis transcripts are controlled in part at the level of mRNA stability [57]. Thus, we suggest that S. pombe may have a form of mitotic repression and that this repression in mitosis may account for the oscillation of the ribosome biogenesis genes and other genes peaking in early/mid G2 phase and troughing in M. The second large wave of gene expression peaks in late G2 and in M. 
This wave includes the Cdc15 cluster (which has many genes for mitosis), the Cdc18 cluster (DNA replication), and the Eng1 cluster (cell separation). There are many important cell cycle events in M and S, and these two phases are close together in rapidly-growing S. pombe. The many genes peaking in late G2 and M may simply represent the cell's efforts to prepare for the many activities of M and S. It will be of interest to see what happens to the timing of the Cdc18 cluster (DNA synthesis genes) in slowly growing cells with a long G1: Will they still be transcribed in mitosis, or will they now be transcribed in late G1? If mitotic repression does exist in S. pombe, how is it that the Cdc15, Cdc18, and Eng1 clusters peak in M phase? Baum et al. [58] have used nuclear run-on to show that cdc18 and some other members of the cdc18 cluster can be actively transcribed in mitosis at a time when histone H1 kinase activity is high and chromatin is presumably condensed. Our own results agree that essentially all the genes of the Cdc15, Cdc18, and Eng1 clusters are highly expressed at a nuc2 arrest, a time at which histone H1 kinase activity is high, and chromatin should be condensed. Our elutriation data suggest that in normal cells, the peak of expression of genes in the Cdc15 and the Cdc18 cluster is almost simultaneous with mitosis (see Figures 6 The more moderately expressed genes in the G2/M wave (i.e., genes not in the Cdc15, Cdc18, or Eng1 clusters) tend to be expressed in late G2 rather than in M (see Figure 6 Comparison of Cell Cycle Genes in S. pombe and S. cerevisiae Of our top 200 ranked cell cycle–regulated genes, 72 (36%) had S. cerevisiae homologs that cycled, 68 had S. cerevisiae homologs that did not cycle significantly, and 60 did not have clear S. cerevisiae homologs. (A detailed comparison of the top 200 S. pombe genes and their S. cerevisiae homologs is available as Table S3). 
Genes involved in core cell cycle processes such as DNA synthesis and mitosis were especially likely to cycle in both organisms. On the other hand, genes involved in budding (in S. cerevisiae) or fission (in S. pombe), or in cell wall carbohydrate metabolism, generally did not cycle in both organisms, for the obvious reason that the mechanism of cell separation, and the nature of the carbohydrates in the cell wall, are not conserved between the two yeasts. There are many individual cases where a process is cell cycle–regulated in both organisms, but either the level of regulation (i.e., transcriptional or post-transcriptional) or the identity of the gene regulated varies between the two yeasts. One example is the activity of the cdc2/Cdc28 protein kinase. In S. cerevisiae, most of the cyclins are very strongly regulated at the transcriptional level (e.g., CLN1, CLN2, CLB5, CLB6, CLB1, and CLB2), but in S. pombe, the equivalent cyclins are only weakly or moderately regulated at the transcriptional level. Possibly compensating for this relatively weak transcriptional regulation, S. pombe has very strong post-translational regulation of Cdc2 kinase activity via Wee1/Mik1 inhibitory tyrosine phosphorylation of Cdc2, whereas the homologous system is relatively weak in S. cerevisiae. That is, both yeasts strongly regulate cdc2/Cdc28 activity through the cycle, but emphasize different mechanisms. A second example is provided by the gene products of dut1 (SPAC644.05c) and ung1. These proteins both work to exclude uracil from DNA, but by independent mechanisms. The Dut1 protein hydrolyses dUTP, whereas the Ung1 protein removes uracil from DNA by cleaving the glycosidic bond. In S. cerevisiae, dut1 is very weakly cell cycle regulated, whereas ung1 is moderately regulated. In S. pombe, dut1 (SPAC644.05c) is very strongly regulated, whereas ung1 appears not to be regulated at all. 
Thus both yeasts use cell cycle transcriptional control to exclude uracil from DNA, but the emphasis is on different genes. Regulatory Networks and the Late G2 Bump In S. cerevisiae, there is a regulatory network governing the transcription of cell cycle genes. This network is organized as a circular cascade, such that transcriptional and post-transcriptional changes occurring during one part of the cycle seem to promote changes in the next part of the cycle, and so on around a circle [67–69]. In principle, S. pombe must also have a circular cascade of some kind to make the cell cycle repeat. However, fewer cell cycle regulatory mechanisms have been described in S. pombe than in S. cerevisiae, and so the wiring of the putative cascade is still unclear. In particular, it is unclear how extensive a role is played by transcriptional control. Moreover, in S. cerevisiae, genes displaying large-amplitude cell cycle changes are distributed throughout the cycle [6], consistent with the idea that transcriptional control contributes significantly to all phases of the cascade [67]. However, in S. pombe, most large-amplitude genes are expressed in a window near the G2/M transition, whereas genes of moderate and low amplitudes are distributed throughout the cycle. This concentration of large-amplitude genes near M may suggest that transcriptional control is most important for only some portions of the cascade. Within the G2/M window of high-amplitude transcriptional regulation, one can discern what may be part of the regulatory wiring diagram. The transcription factor gene fkh2 peaks in the earliest part of the late G2 window. Over 100 other genes in this window, including fkh2 itself, have FKH binding sites, so the up-regulation of fkh2 may contribute to this large wave of gene expression. One of the critical targets of the Fkh transcription factor may be the gene for the Ace2 transcription factor. The ace2 promoter has multiple sites for Fkh binding. 
The ace2 promoter also has one site for Ace2, so, like the fkh2 gene, ace2 may be autoregulatory. The Ace2 transcription factor then induces a cluster of genes involved in cell separation and cell wall metabolism. Interestingly, a forkhead transcription factor is involved in turning on the ACE2 gene in S. cerevisiae, so this particular part of the cell cycle wiring diagram appears to be conserved in the two species. Three of the major cell cycle transcription factors in S. cerevisiae, MBF/SBF, Fkh, and Ace2/Swi5, have homologous cell cycle transcription factors in S. pombe. The major exception is Mcm1, a MADS-box transcription factor. In S. cerevisiae, there are two paralogs of this gene, MCM1 and ARG80. Mcm1 is a transcription factor for cell cycle genes and mating genes, whereas Arg80 controls various metabolic processes. The best S. pombe orthologs are Map1 and Mbx1 [19]. There was no noticeable enrichment of an Mcm1-like binding motif in front of any cluster of cell cycle–regulated genes; i.e., there was no evidence for a binding site for Map1 or Mbx1. In multicellular animals, the major well-characterized cell cycle transcription factors are those of the E2F/DP family [70,71]. These typically control a cluster of genes expressed in late G1, and the genes are involved in DNA replication and commitment to the cell cycle. Functionally, the genes controlled by E2F/DP in animals are similar to the genes controlled by MBF in the two yeasts. E2F and DP proteins are not very similar in sequence to the proteins found in MBF, but it is also true that various E2F and DP proteins are not very similar to each other, though they are clearly related. E2F and DP recognize binding sites with a
CGCG core, as does MBF. Furthermore, the DNA-binding domain of E2F/DP factors consists of a winged-helix fold [72], as do the DNA-binding domains of Swi4 and Mbp1 (components of S. cerevisiae SBF and MBF, respectively) [63,72]. Thus, despite the overall sequence dissimilarity, it is possible that MBF in the yeasts, and E2F/DP in animals, are cell cycle transcription factors that are related by descent and which have always controlled the cell cycle expression of genes involved in DNA replication.
Materials and Methods Microarrays Microarrays were made by spotting unmodified, double-stranded PCR products onto glass slides coated with aminopropylsilane (Erie Scientific). Spotting was done using a robot of the DeRisi design (http://cmgm.stanford.edu/pbrown/mguide/) and ArrayMaker2 software (http://derisilab.ucsf.edu/arraymaker.shtml). PCR primers were designed using Primer3 (Whitehead Institute, Cambridge, Massachusetts, United States) and a shell script. Primers were designed against approximately 5,000 open reading frames and RNAs (excluding pseudogenes) as annotated by the Sanger Centre (http://www.sanger.ac.uk/Projects/S_pombe/DNA_download.shtml). In general, PCR primer pairs were designed to give products 500 to 1,000 bp in length, because the yield of the PCR reaction decreased for products longer than 1,000 bp. When the PCR product was small compared to the length of the gene, it was usually chosen from the 3′ region of the gene, so as to maximize representation in poly dT-primed cDNA synthesis. PCR products were amplified from genomic S. pombe DNA, and so in some cases the final product included introns, but the design parameters maximize contiguous exonic sequence. A fuller description of the microarrays will be published elsewhere. A full description of the primer pairs, and hence the features on the microarrays, can be found at http://www.redgreengene.com. Cell cycle synchronizations Two methods of cell cycle synchronization were used, elutriation and a cdc25–22 block and release. Two independent elutriation experiments were carried out. For elutriations, 8 l of h-972 cells (wild-type) were grown in YES (autoclaved, elutriation B or filter-sterilized, elutriation A) to early log phase (OD600 = 0.4) at room temperature (25 °C). Cells were harvested by centrifugation, resuspended in approximately 100-ml YES, and sonicated, all at room temperature. For elutriation B, approximately half of the cell volume was reserved for the reference cDNA preparation. 
For elutriation A, the reference cDNA was prepared independently and the entire sample was used for elutriation. Cells were loaded into a Beckman elutriator rotor containing two 40-ml elutriation chambers connected in series. When two chambers are used in series, the bulk of the cells remain in the first chamber, but the smallest cells flow into the second chamber, and then, at higher pump speeds, some of these flow out of the elutriator for collection. This arrangement provides both high capacity and high resolution. The elutriator was used at 1,800 rpm at room temperature. After every increase in pump speed, a fraction of about 150 ml was collected, containing about 5 × 10⁸ cells (elutriation B) or 3 × 10⁹ cells (elutriation A). These were diluted to OD600 0.2–0.05 (greater dilution for samples harvested at late times) with conditioned (elutriation B) or fresh filter-sterilized (elutriation A) medium, and then sampled with time. An entire three cell cycle time course was obtained from five elutriator fractions (elutriation B) or two fractions (elutriation A). We used adjacent fractions containing no (< 0.5%) septated cells; the elutriator fraction with the largest cells (i.e., the last fraction collected) was used first, then the elutriator fraction with the next largest cells, and finally the elutriator fraction with the smallest cells (i.e., the first fraction collected). In general, the fractions were “overlapped,” i.e., the last sample from one fraction and the first sample from the next fraction were collected at the same time. “Overlapped” fractions, though collected at the same time, were deemed to have been collected at slightly different times; the number of minutes by which overlapped fractions were offset was determined by the offset, in minutes, of the septation indices for the two fractions. 
That is, for any pair of overlapped fractions, the smaller cells were deemed to have been collected earlier, by a time determined from the offset of the septation indices of the two fractions. Note that elutriation A used only two fractions, and so there was only one overlap. Samples were taken about 10 min (elutriation A) or 15 min (elutriation B) apart; exact sampling times are given in the Treeview files 1, 2, and 3 (Dataset S1) and at http://www.redgreengene.com. Cells (10⁸ cells/sample) were harvested by centrifugation at 4 °C and washed with ice-cold water, snap frozen, and stored at −70 °C. For elutriation A, an equal volume of ice was added to the cell culture during harvest (harvest with ice). The reference sample for hybridizations was sonicated cells prior to elutriation (elutriation B), or h-972 grown to OD600 0.2 in filtered YES at 25 °C (elutriation A). Septation index was monitored by phase contrast microscopy of live cells during each experiment. In addition, frozen cell pellets were thawed and stained with DAPI and calcofluor to monitor anaphase (“binucleates”) and septation for elutriation A. Cells were scored as “binucleates” if two nuclei were visible, but there was no septum. For the cdc25–22 block release, the prototrophic strain JLP1164 h+ cdc25–22 was grown in filtered YES at 25 °C to OD600 = 0.4 and then used to inoculate 4 × 500 ml filtered YES to an OD600 of 0.1 (flask 1), 0.08 (flask 2), 0.07 (flask 3), and 0.05 (flask 4). Cells were shifted to a water bath at 36.5 °C for 4 h to arrest them in G2 (time = 0 h) and then shifted back to 25 °C rapidly in an ice-water bath (26 °C was achieved in approximately 5 min; cultures did not cool below 25 °C). Samples were taken 10 min apart and harvested with ice as described above. The reference sample for hybridizations was JLP1164 h+ cdc25–22 grown at 25 °C to OD600 = 0.2 in filtered YES. Septation index was monitored by phase contrast microscopy. 
Other microarray experiments To examine cells released synchronously from a cdc10 arrest, 8 l of strain JLP1166 h− cdc10-M17 was grown at 25 °C to OD600 = 0.5 in filtered YES, and then harvested and elutriated to obtain a fraction of G2 cells. These were diluted to 10⁶ cells/ml, shifted to 36.5 °C for 3 h 15 min, rapidly cooled to 25 °C as described above (time = 0), and then sampled with time. Cells were harvested with ice. Samples were also collected and analyzed by flow cytometry to monitor DNA replication. The reference sample for hybridizations was JLP1166 h− cdc10-M17 grown to OD600 0.2 in YES at 25 °C. To examine cells grown in low nitrogen, wild-type h-972 was grown in EMM lacking NH4 and supplemented with 20 mM phenylalanine (EMM-phe) to provide a limiting nitrogen source to expand the G1 window [73]. 8 l of cells were grown at 25 °C to OD600 = 0.4, collected by centrifugation at 4 °C and kept on ice and sonicated on ice. Approximately half of the total cell volume (125 ml, total 2 × 10⁸ cells) was reserved for reference cDNA synthesis and the remainder was elutriated at 4 °C to fractionate the culture into 21 fractions ranging from small cells (50% G1) and then medium cells (G2) and finally to long, septated cells. Fractions were harvested immediately by centrifugation at 4 °C. Fraction assignments were confirmed by flow cytometry analysis and high-quality hybridizations were obtained with fractions 2, 3, 5, 7, 10, 13, and 16. To examine cells arrested at the cdc10, cdc22, cdc25, and nuc2 block points, four strains carrying these cell cycle mutants (cdc22-M45, nuc2–663, cdc25–22, and cdc10-M17) and a wild-type reference control were grown to OD 0.05–0.08 in YES at 25 °C and shifted to 36.5 °C. After 4 h of arrest at this restrictive temperature, a sample was taken for microarray analysis. For each strain, the experiment was repeated with an independent single colony. 
Figures 6 Microarray hybridization and processing Cell samples for RNA isolation were rapidly cooled by addition of an equal volume of ice (except for elutriation B in which samples were placed on ice) and then collected by centrifugation at 4,000 rpm (3,300 × g) at 4 °C for 3 min. Pellets were washed twice in ice-cold dH2O, frozen in liquid nitrogen, and stored at −70 °C. Total RNA was isolated using RiboPure Yeast (Ambion, Austin, Texas, United States) according to the manufacturer's instructions (elutriation A samples) or hot phenol essentially as described [7] (http://www.sanger.ac.uk/PostGenomics/S_pombe/docs/rnaextraction_website.pdf with slight modifications according to the detailed protocols at http://www.redgreengene.com). Isolated RNA was further purified by RNeasy cleanup columns (Qiagen, Valencia, California, United States) and quantitated by absorption spectroscopy. Microarray probes were prepared in two steps. First, cDNA was synthesized incorporating aminoallyl-dUTP (aadUTP). Purified aadUTP cDNA was then coupled with Cy3 or Cy5 fluorescent dyes according to protocols from the Institute for Genomic Research (http://www.tigr.org/tdb/microarray/protocolsTIGR.shtml) with slight changes (http://www.redgreengene.com) as follows: 20–25 μg of total RNA was used for cDNA synthesis with 4 μg of oligo-dT primer (not random hexamers), and reactions contained 300 μM aminoallyl-dUTP with 200 μM dTTP. RNA was destroyed using RNase instead of NaOH, and reactions were purified with a Qiagen PCR purification kit. Dye incorporation was determined by absorption spectra and was typically one fluor/20–30 nucleotides. For hybridizations, cDNA with 50 pmol Cy3 plus reference cDNA with 50 pmol Cy5 was included in a 24 μl total hybridization solution (25% v/v formamide, 5× SSC, 0.1% SDS, and 100 μg/ml of sonicated salmon sperm DNA). 
Hybridizations were performed under 22 × 25 mm lifter cover slips (Erie Scientific, Portsmouth, New Hampshire, United States) at 50 °C in a humidified chamber for 16–20 h. Hybridized arrays were washed by gently shaking as follows: twice briefly with 2× SSC/0.1% SDS (50 °C), twice for 10 min with 2× SSC/0.1% SDS (50 °C), and four times briefly with 0.1× SSC at room temperature. Arrays were dried by centrifugation. Arrays were scanned using an Axon 4000B scanner, controlled by GenePix Pro 5.1 software with a pixel size of 5 μm and two-pass sequential line averaging. Laser power was set to 100%, and PMT gains were subjectively adjusted during prescan to maximize effective dynamic range and to limit image saturation. Lossless image files were stored for later analysis. Data extraction and storage To extract data from microarray scans, previously stored image files representing all hybridizations were analyzed in parallel. Spot size, location, and quality were determined automatically by GenePix Pro algorithms. Dynamic spot resizing between 60 and 150 μm diameter was permitted based upon image examination and prior optimization. Misidentification of spot locations was corrected by manual adjustment of the map prior to automatic sizing and shifting. Only in cases of gross hybridization defect were spots/regions manually moved/resized or flags modified to “bad,” permitting consistent spot calling. Following spot location, parameters and values for each spot were calculated by GenePix Pro and exported. No normalization was applied within GenePix Pro. Raw data and images exported from GenePix Pro were used to populate a local installation of the Longhorn Array Database (Peter Killion, University of Texas at Austin, http://www.longhornarraydatabase.org), an SQL database based upon the Stanford Microarray Database (http://yeastgenome.org, Stanford University). Initial data normalization was performed at the time of population. 
Briefly, spots were categorized as “pombe” or “other.” Pombe spots were further categorized into “normalization” (no bad, missing, absent, or not-found flags) or “non-normalization” (bad, missing, absent, or not-found flags). Only normalization spots were further considered for the normalization calculation. Finally, spots with greater than 5% saturation in either image channel were discarded from this group. The mean log2 ratio of the median net intensities (Rm, foreground pixel − median of the local background) was calculated. This “normalization factor” represented the distance from a red/green ratio of one, and was used as a scalar modifier for the ratios of all spots in the hybridization; i.e., though only spots meeting a stringent “good” criterion were used to determine the normalization value, this value was subsequently applied to all spots, good or not. During retrieval of data, several further criteria were used to ensure high-quality data in downstream analysis. Only spots with non-negative flag values were retrieved (not bad, missing, absent, or not-found), and only spots with a regression correlation of pixel ratios (a metric of internal spot consistency) greater than 0.6 were used. Spot values were averaged (mean) when multiple independent spots representing a single PCR product were present as internal controls or otherwise. When analyzing multiple hybridizations, such as during time-course analysis, iterative gene and array centering was performed. Briefly, within an array of genes × arrays, the mean log2 ratio of medians (Rm) was calculated and subtracted from each log2 Rm, first along the gene axis, then along the array axis, until subsequent iterations varied by < 0.001%. 
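The normalization and centering steps described above can be sketched as follows. This is a minimal illustration in Python, not the Longhorn Array Database code: the spot field names (`flag`, `saturation`, `log2_ratio`), the function names, and the absolute stopping tolerance are assumptions made for the example, and missing values are not handled.

```python
def normalization_factor(spots, max_saturation=0.05):
    """Mean log2 ratio over spots that pass the "normalization" criteria.

    Each spot is a dict with hypothetical keys: 'flag' (negative values
    mean bad, missing, absent, or not found), 'saturation' (worst fraction
    of saturated pixels in either channel), and 'log2_ratio'.
    """
    usable = [s["log2_ratio"] for s in spots
              if s["flag"] >= 0 and s["saturation"] <= max_saturation]
    return sum(usable) / len(usable)


def normalize(spots):
    """Subtract the factor from every spot, whether or not it was usable."""
    factor = normalization_factor(spots)
    return [dict(s, log2_ratio=s["log2_ratio"] - factor) for s in spots]


def center_iteratively(matrix, tol=1e-5, max_iter=100):
    """Alternately subtract gene (row) means and array (column) means.

    matrix is a list of rows, one per gene, holding log2 ratios with one
    value per array. Iteration stops once the largest mean removed in a
    full pass falls below tol (an assumed absolute criterion).
    """
    data = [list(row) for row in matrix]  # leave the input untouched
    n_rows, n_cols = len(data), len(data[0])
    for _ in range(max_iter):
        largest_shift = 0.0
        # Gene axis: subtract each row's mean from that row.
        for row in data:
            row_mean = sum(row) / n_cols
            largest_shift = max(largest_shift, abs(row_mean))
            for j in range(n_cols):
                row[j] -= row_mean
        # Array axis: subtract each column's mean from that column.
        for j in range(n_cols):
            col_mean = sum(data[i][j] for i in range(n_rows)) / n_rows
            largest_shift = max(largest_shift, abs(col_mean))
            for i in range(n_rows):
                data[i][j] -= col_mean
        if largest_shift < tol:
            break
    return data
```

For a complete matrix, a single pass of row and column centering already leaves both sets of means at zero; repeated passes matter mainly when some values are missing, a case this sketch does not cover.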
Normalization There are some special red/green normalization issues relevant to the genome-wide waves of expression (see Figure 6 A second normalization issue is that the oscillation of the strongly regulated genes would have an effect, via normalization, on the apparent expression of non-oscillating genes (i.e., genes that do oscillate would produce an artifactual, complementary oscillation in non-oscillating genes, via normalization). To side-step this artifact, pixel intensity data for the bottom 4,000 genes were extracted from the microarray data before the red/green normalization step, and then normalized and analyzed after extraction, so that the oscillation of the 1,000 most strongly cyclic genes would not interfere with normalization of the least cyclic genes. Cluster analysis For cluster analysis, array- and gene-centered log2 Rm data were hierarchically clustered along the gene axis by the agglomerative algorithm of Eisen et al. [10]. Data were visually presented using JavaTreeView (http://jtreeview.sf.net and http://jtreeview.sourceforge.net/manual.html). Separation of the total dendrogram into subordinate clusters was performed subjectively. Motif analysis To find DNA sequence motifs, nucleotides extending from −1 to the edge of the most 3′ proximal gene (stop or ATG, depending on orientation) with a maximum length of 12,000 bp were extracted genewise for each cluster and used as a target set for motif searching. Three different motif search programs were used: MEME, AlignACE, and SPEXS. MEME (Multiple EM for Motif Elicitation) [15] was used to find motifs between five and nine nucleotides long present in any number of copies on either strand, weighted to find n/3 to 3n total sites in the target set of n sequences. Parameters were set as follows: $ meme (sequence.name) -dna -minsites (n/3) -maxsites (n*3) -mod anr -minw 5 -maxw 9 -revcomp -nmotifs 10 -evt 0.1 -bfile (fifth order Markov model). (Other parameters were also tried in additional searches.) 
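As an illustration of how the site limits scale with cluster size, the MEME invocation quoted above can be assembled programmatically. The sketch below is an assumption-laden example rather than the script used in this study: the function name, the file paths, the rounding of n/3, and the floor of two sites are all hypothetical choices, and only the options quoted in the text are passed.

```python
def meme_args(fasta_path, n_sequences, background_path):
    """Build the MEME argument list for a cluster of n_sequences promoters."""
    # n/3 is rarely an integer; rounding and the floor of 2 are assumptions.
    min_sites = max(2, round(n_sequences / 3))
    max_sites = n_sequences * 3
    return [
        "meme", fasta_path,
        "-dna",
        "-minsites", str(min_sites),
        "-maxsites", str(max_sites),
        "-mod", "anr",           # any number of repeats per sequence
        "-minw", "5",
        "-maxw", "9",
        "-revcomp",              # search both strands
        "-nmotifs", "10",
        "-evt", "0.1",           # stop once the E-value exceeds 0.1
        "-bfile", background_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names for a 30-gene cluster.
    print(" ".join(meme_args("cluster.fa", 30, "pombe_upstream.bg")))
```

The resulting list could be handed to `subprocess.run` on a machine where MEME is installed; option names may differ between MEME releases, so they should be checked against the installed version.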
The top ten motifs exceeding an E-value of 0.1 were generated using a background set consisting of the fifth order Markov model representing possible nucleotide pentuplets in all S. pombe upstream regions. AlignACE [12] uses a Gibbs sampling algorithm. Again, all S. pombe upstream regions were used as a background set. SPEXS (Sequence Pattern EXhaustive Search) [14], a word-search enumeration algorithm, was also used. Relative frequency of 1- to 9-mers was calculated, and compared between the target set and all S. pombe upstream sequences. Identification of oscillating transcripts In general, identification of oscillating transcripts requires a method for finding oscillations in each experiment, and then a method for combining the results from different experiments. Here, we have used Fourier analysis to identify oscillating genes. A p-value for the hypothesis of oscillation was then established using Monte Carlo simulations on shuffled data. The p-values for different experiments were then combined using known statistical properties of the p-value. Finally, the p-values for each gene were ranked. Although we have ranked the p-values, these p-values are nevertheless closely correlated with the amplitude of the oscillation. For each time series of observations in a single time course (e.g., a three cell cycle elutriation experiment), we calculated the Fourier sums A and B over the range of times, t, in the experiment:
Here, t is the time in minutes at which the sample was taken (where the beginning of sampling is zero time); T is the cell cycle period, i.e., the time in minutes required for a complete cell cycle; and ratio(t) is the ratio of experimental to control signal at time t. We considered these two sums as a vector C = (A, B), and then calculated the magnitude of the vector, $D_O = \sqrt{A^2 + B^2}$. This magnitude, $D_O$, is our basic Fourier measure of whether a transcript oscillates. Note that there is no need to calculate phase. However, random noise would generate some value of $D$ greater than zero, and genes whose transcripts are relatively variable in abundance could generate relatively large values of $D$, even if these variations had no connection to the cell cycle. Therefore, as a second step, we randomly shuffled the series of observations for each gene in question, and calculated a new magnitude, $D_R$, for the randomized series. This randomization was repeated 1,000 times, generating 1,000 values of $D_R$. These represent the distribution of $D$ for each gene, given that gene's actual variance in gene expression. Finally, we compared the original value of $D_O$ from the unshuffled data to the distribution of $D$ from the shuffled data, found how many standard deviations $D_O$ is from the mean of the distribution, and in this way calculated a z score for $D_O$. This procedure was repeated for each gene and for each experiment. Thus, for each gene, there were three z scores, one per experiment (two elutriation experiments and the cdc25 block-release experiment). These three z scores were then combined by the method of Stouffer, yielding a single p-value for each gene. Genes were then ranked by p-value with the lowest p-value at the top of the list. 
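The scoring procedure just described can be sketched in a few lines of Python. This is an illustrative reading, not the authors' program. It assumes that A and B are sine- and cosine-weighted sums of the log ratio at angular frequency 2π/T, and that the combined Stouffer z score is converted to a one-sided p-value. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def fourier_magnitude(times, log_ratios, period):
    """Magnitude D of the vector (A, B) for one gene in one time course."""
    # Assumed form: A and B weight each log ratio by sin and cos of 2*pi*t/T.
    a = sum(math.sin(2 * math.pi * t / period) * x
            for t, x in zip(times, log_ratios))
    b = sum(math.cos(2 * math.pi * t / period) * x
            for t, x in zip(times, log_ratios))
    return math.hypot(a, b)


def oscillation_z(times, log_ratios, period, n_shuffles=1000, rng=None):
    """z score of the observed D against D from shuffled time series."""
    rng = rng or random.Random(0)
    observed = fourier_magnitude(times, log_ratios, period)
    shuffled = list(log_ratios)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # keeps the gene's values, destroys the timing
        null.append(fourier_magnitude(times, shuffled, period))
    spread = stdev(null)
    if spread == 0:
        return 0.0  # constant series: shuffling cannot change D
    return (observed - mean(null)) / spread


def stouffer_p(z_scores):
    """Combine per-experiment z scores into one upper-tail p-value."""
    combined = sum(z_scores) / math.sqrt(len(z_scores))
    return 1.0 - NormalDist().cdf(combined)
```

Calling `oscillation_z` once per experiment for a gene and passing the three results to `stouffer_p` gives the value on which genes would be ranked under these assumptions.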
In practice, a large amplitude of oscillation contributes tremendously to a low p-value, so the upper portion of the p-value list is almost exclusively occupied by genes with high-amplitude oscillations.

Gene database

In general, we have used the information in the GeneDB database (http://www.genedb.org/genedb/pombe/index.jsp) to describe the various genes studied; when a fact is given in the text about some gene but no reference is given, the information comes from GeneDB. When the primary literature has been consulted directly, the reference for the primary literature is given.

Dataset S1: Treeview Files

Datasets appropriate for Treeview (see below) are provided as a tar.gz file. Upon opening, this tar.gz file will create a folder containing the three supplementary dataset files S1, S2, and S3 (pombecellcycle.cdt, pombecellcycle.gtr, and pombecellcycle.jtv) along with an additional “treeview_configuration.txt” file detailing the configuration of Treeview to access Sanger GeneDB for S. pombe. These cdt, gtr, and jtv files can then be used to view clusters of the top 750 genes in a convenient and searchable way using Treeview, an open-source cluster visualization package available for many different platforms (http://jtreeview.sourceforge.net).

Launch Java Treeview and open file pombecellcycle.cdt. The two supporting files will be accessed automatically (given that they are in the same folder). The pombecellcycle.jtv file is not required and only provides configuration settings. Java Treeview can be instructed to link directly to GeneDB (or any other database) so that a user can quickly check information on any given gene in a cluster of interest. To configure a copy of Java Treeview to link to GeneDB, do the following. First, in “Settings” go to “gene URL presets” and change one of the presets to name GeneDB and template http://www.genedb.org/genedb/Search?organism=pombe&name=HEADER&isid=true and choose this as default.
Second, in “Settings” go to “URL settings” and select gene; check that your new template is selected and that the “UID” setting is chosen. Now when an individual gene in a cluster is selected, the GeneDB Web page for that gene should open automatically. Documentation for Java Treeview is available at http://jtreeview.sourceforge.net/manual.html.

The cdt file contains log ratios for each gene at each timepoint, and the timepoint names also contain information on septation index (SI) for all three synchrony experiments and binucleates for elutriation A. The form of the header for timepoint names is: [experiment]_[time]min_[%]SI_[%]BN. SI indicates Septation Index (i.e., percent septation), assayed by calcofluor for “ElutA,” phase contrast for “ElutB,” and cdc25. “BN” indicates the binucleate “anaphase index” as measured by DAPI staining, applicable only to ElutA. These headers are displayed in the Zoom window of Java Treeview. (210 KB ZIP).

Figure S1: Oscillation of cdc18

The oscillation of the cdc18 transcript through two cell cycles in elutriation A is plotted as a histogram (right y-axis). Also shown are the binucleate (blue triangles) and septation indices (cyan squares) (left y-axis). (165 KB EPS).

Table S1: Cell Cycle Parameters of S. pombe Genes

Data are presented for all 4,988 genes analyzed. Column headings are as follows: “SUID” is the systematic name. “Common-name” is the common name. “Rank” is the rank in our p-value list, from smallest p-values (i.e., most significant genes) to largest p-values. A p-value of “0” means a p-value of less than $10^{-16}$. “Desc” is a one-line description of the gene from GeneDB. “Cluster” is the cluster to which the gene belongs, if any. “ElutA_Phase” is the phase angle of the gene calculated from elutriation A. Phase angles range from 0° to 360° (i.e., around a circle).
The phase angles of landmark events are as follows: binucleates (anaphase) peak at a phase angle of 238°; septation peaks at a phase angle of 277°; and histone expression (S phase) peaks at a phase angle of about 312°. Thus 0° is early in G2, but not the very beginning of G2. “ElutA_Fourier_component” is related to the amplitude of the gene's oscillation. It is the magnitude from the Fourier decomposition of the elutriation A data series. It is the magnitude of only one constituent waveform (the once-per-cell-cycle wave). It is in log2 space. “Combined_P” is the combined p-value obtained by using Stouffer's method to combine the z scores from elutriation A, elutriation B, and the cdc25 block release. “ElutA_Z,” “ElutB_Z,” and “cdc25_Z” are the z scores for the elutriation A, elutriation B, and cdc25 block-release experiments, respectively. Z scores were calculated from Monte Carlo simulations (see Materials and Methods). “All names” are all other known synonyms for the gene in question other than the “SUID” and the “Common-name.” In some cells of the spreadsheet, the entry is “#N/A” or “X” or “Z.” These entries indicate that a result was not calculated because of excessive missing data. (2.9 MB XLS).

Table S2: SpikeChart Weight Matrices

The weight matrices and spike height rules used by SpikeChart to generate Figure 10 (24 KB DOC).

Table S3: S. cerevisiae Homologs of S. pombe Cell Cycle Genes

For the top 200 S. pombe cell cycle genes, the best homologs in S. cerevisiae (if any) are shown. If the S. cerevisiae homolog oscillates through the cell cycle, then the time of peak expression is shown in the “Sc peak” column; if the homolog is not known to oscillate, then this column is marked “ND.” Any transcription factors thought to regulate the S. cerevisiae homolog are noted. If there are more than two S.
cerevisiae homologs, then all these additional homologs are combined in a single field in the right-most column. (53 KB XLS).

Acknowledgments

We thank Rohan Fernandez for help in designing PCR primers for amplification of microarray target sequences, and Michael Sassen for help with PCR amplification. We thank Joe DeRisi for instruction and inspiration in the art of microarray manufacture, and also for providing ArrayMaker2 software. We thank the National Center for Research Resources for funding for S. pombe microarrays (NCRR grant P40RR01632004 to JL) and the National Institutes of Health for research funding (R01GM6481304 and R01GM3997816 to BF).

Competing interests. The authors have declared that no competing interests exist.

Abbreviations
Footnotes

Author contributions. SS, BF, and JL conceived and designed the experiments. AO, AR, FF, and HC performed the experiments. AR, SP, SS, BF, and JL analyzed the data. FF, BF, and JL contributed reagents/materials/analysis tools. AR, BF, and JL wrote the paper.

Citation: Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, et al. (2005) The cell cycle–regulated genes of Schizosaccharomyces pombe. PLoS Biol 3(7): e225.

References