Copyright © 2006, Cold Spring Harbor Laboratory Press

Genome-wide identification of replication origins in yeast by comparative genomics

Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, Scotland, United Kingdom

¹Corresponding author. E-MAIL a.d.donaldson/at/abdn.ac.uk; FAX 44-1224-555844.

Received March 3, 2006; Accepted May 10, 2006.

Abstract

We discovered that sequences essential for replication origin function are frequently conserved in sensu stricto Saccharomyces species. Here we use analysis of phylogenetic conservation to identify replication origin sequences throughout the Saccharomyces cerevisiae genome at base pair resolution. Origin activity was confirmed for each of 228 predicted sites, representing 86% of apparent origin regions. This is the first study to determine the genome-wide location of replication origins at a resolution sufficient to identify the sequence elements bound by replication proteins. Our results demonstrate that phylogenetic conservation can be used to identify the origin sequences responsible for replicating a eukaryotic genome.

Keywords: ACS, ARS, ORC, Saccharomyces, sensu stricto, yeast

Eukaryotic chromosomes are replicated from multiple discrete sites called replication origins, each of which initiates two diverging replication forks. Replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where the ability of origin sequences to support plasmid maintenance provided a convenient assay for origin function, leading to their original designation as ARS elements or autonomously replicating sequences. The few S. cerevisiae origins that have been investigated at the sequence level are almost all intergenic and consist of ~200-base-pair (bp) sequences containing an essential ARS consensus sequence (ACS) and several nonessential secondary “B” elements (Shirahige et al. 1993; Weinreich et al. 2004).
The ACS has been determined by alignment of known essential elements to consist of an 11-bp motif (T/A)TTTAT(A/G)TTT(T/A), sometimes represented as an extended 17-bp motif based on a larger number of origins (Theis and Newlon 1997). Most origins contain multiple imperfect matches to this motif with the best match not necessarily corresponding to the essential ACS. Since there are >12,000 potential ACS matches in the genome but ~400 origins, the motif cannot be used to predict origin location. A match to the ACS is essential but not sufficient for origin function, indicating that there are additional sequences and/or chromatin requirements that are at present not understood. Microarray-based studies have used two approaches to map the approximate location of replication origins. First, by determining the replication time of all genomic sequences and taking advantage of the fact that replication origins are locally the earliest replicating sequences, it has been possible to identify regions with origin activity in S. cerevisiae, Drosophila, and human cultured cells (Raghuraman et al. 2001; Yabuki et al. 2002; MacAlpine et al. 2004; Jeon et al. 2005). Second, chromatin immunoprecipitation (ChIP) of origin-binding factors (ORC [origin recognition complex] and Mcm2-7) facilitated identification of replication origin regions in S. cerevisiae and Drosophila (Wyrick et al. 2001; MacAlpine et al. 2004). However, the fairly low resolution of these studies combined with the degeneracy (in S. cerevisiae) or absence (in Drosophila) of known origin motifs has so far precluded the precise identification of replication origin sequences genome-wide (MacAlpine and Bell 2005). A computational study attempted to assign precise S. cerevisiae origin locations by searching for ACS matches that conform to an extended matrix (Breier et al. 2004). 
This approach worked better than previous computational attempts; however, the algorithm used made multiple assignments at some locations but none at all at the majority of experimentally determined origin regions (MacAlpine and Bell 2005), and therefore this study was not used for genome annotation.

Here we demonstrate that most replication origin sequences are phylogenetically conserved among closely related Saccharomyces species, and we use this conservation to permit genome-wide identification of the DNA sequences responsible for replicating S. cerevisiae chromosomes.

Results and Discussion

Evolutionary conservation of replication origin sequences

We examined evolutionary conservation at ~20 previously characterized S. cerevisiae replication origins, comparing them with the corresponding sequences from four closely related sensu stricto Saccharomyces species (Saccharomyces paradoxus, Saccharomyces mikatae, Saccharomyces kudriavzevii, and Saccharomyces bayanus) (Cliften et al. 2003; Kellis et al. 2003). Sequences important for origin function were frequently conserved in the related species, particularly the essential ACS element that is bound by ORC (Fig. 1A; ...).
We used this evolutionary conservation to identify the critical origin consensus elements throughout the S. cerevisiae genome. ACS matches were scored on three criteria: phylogenetic conservation across the five Saccharomyces species (see Materials and Methods), proximity to an origin region (as identified in array-based studies) (Raghuraman et al. 2001; Wyrick et al. 2001; Yabuki et al. 2002), and similarity to our ACS motif (Fig. 2A).
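Two of the three criteria above lend themselves to a small illustration. The sketch below is not the authors' pipeline (they used MAST with their own ACS motif, which is not reproduced in the text); it assumes the simpler 11-bp consensus quoted in the introduction, (T/A)TTTAT(A/G)TTT(T/A), and the 12-of-15 identity rule stated in Materials and Methods. The function names `find_acs_matches` and `is_conserved` are hypothetical.

```python
import re

# Assumption: exact matching to the 11-bp consensus (T/A)TTTAT(A/G)TTT(T/A).
# The lookahead lets overlapping matches be reported.
ACS_REGEX = re.compile(r"(?=([TA]TTTAT[AG]TTT[TA]))")
ACS_LENGTH = 11
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_acs_matches(seq):
    """List (start, strand, site) for consensus matches on both strands.

    `start` is the 0-based leftmost coordinate on the forward strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in ACS_REGEX.finditer(seq)]
    rc = reverse_complement(seq)
    for m in ACS_REGEX.finditer(rc):
        # Convert a reverse-strand coordinate back to forward coordinates.
        start = len(seq) - (m.start() + ACS_LENGTH)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


def is_conserved(ref_site, other_site, min_identical=12):
    """Apply the 12-of-15 identity rule to two gap-free aligned sites."""
    if len(ref_site) != len(other_site):
        raise ValueError("aligned sites must have equal length")
    identical = sum(a == b for a, b in zip(ref_site.upper(), other_site.upper()))
    return identical >= min_identical
```

A consensus scan of this kind is expected to return far more hits than there are origins, which is why the text combines motif similarity with conservation and proximity to a mapped origin region rather than relying on the motif alone.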
Confirmation of origin activity

To confirm our identified origin sites and extend this analysis genome-wide, we developed a high-throughput transformation assay for ARS activity (Fig. 3; ...).
ACS elements of previously identified origins were not used in determining proACS locations and therefore can be used to assess the accuracy of our predictions. At 12 of the 13 origins for which the essential ACS has previously been assigned, our predictions were in precise agreement with experimental data. The exception is ARS121 (ARSX-684), where our proACS overlaps but is distinct from the reported ACS (Walker et al. 1990). We therefore performed linker scan analysis at ARS121 and showed that the conserved proACS is essential whereas the previously identified ACS is not (Supplementary Fig. S6). We mutated a further 20 proACS elements (examples in Fig. 3C), ...

Properties of replication origin sequences

An ACS is essential but not sufficient for origin activity, and therefore, the sequence context of the ACS is important for its function. Reported contextual features such as flanking B sequence elements also contribute to origin activity. Several origins contain a B2 element of unknown molecular function that, intriguingly, shows sequence resemblance to the ACS. At ARS305 (Lin and Kowalski 1997) and ARS307 (Rao et al. 1994; Theis and Newlon 1994), the B2 elements are phylogenetically conserved (Fig. 1A).

A second reported property of replication origins is a region of helical instability, thought to facilitate DNA unwinding (Natale et al. 1993). Sequences from the various sensu stricto species that flank a phylogenetically conserved ACS element share the property of helical instability despite lacking primary sequence conservation (data not shown). Therefore, origins show evolutionary conservation of both primary sequence elements and secondary sequence characteristics.

This study represents the first precise location of replication origin sequences across a eukaryotic genome. It therefore provides the first opportunity to examine the chromosomal context of replication origins genome-wide.
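One way to frame the chromosomal-context question is to classify each intergenic space by the orientation of its two flanking transcription units, which is the basis of the convergent and tandem comparisons reported in the text. The sketch below is only an illustration: the function names, the strand encoding ("+" or "-"), and the coordinate convention are assumptions, not the authors' code.

```python
def classify_intergenic(left_strand, right_strand):
    """Classify an intergenic space from the strands of its flanking genes.

    `left_strand` belongs to the gene at lower coordinates, `right_strand`
    to the gene at higher coordinates.
    """
    pair = (left_strand, right_strand)
    if pair == ("+", "-"):
        return "convergent"  # both 3' ends (terminators) face the space
    if pair == ("-", "+"):
        return "divergent"   # both 5' ends (promoters) face the space
    if pair in (("+", "+"), ("-", "-")):
        return "tandem"      # one terminator and one promoter face the space
    raise ValueError("strands must be '+' or '-'")


def nearer_feature(pos, start, end, strand):
    """For a tandem space [start, end], report which feature `pos` is nearer to.

    `strand` is the shared strand of both flanking genes. On the '+' strand
    the terminator of the left gene sits at `start` and the promoter of the
    right gene at `end`; on the '-' strand the arrangement is mirrored.
    """
    if not start <= pos <= end:
        raise ValueError("pos must lie within the intergenic space")
    to_left, to_right = pos - start, end - pos
    if to_left == to_right:
        return "equidistant"
    if strand == "+":
        left_feature, right_feature = "terminator", "promoter"
    elif strand == "-":
        left_feature, right_feature = "promoter", "terminator"
    else:
        raise ValueError("strand must be '+' or '-'")
    return left_feature if to_left < to_right else right_feature
```

Counting origins per class and comparing those counts with the genome-wide proportions of each class would then give the kind of enrichment comparison described in the text; the statistical test the authors used is not specified here, so none is assumed.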
We looked at the relationship between replication origins and transcription units. We observed a bias toward occurrence of replication origins in large intergenic spaces and between convergent transcription units (P = 0.00003) (Supplementary Table S4). Thirty-six percent of origins identified lie in convergent intergenic spaces, although only 22% of intergenic spaces are convergent. Where replication origins lie between tandem transcription units, we observed a bias for origins to lie closer to transcriptional terminators than promoters (69% of origins identified between tandem transcription units; P = 0.0003). Together these data suggest either exclusion of origins from promoters (perhaps due to the presence of transcription factors) and/or enrichment in terminator regions, possibly because termination zones tend to have greater helical instability than promoters (Benham 1996).

Next we examined nucleosome occupancy across replication origins. It had been reported that nucleosomes are excluded from the ACS and B elements of ARS1 and ARS307 (Lipford and Bell 2001). Accurate nucleosome positioning data are available at eight additional origins for which we have ACS (or proACS) locations (Supplementary Fig. S8; Yuan et al. 2005). At each origin a nucleosome-free region is bounded on one side by the ACS; the probability of 10 ACS elements lying in a nucleosome-free region by chance is 9.8 × 10⁻⁴.

Finally, we examined the modification status of nucleosomes close to replication origins. To investigate whether nucleosomes surrounding origins show specific modification patterns, we analyzed the results of a whole-genome study of histone modification (Pokholok et al. 2005). These data confirmed that origins are located in intergenic regions with low levels of H4 N-terminal acetylation (Vogelauer et al. 2002); in addition, these data suggest an even stronger tendency for the nucleosomes surrounding origins to have low levels of H3K79 trimethylation (Supplementary Fig. S9).
Not all histone covalent modifications are reduced close to origins, since origins were randomly distributed among intergenic sequences ranked according to H3K4 monomethylation. It has been suggested that H3 or H4 acetylation state affects the time of origin initiation (Vogelauer et al. 2002); surprisingly, we observed no correlation between origin initiation time and any of the chromatin modifications examined by Pokholok et al. (2005) (Supplementary Fig. S10); neither did any modification relate to whether an origin initiates prior to the hydroxyurea-induced S-phase checkpoint.

To summarize, we have precisely assigned the location of the majority of S. cerevisiae replication origins and shown that sequences identified have ARS activity. For each origin we have proposed the essential 15-bp sequence element proACS. This analysis represents an increase in resolution over previous studies of approximately three orders of magnitude. Our data set unambiguously assigns the intergenic space occupied by each origin and reveals that replication origin sequences tend to fall close to transcriptional terminators. Knowledge of the precise location of replication origin sequences throughout a eukaryotic genome provides a valuable resource, making available an increased repertoire of origins for reductionist studies, and facilitating systems biology approaches to improve our understanding of DNA replication.

All of the origins identified in this study lie in intergenic regions. The very high levels of phylogenetic conservation in ORFs mask the lower levels of conservation observed at ACS elements and therefore prevent the identification of origin sequence elements that may lie within genes. To date only two origins have been identified within ORFs.
These are ARS604, which lies within a transcribed gene but is chromosomally inactive for replication initiation, and ARS605, which is chromosomally active during mitotic growth but lies within a gene that is expressed only in meiosis (Hirschman et al. 2006). The fact that no cases are known where active origins coincide with active transcription, combined with the fact we have identified intergenic origin sequences for the majority of active origins, implies that very few origins lie within transcription units. ACS elements show lower levels of phylogenetic conservation than genes, presumably because origins have higher levels of functional redundancy; several origins can be deleted from a chromosome without deleterious consequences (Dershowitz and Newlon 1993). The tendency for ACS elements and helical instability to be more evolutionarily conserved than the intergenic average implies that some selective pressure to retain origin sites does exist. We fail to observe phylogenetic conservation at certain replication origins (e.g., ARS1) (Supplementary Fig. S1). This observation may reflect genuine differences in some replication initiation sites between sensu stricto species; however in some cases, our analysis was limited by incomplete sequence data in the related yeast species. In other cases, failure to identify phylogenetic conservation may result from technical difficulties in aligning the genomic sequences. The yeast genome contains ~12,000 matches to the ACS motif, yet only ~400 are functional. Whether a particular match to the ACS behaves as an origin may be determined by several other contributing factors—specifically, the ease of unwinding of DNA in the B region 3′ to the ACS, the presence of a nucleosome-excluding element flanking the B region (possibly a transcription factor-binding site or a run of T or A residues), and a surrounding chromatin conformation that is favorable for origin function. 
While no single one of these contributing properties may be essential for origin activity, together these features may determine whether a particular ACS motif can function as a replication origin.

We have shown that comparative genomics can be combined with origin location data to discover essential origin sequences. Using this approach, we have precisely identified replication origin sites and confirmed their activity in vivo throughout the S. cerevisiae genome. Our list of precise origin locations permitted the first whole-genome analysis of chromatin conformation close to replication origins, revealing global properties of replication initiation sites.

Determining the sequence requirements for DNA replication origins in metazoans has proven elusive, in part hampered by the lack of assays analogous to the yeast ARS assay. Recent studies have greatly increased our knowledge of the approximate location of origins in several metazoans (MacAlpine et al. 2004; Jeon et al. 2005). Combining these results with comparative genomics may offer the first possibility to identify sequence elements or characteristics that regulate metazoan origin selection.

Materials and methods

Comparative genomics

Evolutionary conservation was assessed and scored using the University of California at Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu; Bejerano et al. 2005). The custom track facility was used to annotate the Saccharomyces genomes with origin location data and extract phylogenetically conserved intergenic sequences. These sequences were assessed for motifs using MEME (http://meme.sdsc.edu; Bailey and Elkan 1994) and were visualized using WebLogo (Crooks et al. 2004).

ACS identification

Proposed ACS sites were identified on the basis of (1) proximity to an origin identified by microarray studies, (2) similarity to our ACS motif, and (3) phylogenetic sequence conservation. ACS matches in the vicinity of microarray-identified origins (<1.5 kb from a proARS [Wyrick et al. 2001]; <3 kb from a copy number-derived origin [Yabuki et al. 2002]) were scored as potentially positive. The data from the ChIP microarray study (Wyrick et al. 2001) were reassessed using lower thresholds to allow identification of origins (e.g., ARS306) that fell just below the published thresholds. Occurrences of the ACS motif in the vicinity of microarray-identified origins were scored using MAST (http://meme.sdsc.edu; Supplementary Table S1; Bailey and Elkan 1994), and the highest-scoring occurrences that also shared phylogenetic conservation were tested for ARS activity. Phylogenetic conservation was assessed using alignments from either the UCSC Genome Browser or the Saccharomyces Genome Database and scored as potentially positive if either 12 out of 15 ACS bases were identical between S. cerevisiae and at least one other sensu stricto species, or the UCSC Genome Browser (phastCons table) annotated the ACS with a phylogenetic conservation score >0.01 (Supplementary Table S1). Because phylogenetic sequence data are unavailable at some locations and a minority of ACS may not be phylogenetically conserved, we tested 21 ACS occurrences for which there was no observed conservation but that scored highly on the other two criteria; these had ARS activity and are included in Supplementary Tables S1 and S2.

Yeast techniques

All ARS assays were performed in the S. cerevisiae strain Y00000 (MATa his3− leu2− met15− ura3−). Candidate ARS fragments were PCR-amplified using oligonucleotides designed to include 50–100 bp of sequence 5′ of the proACS and 150–200 bp of sequence 3′ of the proACS. PCR fragments were ligated into the vector pGEM-T (Promega). The high-throughput recombination-based ARS assay is summarized in Supplementary Figure S3 and will be described in detail elsewhere. Mutagenesis of proACS sites (shown in Fig. 2; ...) ...

Data access

Our list of origins is being deposited with the Saccharomyces Genome Database and the UCSC Genome Browser to allow genome annotation with origin elements. The custom track facility of the UCSC Genome Browser allows comparison of our data set (available at http://www.oridb.org) with any other S. cerevisiae genome-wide data set.

Acknowledgments

We thank John Diffley for the initial suggestion of examining phylogenetic conservation at replication origins, and Shin-ichiro Hiraga for helpful discussion and advice. A.D.D. is a Royal Society University Research Fellow.

Footnotes

Supplemental material is available at http://www.genesdev.org.

Article is online at http://www.genesdev.org/cgi/doi/10.1101/gad.385306