Shotgun bisulfite sequencing of the Arabidopsis genome reveals DNA methylation patterning

1 Department of Molecular, Cell, and Developmental Biology, University of California at Los Angeles, Los Angeles, California 90095, USA
2 Howard Hughes Medical Institute, University of California at Los Angeles, Los Angeles, California 90095, USA
3 Department of Human Genetics, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California 90095, USA
4 Illumina Inc., Hayward, California 94545, USA
5 New England Biolabs, Ipswich, Massachusetts 01938, USA
6 These authors contributed equally to this work.
7 Present address: Department of Plant Biology, University of Georgia, Athens, Georgia 30602, USA.

Author Information. Reprints and permissions information is available at www.nature.com/reprints. The authors declare competing financial interests: details accompany the full-text HTML version of the paper at www.nature.com/nature. Correspondence and requests for materials should be addressed to S.E.J. (Email: jacobsen/at/ucla.edu) or M.P. (Email: matteop/at/mcdb.ucla.edu).

Author Contributions. S.J.C. developed computational methods for mapping and basecalling. S.F. designed and created DNA libraries and performed all molecular biology experiments. S.F., Z.C., B.M., and S.F.N. sequenced libraries. M.P., S.J.C., S.F., and S.E.J. analyzed data. S.E.J. and M.P. designed and directed the study. X.Z., C.D.H., and S.P. assisted in the design of experiments. S.F. and S.J.C. wrote the manuscript.

Abstract
Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences 1, 2. Recent genomic studies in Arabidopsis have revealed that many endogenous genes are methylated either within their promoters or within their transcribed regions, and that gene methylation is highly correlated with transcription levels 3-5.
However, plants have different types of methylation controlled by different genetic pathways, and detailed information on the methylation status of each cytosine in any given genome is lacking. To this end, we generated a single-base-pair-resolution map of methylated cytosines in Arabidopsis by combining bisulfite treatment of genomic DNA with ultra-high-throughput sequencing using the Illumina 1G Genome Analyzer and Solexa sequencing technology 6. Unlike previous microarray-based methods, this approach, termed BS-Seq, allows sensitive genome-wide measurement of cytosine methylation within specific sequence contexts. We describe methylation on previously inaccessible components of the genome, along with an analysis of the sequence composition and distribution of DNA methylation. We also describe the effect of various DNA methylation mutants on genome-wide methylation patterns, and demonstrate that our newly developed library construction and computational methods can be applied to large genomes such as that of the mouse.

To generate a DNA methylation map at single-nucleotide resolution across the genome, we adapted the Illumina 1G Genome Analyzer using Solexa sequencing technology (Illumina GA) for shotgun sequencing of bisulfite-treated Arabidopsis genomic DNA. Sodium bisulfite converts unmethylated cytosines to uracils but leaves 5-methylcytosines unconverted. Hence, after polymerase chain reaction (PCR) amplification, unmethylated cytosines appear as thymines and methylated cytosines appear as cytosines 7. We created genomic DNA libraries after bisulfite conversion and produced ~3.8 billion nucleotides of high-quality sequence that mapped to the genome. We then applied several filters to ensure accuracy, including retaining only reads that map to sequences that remain unique in the genome after bisulfite conversion under every possible methylation pattern (see Supplementary Methods and Supplementary Table 1).
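The conversion rule and the uniqueness filter can be illustrated with a minimal Python sketch. It assumes a single reference strand and a brute-force scan (real mappers index the genome), and the helper names `bisulfite_convert`, `fully_converted`, and `count_hits` are hypothetical, not taken from the authors' C++ tools. Collapsing every C to T on both read and reference makes a match independent of methylation state, so a reference window whose fully converted sequence occurs only once cannot be confused with another window under any methylation pattern.

```python
def bisulfite_convert(seq: str, methylated: set) -> str:
    """Simulate bisulfite treatment plus PCR on one strand: unmethylated C
    reads as T, while C at a position listed in `methylated` stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def fully_converted(seq: str) -> str:
    """Collapse every C to T so that matching ignores methylation state."""
    return seq.replace("C", "T")


def count_hits(read: str, reference: str) -> int:
    """Count placements (overlaps allowed) of a read on one reference strand
    after both sequences are collapsed to the C-free alphabet."""
    r, ref = fully_converted(read), fully_converted(reference)
    return sum(
        1 for i in range(len(ref) - len(r) + 1) if ref[i:i + len(r)] == r
    )


# Hypothetical example: "CGAT" occurs once in the raw reference, but after
# conversion it also matches the "TGAT" window, so a uniqueness filter of
# this kind would discard it.
reference = "TTCGATTTGATT"
print(bisulfite_convert("ACGTCA", methylated={1}))  # ACGTTA
print(count_hits("CGAT", reference))                # 2
```

A read would be kept only when `count_hits` returns 1; handling the reverse strand would, under the same assumptions, repeat the scan on the reverse complement.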
These filters yielded a conservative dataset of ~2.6 billion nucleotides mapping to unique genomic locations with very high confidence, covering ~93% of all cytosines that could theoretically be covered (~92% of the ~43 million cytosines in the ~120 Mbp Arabidopsis genome can be covered uniquely with 31-nucleotide sequences). This represents ~20-fold average coverage, similar to typical coverage in a traditional bisulfite sequencing experiment for a single locus.

Methylation in Arabidopsis occurs in three sequence contexts: CG, CHG (where H = A, C, or T), and asymmetric CHH 1. We observed overall genome-wide methylation levels of 24% at CG, 6.7% at CHG, and 1.7% at CHH sites (Supplementary Fig. 1a). Most CG sites were either unmethylated or highly methylated (80–100%), whereas CHH sites were either unmethylated or methylated at ~10%. CHG sites showed a more uniform distribution between 20% and 100% (Supplementary Fig. 1b–d). These differences underscore that each type of methylation is under distinct genetic control 1. Our reads also provided 504-fold average coverage of 99.97% of theoretically coverable cytosines in the unmethylated chloroplast genome 3, 8; the apparent methylation in this genome gives false positive rates of 0.29% (CG), 0.29% (CHG), and 0.25% (CHH) (Supplementary Fig. 1a, Supplementary Fig. 2). The BS-Seq data were highly consistent with traditional bisulfite sequencing data from individual methylated or unmethylated loci 3 (Supplementary Table 2, Supplementary Fig. 3, and below). While CG, CHG, and CHH methylation were highly correlated, showing enrichment in repeat-rich pericentromeric regions (Fig. 1a […]
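The three sequence contexts and the per-context methylation levels can be sketched as below. This is a minimal illustration under stated assumptions: one uppercase A/C/G/T strand, methylation calls already extracted as hypothetical `(reference_position, is_methylated)` pairs, and invented helper names. Applied to reads from a genome known to be unmethylated, such as the chloroplast genome, the same ratio estimates the false positive rate.

```python
from collections import Counter


def cytosine_context(seq: str, i: int):
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at index i of an
    uppercase A/C/G/T strand, or None if seq[i] is not C or the context
    runs off the end of the sequence."""
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if i + 2 >= len(seq):
        return None
    return "CHG" if seq[i + 2] == "G" else "CHH"


def methylation_levels(reference: str, calls):
    """Fraction of methylated calls per context. `calls` is an iterable of
    (reference_position, is_methylated) pairs, one per read covering a C."""
    methylated, total = Counter(), Counter()
    for pos, is_methylated in calls:
        ctx = cytosine_context(reference, pos)
        if ctx is None:
            continue
        total[ctx] += 1
        methylated[ctx] += int(is_methylated)
    return {ctx: methylated[ctx] / total[ctx] for ctx in total}


reference = "ACGTCAGTCCAT"  # C at 1 -> CG, 4 -> CHG, 8 and 9 -> CHH
calls = [(1, True), (1, True), (1, False), (1, True),
         (4, False), (4, True),
         (8, False), (8, True), (9, False), (9, False)]
print(methylation_levels(reference, calls))
# {'CG': 0.75, 'CHG': 0.5, 'CHH': 0.25}
```

Cytosines on the opposite strand would, in this sketch, be handled by running the same functions on the reverse complement.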
BS-Seq appears to be more sensitive than previously employed microarray-based methods 3-5. For example, we found a cluster of 5 methylated CG sites in a 34 base pair region and a lone methylated CG site, both within the FWA locus, that were not detected by previous methods (Supplementary Fig. 4). We also found CG methylation within genes previously classified as unmethylated 3, 4 (Supplementary Fig. 5). Finally, in analyzing genes whose expression is de-repressed in DNA methyltransferase mutants, BS-Seq more accurately identified genes with promoter methylation that had been variably detected in previous microarray studies (Supplementary Fig. 6).

BS-Seq can also be used to analyze repetitive sequences that are difficult to study with microarrays, because their signals may exceed the dynamic range of detection or cross-hybridize. For example, we mapped methylation across the highly repetitive rDNA loci and found high levels of CG, CHG, and CHH methylation, including on the minimal promoter and upstream Sal1 repeats (Supplementary Fig. 7). Further, we detected methylation in telomeric repeat sequences (CCCTAAA)n, which had not previously been shown to be methylated (Fig. 1e […]

The single-base resolution of BS-Seq allows determination of the precise boundaries between methylated and unmethylated regions. For example, we found that the boundary between tandem repeats and flanking DNA showed a sharp drop in methylation, whereas DNA methylation extended from inverted repeats into flanking DNA with a more gradual decline (Fig. 1b […]

We analyzed the relationship between sequence context and methylation preference. We calculated the percent methylation of all possible 7-mer sequences in which the methylated cytosine was either in the fifth position (allowing an analysis of four nucleotides upstream of CG, CHG, and CHH methylation; Fig. 2 […]
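The fifth-position 7-mer tabulation can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: it uses one strand, hypothetical `(reference_position, is_methylated)` calls, and an invented function name. Placing the cytosine fifth leaves four upstream bases plus the two downstream bases needed to tell CG, CHG, and CHH apart.

```python
from collections import Counter


def kmer_methylation(reference: str, calls):
    """Percent methylation per 7-mer, with the assayed cytosine in the fifth
    position (four bases upstream, two downstream). `calls` is an iterable of
    (reference_position, is_methylated) pairs for one strand."""
    methylated, total = Counter(), Counter()
    for pos, is_methylated in calls:
        # Skip non-cytosines and positions lacking full 7-mer flanks.
        if reference[pos] != "C" or pos < 4 or pos + 3 > len(reference):
            continue
        kmer = reference[pos - 4:pos + 3]
        total[kmer] += 1
        methylated[kmer] += int(is_methylated)
    return {k: 100.0 * methylated[k] / total[k] for k in total}


reference = "GATTCAGGATTCAGG"  # the 7-mer GATTCAG occurs at two cytosines
print(kmer_methylation(reference, [(4, True), (11, False)]))
# {'GATTCAG': 50.0}
```

Pooling calls across every occurrence of a 7-mer, as here, weights each 7-mer by read coverage; averaging per-site levels instead would be an equally plausible design choice.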
We used autocorrelation analysis to examine the correlation between methylation in different sequence contexts and methylation at neighboring residues. We observed significant correlation between methylated cytosines at distances of up to 5,000 nucleotides or more, likely reflecting regional foci of methylation throughout the genome and large blocks of pericentromeric heterochromatin (Supplementary Fig. 11, Supplementary Table 5). We also found a high correlation of CHG and CHH methylation within several nucleotides downstream of methylated CG sites, and a tendency for CHH methylation four nucleotides downstream of methylated CHG sites (Fig. 2 […]

We analyzed the propensity for full methylation of the strand-symmetrical CG and partially symmetrical CHG sequences. As expected, CG methylation on one strand was highly correlated with CG methylation on the opposite strand. We also saw a high correlation between CHG methylation on the two strands, showing that, like CG sites, CHG sites have a strong tendency toward symmetrical methylation (Supplementary Fig. 12). Unexpectedly, we observed a correlation between CHH methylation on one strand and methylation at the cytosine on the opposite strand three nucleotides downstream (Supplementary Fig. 12, Supplementary Table 5). Because such sites have the sequence CHHG, this indicates that "asymmetric" CHH methylation tends to be symmetrical at these sites, even though methylation at CHHG sites is not particularly prominent in the genome (Supplementary Fig. 8, Supplementary Table 4). Autocorrelation analysis also revealed a striking periodicity of 10 nucleotides (the length of one helical turn of DNA) for CHH methylation (Fig. 3a, b […]
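One simple way to compute a lag correlation of this kind is sketched below. It is a stand-in under stated assumptions, not the statistic described in the Supplementary Methods: each track is a hypothetical mapping from position to methylation level, and correlation is plain Pearson over positions that have data at both `i` and `i + d`. The same function covers the cross-strand case if bottom-strand cytosines are keyed by the top-strand coordinate of the G they pair with (offset 1 for CG, 2 for CHG, 3 for CHHG).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either track is constant
    return sxy / math.sqrt(sxx * syy)


def lag_correlation(track_a, track_b, d):
    """Correlate the methylation level at position i in track_a with the
    level at i + d in track_b. Tracks map position -> level in [0, 1];
    positions without data are simply absent."""
    pairs = [(v, track_b[i + d]) for i, v in track_a.items() if i + d in track_b]
    if len(pairs) < 2:
        return float("nan")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Synthetic track methylated every 10 nt: perfectly correlated at lag 10,
# anticorrelated at lag 5.
periodic = {i: 1.0 if i % 10 == 0 else 0.0 for i in range(200)}
print(round(lag_correlation(periodic, periodic, 10), 6))  # 1.0
print(lag_correlation(periodic, periodic, 5) < 0)         # True

# Hypothetical CHHG symmetry check: top-strand CHH cytosines versus
# bottom-strand cytosines three nucleotides downstream.
top_chh = {0: 0.9, 10: 0.1, 20: 0.8}
bottom = {3: 0.85, 13: 0.0, 23: 0.7}
print(lag_correlation(top_chh, bottom, 3))
```

Scanning `d` over a range and plotting the result is one way to expose periodicities such as the 10-nucleotide signal reported for CHH methylation.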
Autocorrelation also showed a period of 167 nucleotides (Fig. 3c […]

We used BS-Seq to study the genome-wide effects of a variety of methyltransferase mutants on DNA methylation (Fig. 4 […]
The BS-Seq procedure described here should be generally useful in other organisms. For example, we applied BS-Seq to quantify the overall difference in genomic methylation between wild-type mouse embryonic stem cells and cells carrying a mutation in the UHRF1 gene, which was recently shown to control maintenance of CG methylation 23, 24. By analyzing ~60 million nucleotides of shotgun sequencing data from each, we found that Uhrf1−/− cells contained only 25% of the wild-type CpG methylation level (Fig. 4c […]

In summary, BS-Seq analysis of wild type and methyltransferase mutants has allowed a more detailed characterization of the Arabidopsis methylome. In addition, the computational approaches developed in this study should be generally useful for other short-read sequencing genomics approaches. An installation of the UCSC browser allowing community access to detailed methylation patterns of individual genes, and a source code distribution of the computational methods, are available at http://epigenomics.mcdb.ucla.edu/BS-Seq/.

METHODS SUMMARY

Construction and sequencing of DNA libraries
Bisulfite treatment of DNA was performed as previously described 25, except that adaptor sequences and PCR conditions were modified and optimized for this study. Library generation and ultra-high-throughput sequencing were carried out according to the manufacturer's instructions (Illumina).

Processing of sequence data and mapping of reads
Raw data from the Illumina GA were processed into short reads using the initial stages of the Solexa software pipeline (Illumina), except that per-lane, per-cycle multidimensional Gaussian mixture models (GMMs) were developed to improve the accuracy of the A-vs-C-vs-G-vs-T base-call probability distributions at each sequenced base, compared with the Solexa software pipeline's _prb files. Sequenced reads were mapped to reference genomes using the full per-base probabilities from the GMMs, with highly optimized novel C++ tools. Reads that mapped to more than one position with similar scores (within 1% of the maximum-likelihood mapping) were removed, so that only uniquely mapping reads were retained. To eliminate reads from DNA that was not bisulfite-converted, a filter discarded reads with three or more consecutive methylated cytosines when each of these was in a CHH context, resulting in a loss of ~0.23% of reads. This filter was effective, with only minimal loss of true CHH methylation (Supplementary Table 1, Supplementary Figs 13, 17, and 18).

Validation of BS-Seq results
Traditional bisulfite sequencing was used to validate BS-Seq results at selected loci (Supplementary Table 2, Supplementary Figs 4, 6, and 17). The PCR primers used for validation are listed in Supplementary Table 7.

Full methods (including extensive algorithmic and computational details) are available in the Supplementary Methods section at www.nature.com/nature.

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgments
We thank Yana Bernatavichute for nuclear DNA isolation protocols, Amander Clarke for providing ES cell DNA, Angelique Girard and Greg Hannon for providing mouse germ cell DNA, Jonathan Hetzel for technical assistance, and Carey Fey Li for assistance with rDNA annotation. This work was supported in part by a grant from the NSF Plant Genome Research Program (award number 0701745), and some aspects of the work were performed in the UCLA DNA Microarray Facility. S.F. is a Howard Hughes Medical Institute Fellow of the Life Science Research Foundation. X.Z. was supported by a fellowship from the Jonsson Cancer Center Foundation. S.E.J. is an investigator of the Howard Hughes Medical Institute.

References
1. Henderson IR, Jacobsen SE. Epigenetic Inheritance in Plants. Nature. 2007;447:418–424.
2. Goll MG, Bestor TH. Eukaryotic cytosine methyltransferases. Annu Rev Biochem. 2005;74:481–514.
3. Zhang X, et al. Genome-wide High-Resolution Mapping and Functional Analysis of DNA Methylation in Arabidopsis. Cell. 2006;126:1189–201.
4. Zilberman D, Gehring M, Tran RK, Ballinger T, Henikoff S. Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat Genet. 2007;39:61–9.
5. Vaughn MW, et al. Epigenetic Natural Variation in Arabidopsis thaliana. PLoS Biol. 2007;5:e174.
6. Bentley DR. Whole-genome re-sequencing. Curr Opin Genet Dev. 2006;16:545–52.
7. Frommer M, et al. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci U S A. 1992;89:1827–31.
8. Ngernprasirtsiri J, Kobayashi H, Akazawa T. DNA methylation as a mechanism of transcriptional regulation in nonphotosynthetic plastids in plant cells. Proc Natl Acad Sci U S A. 1988;85:4750–4.
9. Tran RK, et al. DNA methylation profiling identifies CG methylation clusters in Arabidopsis genes. Curr Biol. 2005;15:154–9.
10. Gruenbaum Y, Naveh-Many T, Cedar H, Razin A. Sequence specificity of methylation in higher plant DNA. Nature. 1981;292:860–2.
11. Meyer P, Niedenhof I, ten Lohuis M. Evidence for cytosine methylation of non-symmetrical sequences in transgenic Petunia hybrida. EMBO J. 1994;13:2084–8.
12. Cao X, Jacobsen SE. Locus-specific control of asymmetric and CpNpG methylation by the DRM and CMT3 methyltransferase genes. Proc Natl Acad Sci U S A. 2002;99(Suppl 4):16491–8.
13. Dieguez MJ, Vaucheret H, Paszkowski J, Mittelsten Scheid O. Cytosine methylation at CG and CNG sites is not a prerequisite for the initiation of transcriptional gene silencing in plants, but it is required for its maintenance. Mol Gen Genet. 1998;259:207–15.
14. Ramsahoye BH, et al. Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc Natl Acad Sci U S A. 2000;97:5237–42.
15. Jia D, Jurkowska RZ, Zhang X, Jeltsch A, Cheng X. Structure of Dnmt3a bound to Dnmt3L suggests a model for de novo DNA methylation. Nature. 2007.
16. Cao X, et al. Conserved plant genes with similarity to mammalian de novo DNA methyltransferases. Proc Natl Acad Sci U S A. 2000;97:4979–84.
17. Ideker T, et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001;292:929–34.
18. Vershinin AV, Heslop-Harrison JS. Comparative analysis of the nucleosomal structure of rye, wheat and their relatives. Plant Mol Biol. 1998;36:149–61.
19. Fulnecek J, Matyasek R, Kovarik A, Bezdek M. Mapping of 5-methylcytosine residues in Nicotiana tabacum 5S rRNA genes by genomic sequencing. Mol Gen Genet. 1998;259:133–41.
20. Fan Y, et al. Histone H1 depletion in mammals alters global chromatin structure but causes specific changes in gene regulation. Cell. 2005;123:1199–212.
21. Zhang X, Jacobsen SE. Genetic analyses of DNA methyltransferases in Arabidopsis thaliana. Cold Spring Harb Symp Quant Biol. 2006;71:439–47.
22. Henderson IR, et al. Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning. Nat Genet. 2006 (in press).
23. Bostick M, et al. UHRF1 Plays a Role in Maintaining DNA Methylation in Mammalian Cells. Science. 2007.
24. Sharif J, et al. The SRA protein Np95 mediates epigenetic inheritance by recruiting Dnmt1 to methylated DNA. Nature. 2007.
25. Meissner A, et al. Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res. 2005;33:5868–77.
26. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 2006;20:3407–25.