Whole-genome bisulfite sequencing has revealed that methylomes differ dramatically in a cell type-specific manner. However, the global mechanisms of maintenance of DNA methylation patterns and their functional consequences are not clear. Using a combination of novel haplotype-resolved methylomes and publicly available data, we show that three distinct classes of methylomes, which differ in the proportion of partially and highly methylated domains, are characteristic of stem and progenitor cells, somatic cells, and transformed cells. We provide evidence that three mechanisms can account for most of the differences between these three classes of methylomes. The first, protection of promoters and cis-acting regulatory elements from DNA methylation by DNA-binding proteins, keeps regulatory elements unmethylated. The second, DNA replication-associated maintenance of DNA methylation, is highly efficient in stem cells, moderately efficient in somatic cells, and very inefficient in transformed cells, and sets variable basal levels of methylation characteristic of the partially methylated domains of each cell type. The third, transcriptional elongation-associated gene-body DNA methylation, is responsible for the formation of highly methylated domains in somatic and transformed cells. We show that partitioning of the genome into partially and highly methylated domains is associated with different mechanisms of gene silencing.
Overall design: A transcriptome analysis was performed on self-generated data from basophilic erythroblasts (“BasoE”, as described in detail below, with raw data files included), as well as from previously published cell lines and primary tissues (only processed data included here): IMR90, H1 hESC, HepG2, K562, hematopoietic stem/progenitor cells (HSPC), and pancreas. ENCODE raw sequencing data was retrieved from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/ for replicates of paired-end data, i.e.
for IMR90 cells: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqImr90CellPapFastqRd1Rep1.fastq.gz, http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqImr90CellPapFastqRd1Rep2.fastq.gz, http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqImr90CellPapFastqRd2Rep1.fastq.gz, http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqImr90CellPapFastqRd2Rep2.fastq.gz,
for H1 hESC: wgEncodeCshlLongRnaSeqH1hescCellPapFastqRd1Rep1.fastq.gz, wgEncodeCshlLongRnaSeqH1hescCellPapFastqRd1Rep2.fastq.gz, wgEncodeCshlLongRnaSeqH1hescCellPapFastqRd2Rep1.fastq.gz, wgEncodeCshlLongRnaSeqH1hescCellPapFastqRd2Rep2.fastq.gz,
for HepG2: wgEncodeCshlLongRnaSeqHepg2CellPapFastqRd1Rep1.fastq.gz, wgEncodeCshlLongRnaSeqHepg2CellPapFastqRd1Rep2.fastq.gz, wgEncodeCshlLongRnaSeqHepg2CellPapFastqRd2Rep1.fastq.gz, wgEncodeCshlLongRnaSeqHepg2CellPapFastqRd2Rep2.fastq.gz,
for K562: wgEncodeCshlLongRnaSeqK562CellPapFastqRd1Rep1.fastq.gz, wgEncodeCshlLongRnaSeqK562CellPapFastqRd1Rep2.fastq.gz, wgEncodeCshlLongRnaSeqK562CellPapFastqRd2Rep1.fastq.gz, wgEncodeCshlLongRnaSeqK562CellPapFastqRd2Rep2.fastq.gz.
RNA-seq from HSPC was downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM909310 (SRX135562), and RNA-seq from pancreas was downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1120309 (SRX263865).
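The ENCODE file names listed above follow a regular pattern (cell-line token, read number, replicate number), so the download list can be regenerated programmatically. The following is a minimal sketch, not part of the original record: the function name `encode_fastq_urls` and the `CELL_TOKENS` mapping are hypothetical, and it assumes that the H1 hESC, HepG2, and K562 files, which are listed by file name only, sit in the same directory as the IMR90 files.

```python
# Base directory quoted in the record for the ENCODE CSHL long RNA-seq data.
BASE_URL = (
    "http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/"
    "wgEncodeCshlLongRnaSeq/"
)

# Cell-line tokens exactly as they appear in the listed file names.
# (Hypothetical helper mapping, introduced here for illustration.)
CELL_TOKENS = {
    "IMR90": "Imr90",
    "H1 hESC": "H1hesc",
    "HepG2": "Hepg2",
    "K562": "K562",
}


def encode_fastq_urls(cell_line):
    """Return the four paired-end FASTQ URLs (2 reads x 2 replicates).

    Assumption: every file lives directly under BASE_URL.
    """
    token = CELL_TOKENS[cell_line]
    return [
        f"{BASE_URL}wgEncodeCshlLongRnaSeq{token}CellPapFastqRd{read}Rep{rep}.fastq.gz"
        for read in (1, 2)   # Rd1 / Rd2: the two mates of each read pair
        for rep in (1, 2)    # Rep1 / Rep2: the two replicates
    ]
```

Calling `encode_fastq_urls("IMR90")` yields the four IMR90 URLs in the same order as they are listed in the record; the HSPC and pancreas data come from GEO and are not covered by this pattern.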