J Bioinform Comput Biol. 2013 Jun;11(3):1341003. doi: 10.1142/S0219720013410035.

Genome-wide analysis and modeling of DNA methylation susceptibility in 30 breast cancer cell lines by using CpG flanking sequences.

Author information

Department of Computer Science and Engineering, Seoul National University, Seoul, Korea.


DNA methylation is an epigenetic modification of DNA that adds a methyl group to cytosine. Aberrant DNA methylation in the CpG context is frequently observed in cancer cells, and it is known to silence tumor suppressor genes. However, the mechanism of DNA methylation is not well understood. A widely accepted hypothesis is that DNA methylation does not occur randomly and may be controlled by some instructive mechanisms. In this paper, we conducted an extensive study of this important question by using proprietary sequencing data from methyl-binding domain protein (MBD)-Cap ChIP sequencing experiments for 30 breast cancer cell lines. The goal of our study is to investigate differences in nucleotide composition around CpG sites where high levels of methylation are observed, and to use this information for modeling DNA methylation susceptibility. First, we observed that DNA methylation is not uniform across the whole genome and that the character composition of CpG flanking sequences is significantly different between hyper- and hypo-methylated groups. In an in-depth study, we used information-theoretic measures such as entropy and relative entropy to delineate character composition features and found enrichment of A (adenine) and T (thymine) at specific positions around hyper-methylated sites. As the methylation level increases, the A and T proportions at some positions around hyper-methylated sites increase, while the A and T proportions at other positions decrease. Second, we built predictive models for methylation susceptibility by using the characters flanking CpG sites as features and hyper-/hypo-methylation status as the class. Third, we constructed predictive models using a log odds score of two profiles built from the DNA sequences surrounding CpG sites of the hyper- and hypo-methylated groups. This analysis showed that the distributions of profile scores for sequences at hyper- and hypo-methylated sites are quite distinct.
Our genome-wide CpG methylation study shows that the nucleotides around CpG sites carry information for cytosine methylation. This is consistent with the seminal work on the instructive evidence of DNA methylation by Keshet et al. (Nature Genetics, 38(2), 149-153, 2006). Our study is on the full-genome scale and uses sequencing data, so it differs significantly from the study by Keshet et al. in both the resolution of the data and the analysis methods.
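The two sequence-based measures mentioned in the abstract, per-position relative entropy and a log odds score of two profiles, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the add-one pseudocount, the window length, and the toy flanking sequences below are all assumptions made for illustration, and the sketch assumes equal-length sequences containing only A, C, G and T.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_profile(seqs, pseudocount=1.0):
    """Per-position nucleotide frequencies for equal-length sequences.

    The pseudocount (an assumption, not from the paper) keeps every
    frequency non-zero so the logarithms below are always defined.
    """
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile


def relative_entropy(p, q):
    """Relative entropy (KL divergence) D(p || q) in bits at one position."""
    return sum(p[b] * math.log2(p[b] / q[b]) for b in ALPHABET)


def log_odds_score(seq, hyper_profile, hypo_profile):
    """Sum over positions of log2(P_hyper(base) / P_hypo(base))."""
    return sum(
        math.log2(hyper_profile[i][b] / hypo_profile[i][b])
        for i, b in enumerate(seq)
    )


# Hypothetical toy windows centred on a CpG (positions 2-3 are "CG").
hyper_seqs = ["ATCGTA", "AACGTT", "TACGAA"]  # made-up AT-rich flanks
hypo_seqs = ["GCCGCG", "CGCGGC", "GGCGCC"]   # made-up GC-rich flanks

hyper = position_profile(hyper_seqs)
hypo = position_profile(hypo_seqs)

# Positions where the two groups differ have positive relative entropy;
# the shared CpG positions contribute nothing.
divergence = [relative_entropy(hyper[i], hypo[i]) for i in range(len(hyper))]

# A positive score means the window looks more like the hyper-methylated group.
score_hyper_like = log_odds_score("ATCGTA", hyper, hypo)
score_hypo_like = log_odds_score("GCCGCG", hyper, hypo)
```

In this sketch, a window is scored against both profiles at once, so the sign of the score gives a simple decision rule; any threshold other than zero would have to be chosen from real data.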

[Indexed for MEDLINE]