Bioinformatics. 2017 Jun 15;33(12):1765-1772. doi: 10.1093/bioinformatics/btx064.

pETM: a penalized Exponential Tilt Model for analysis of correlated high-dimensional DNA methylation data.

Sun H1, Wang Y2, Chen Y3, Li Y4,5,6, Wang S2.

Author information

1 Department of Statistics, Pusan National University, Busan, Korea.
2 Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA.
3 Division of Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
4 Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA.
5 Department of Genetics, University of North Carolina, Chapel Hill, NC, USA.
6 Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA.

Abstract

Motivation:

DNA methylation plays an important role in many biological processes and in cancer progression. Recent studies have found that groups can differ not only in mean methylation levels but also in methylation variability. Several methods have been developed that consider both mean and variance signals in order to improve the statistical power of detecting differentially methylated loci. Moreover, because methylation levels of neighboring CpG sites are known to be strongly correlated, methods that incorporate these correlations have also been developed. We previously developed a network-based penalized logistic regression for correlated methylation data, but it focuses only on mean signals. We also developed a generalized exponential tilt model that captures both mean and variance signals, but it examines only one CpG site at a time.

Results:

In this article, we propose a penalized Exponential Tilt Model (pETM) using network-based regularization that captures both mean and variance signals in DNA methylation data and takes into account the correlations among nearby CpG sites. By combining the strengths of the two models we previously developed, we demonstrate the superior power and better performance of the pETM method through simulations and an application to the 450K DNA methylation array data of the four breast invasive carcinoma subtypes from The Cancer Genome Atlas (TCGA) project. The pETM method identifies many cancer-related methylation loci that were missed by our previously developed method, which considers correlations among nearby methylation loci but not variance signals.
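To illustrate why a variance signal can matter, the following is a minimal single-site sketch, not the pETM method or the 'pETM' R package. It assumes that an exponential tilt between case and control densities with tilt terms x and x^2 can be fitted as a logistic regression on (x, x^2); the quadratic tilt, the simulated values, the plain gradient-ascent fit, and all function names are assumptions made for this illustration. It omits the network-based penalty and the correlation among CpG sites entirely.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def features(x, use_square):
    # Intercept and mean term, plus an optional squared (variance) term.
    return [1.0, x, x * x] if use_square else [1.0, x]

def fit_logistic(xs, ys, use_square, lr=0.1, n_iter=500):
    # Unpenalized logistic regression fitted by gradient ascent on the
    # average log-likelihood (hypothetical helper, for illustration only).
    rows = [features(x, use_square) for x in xs]
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, y in zip(rows, ys):
            p = sigmoid(sum(b * f for b, f in zip(beta, row)))
            for j, f in enumerate(row):
                grad[j] += (y - p) * f
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def log_likelihood(beta, xs, ys, use_square):
    # Bernoulli log-likelihood of the fitted model.
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(b * f for b, f in zip(beta, features(x, use_square)))
        total += y * z - softplus(z)
    return total

# Simulated methylation-like values at one site (assumed, not real data):
# both groups share the same mean, but cases are more variable.
random.seed(0)
n_per_group = 200
controls = [random.gauss(0.0, 0.5) for _ in range(n_per_group)]
cases = [random.gauss(0.0, 1.5) for _ in range(n_per_group)]
xs = controls + cases
ys = [0] * n_per_group + [1] * n_per_group

beta_mean = fit_logistic(xs, ys, use_square=False)  # mean signal only
beta_full = fit_logistic(xs, ys, use_square=True)   # mean and variance signal
ll_mean = log_likelihood(beta_mean, xs, ys, use_square=False)
ll_full = log_likelihood(beta_full, xs, ys, use_square=True)

print("coefficient on x^2:", beta_full[2])
print("log-likelihood, mean-only model:", ll_mean)
print("log-likelihood, mean + variance model:", ll_full)
```

In this simulated setting the two groups differ only in spread, so a mean-only model has little to work with, while the model with the squared term can separate them. A multi-site analysis along the lines described in the abstract would additionally need a penalty that links neighboring CpG sites.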

Availability and Implementation:

The R package 'pETM' is publicly available through CRAN: http://cran.r-project.org .

Contact:

sw2206@columbia.edu.

Supplementary information:

Supplementary data are available at Bioinformatics online.

PMID: 28165116
PMCID: PMC5860278
DOI: 10.1093/bioinformatics/btx064
[Indexed for MEDLINE]