BMC Bioinformatics. 2015 Jul 10;16:217. doi: 10.1186/s12859-015-0641-x.

An evaluation of statistical methods for DNA methylation microarray data analysis.

Author information

1 Clinical and Translational Science Institute, School of Medicine and Dentistry, University of Rochester, 265 Crittenden Boulevard CU 420708, Rochester, 14642, NY, USA. dongmei_li@urmc.rochester.edu.
2 Department of Biostatistics and Computational Biology, University of Rochester, 265 Crittenden Boulevard CU 420708, Rochester, 14642, NY, USA. xzd482@yahoo.com.
3 John A. Burns School of Medicine, University of Hawaii, 651 Ilalo Street 101, Honolulu, 96813, HI, USA. lepape@hawaii.edu.
4 Department of Obstetrics and Gynecology, University of Rochester, 500 Red Creek Drive Suite 220, Rochester, 14623, NY, USA. Tim_Dye@URMC.Rochester.edu.

Abstract

BACKGROUND:

DNA methylation offers an excellent example for elucidating how epigenetic information affects gene expression. β values and M values are commonly used to quantify DNA methylation. Statistical methods applicable to DNA methylation data analysis include the Wilcoxon rank sum test, the t-test, the Kolmogorov-Smirnov test, the permutation test, the empirical Bayes method, and the bump hunting method. Nonetheless, selecting an optimal statistical method can be challenging when different methods generate inconsistent results from the same data set.
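The relationship between the two measures can be illustrated with a minimal sketch, assuming the commonly used logit-style definition M = log2(β / (1 − β)). The function names and the `eps` clipping parameter are hypothetical choices made for this illustration and are not taken from the paper.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value (proportion methylated, 0..1) to an M value.

    Assumes M = log2(beta / (1 - beta)). The eps clipping is an
    illustrative safeguard against division by zero at beta = 0 or 1.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


if __name__ == "__main__":
    for beta in (0.1, 0.5, 0.8):
        print(beta, beta_to_m(beta))
```

Under this definition a β value of 0.5 maps to an M value of 0, and β values near 0 or 1 are stretched toward large negative or positive M values.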

RESULTS:

We compared six statistical approaches relevant to DNA methylation microarray analysis in terms of false discovery rate (FDR) control, statistical power, and stability through simulation studies and real data examples. Differences between β values and M values were observed only when methylation levels were correlated across CpG loci. For small sample sizes (n=3 or 6 in each group), both the empirical Bayes and bump hunting methods showed appropriate FDR control and the highest power when methylation levels across CpG loci were independent. Only the bump hunting method showed appropriate FDR control and the highest power when methylation levels across CpG loci were correlated. For medium (n=12 in each group) and large (n=24 in each group) sample sizes, all methods compared had similar power, except for the permutation test whenever the proportion of differentially methylated loci was low. For all sample sizes, the bump hunting method had the lowest stability in terms of standard deviation of total discoveries whenever the proportion of differentially methylated loci was large. The apparent test power comparisons based on raw p-values from DNA methylation studies on ovarian cancer and rheumatoid arthritis gave results consistent with those obtained in the simulation studies. Overall, these results provide guidance for selecting an optimal statistical method under different scenarios.
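To make the notion of FDR control concrete, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure applied to a list of raw p-values. The abstract does not state which FDR procedure the authors used, so the choice of Benjamini-Hochberg, the function name, and the `alpha` default are assumptions made purely for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Illustrative Benjamini-Hochberg step-up procedure (an assumption,
    not necessarily the procedure used in the paper): find the largest
    rank k such that p_(k) <= (k / m) * alpha, then reject the k
    hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-locus p-values, not data from the study.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06]
    print(benjamini_hochberg(example))
```

Each entry in the returned list corresponds to the p-value at the same position in the input, so the result can be aligned directly with a list of CpG loci.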

CONCLUSIONS:

For DNA methylation studies with small sample sizes, the bump hunting method and the empirical Bayes method are recommended when DNA methylation levels across CpG loci are independent, while only the bump hunting method is recommended when DNA methylation levels are correlated across CpG loci. All methods are acceptable for medium or large sample sizes.

PMID: 26156501
PMCID: PMC4497424
DOI: 10.1186/s12859-015-0641-x
[Indexed for MEDLINE]
Free PMC Article
