Bioinformatics. 2016 Feb 1;32(3):469-71. doi: 10.1093/bioinformatics/btv577. Epub 2015 Oct 7.

CpGFilter: model-based CpG probe filtering with replicates for epigenome-wide association studies.

Author information

1 Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905; Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115.
2 Department of Environmental Health, Harvard School of Public Health, Boston, MA 02115.
3 Department of Preventive Medicine and the Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, Chicago, IL 60208.
4 Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60208, USA.
5 Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905.
6 Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115.

Abstract

SUMMARY:

The Infinium HumanMethylation450 BeadChip enables epigenome-wide association studies at a reduced cost. However, many of the CpG sites the BeadChip interrogates have very large measurement errors in 450K data. Including these noisy CpGs decreases the statistical power to detect relevant associations because of the multiple testing correction. We propose using the intra-class correlation coefficient (ICC), which characterizes the relative contribution of biological variability to total variability, to filter CpGs when technical replicates are available. We estimate the ICC with a linear mixed effects model that pools all samples rather than using the technical replicates only. An ultra-fast algorithm addresses the computational complexity, so CpG filtering can be completed in minutes on a desktop computer for a 450K data set of over 1000 samples. The method is flexible and can accommodate any replicate design. Simulations and a real data application demonstrate that the whole-sample ICC method performs better than the replicate-sample ICC or variance-based methods.
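
To make the ICC idea concrete, the following is a minimal sketch for a single CpG probe, assuming a one-way random-effects model y_ij = mu + b_i + e_ij in which b_i is the biological (between-subject) effect and e_ij is the technical (replicate) error. It uses the classical ANOVA moment estimator for unbalanced designs, which is swapped in here for simplicity; it is not the ultra-fast mixed-model algorithm described in the abstract, and the function name `icc_one_way` is hypothetical rather than part of the CpGFilter package. Subjects measured only once still contribute to the between-subject term, which loosely mirrors the whole-sample pooling idea.

```python
from collections import defaultdict


def icc_one_way(values, subject_ids):
    """Estimate ICC = var_bio / (var_bio + var_tech) for one probe.

    values      : methylation measurements, one per array
    subject_ids : subject label for each measurement; technical
                  replicates of a subject share the same label
    Hypothetical helper: an ANOVA moment estimator, not the
    CpGFilter implementation.
    """
    groups = defaultdict(list)
    for value, subject in zip(values, subject_ids):
        groups[subject].append(value)

    k = len(groups)            # number of distinct subjects
    n_total = len(values)      # number of arrays, replicates included
    if k < 2 or n_total <= k:
        raise ValueError("need >= 2 subjects and >= 1 replicated subject")

    grand_mean = sum(values) / n_total
    ss_within = 0.0            # technical (replicate) sum of squares
    ss_between = 0.0           # between-subject sum of squares
    sum_n_sq = 0
    for obs in groups.values():
        n_i = len(obs)
        mean_i = sum(obs) / n_i
        ss_within += sum((y - mean_i) ** 2 for y in obs)
        ss_between += n_i * (mean_i - grand_mean) ** 2
        sum_n_sq += n_i * n_i

    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)
    # Effective group size for an unbalanced replicate design.
    n0 = (n_total - sum_n_sq / n_total) / (k - 1)

    var_tech = ms_within
    var_bio = max(0.0, (ms_between - ms_within) / n0)  # truncate at zero
    total = var_bio + var_tech
    return var_bio / total if total > 0 else 0.0
```

Under these assumptions, a probe whose replicates agree closely while subjects differ yields an ICC near 1, and a probe dominated by replicate noise yields an ICC near 0. Probes below a user-chosen ICC cutoff could then be dropped before association testing; the cutoff itself is a design choice that this sketch does not prescribe.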

AVAILABILITY AND IMPLEMENTATION:

CpGFilter is implemented in R and is publicly available on CRAN as the R package 'CpGFilter'.

CONTACT:

chen.jun2@mayo.edu or xlin@hsph.harvard.edu

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID: 26449931
PMCID: PMC4757944
DOI: 10.1093/bioinformatics/btv577
[Indexed for MEDLINE]
Free PMC Article