Epigenetics Chromatin. 2016 Dec 7;9:56. doi: 10.1186/s13072-016-0107-z. eCollection 2016.

"Gap hunting" to characterize clustered probe signals in Illumina methylation array data.

Author information

1 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205 USA.
2 Wendy Klag Center for Autism and Developmental Disabilities, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205 USA.
3 Center for Epigenetics, Johns Hopkins School of Medicine, 855 N. Wolfe Street, Baltimore, MD 21205 USA.
4 Department of Medicine, Johns Hopkins School of Medicine, 855 N. Wolfe Street, Baltimore, MD 21205 USA.
5 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205 USA.
6 McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, 1800 Orleans Street, Baltimore, MD 21287 USA.
7 Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, 624 N. Broadway, HH850, Baltimore, MD 21205 USA.
# Contributed equally

Abstract

BACKGROUND:

The Illumina 450k array has been widely used in epigenetic association studies. Current quality-control (QC) pipelines typically remove certain sets of probes, such as those containing a SNP or those with multiple mapping locations. A further set of potentially problematic probes consists of those whose DNA methylation distributions show two or more distinct clusters separated by gaps. Data-driven identification of such probes may offer additional insights for downstream analyses.

RESULTS:

We developed a procedure, termed "gap hunting," to identify probes showing clustered distributions. Among 590 peripheral blood samples from the Study to Explore Early Development, we identified 11,007 "gap probes." The vast majority (9199) are likely attributable to an underlying SNP(s) or other variant in the probe, although some SNP-affected probes do not produce gap signals. Specific factors predict which SNPs lead to gap signals, including type of nucleotide change, probe type, DNA strand, and overall methylation state. These expected effects are demonstrated in paired genotype and 450k data on the same samples. Gap probes can also serve as a surrogate for the local genetic sequence on a haplotype scale and can be used to adjust for population stratification.

CONCLUSIONS:

The characteristics of gap probes reflect potentially informative biology. QC pipelines may benefit from an efficient data-driven approach that "flags" gap probes, rather than filtering such probes, followed by careful interpretation of downstream association analyses. Our results should translate directly to the recently released Illumina EPIC array given the similar chemistry and content design.

KEYWORDS:

450k Array; Epigenome-wide association studies; Gap hunting; Illumina HumanMethylation450 BeadChip; Polymorphic CpG; SNP

PMID: 27980682
PMCID: PMC5142147
DOI: 10.1186/s13072-016-0107-z
[Indexed for MEDLINE]