Genome Res. 2015 Sep;25(9):1391-400. doi: 10.1101/gr.189894.115. Epub 2015 Jul 10.

Saturation analysis of ChIP-seq data for reproducible identification of binding peaks.

Author information

1. Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany
2. Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
3. Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
4. Department of Biostatistics, Clinical Research Unit, Berlin Institute of Health, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany
5. Labor für Pädiatrische Molekularbiologie, Charité-Universitätsmedizin Berlin, 10117 Berlin, Germany
6. Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany

Abstract

Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of transcription factors and other DNA-binding proteins. Computational ChIP-seq peak calling infers the location of protein-DNA interactions based on various measures of enrichment of sequence reads. In this work, we introduce an algorithm, Q, that uses an assessment of the quadratic enrichment of reads to center candidate peaks, followed by statistical analysis of saturation of candidate peaks by 5' ends of reads. We show that our method is not only substantially faster than several competing methods but also demonstrates statistically significant advantages with respect to reproducibility of results and in its ability to identify peaks with reproducible binding site motifs. We show that Q has superior performance in the delineation of double RNAPII and H3K4me3 peaks surrounding transcription start sites, which is related to a better ability to resolve individual peaks. The method is implemented in C++ and is freely available under an open-source license.
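
The abstract does not define the saturation statistic, so the following is only a rough illustration of the general idea of measuring how densely a candidate peak is covered by 5' read ends. It is not Q's actual scoring or its statistical test. The function name `saturation`, the symmetric window, and the `half_width` parameter are all assumptions made for this sketch.

```python
def saturation(five_prime_ends, center, half_width):
    """Fraction of positions in a window around `center` that are hit
    by at least one 5' read end.

    Hypothetical helper for illustration only; the real method's
    definition of saturation may differ.
    """
    lo, hi = center - half_width, center + half_width
    # Duplicate reads at the same position count once, so the value
    # reflects how many distinct positions are occupied, not read depth.
    covered = {p for p in five_prime_ends if lo <= p <= hi}
    return len(covered) / (hi - lo + 1)


if __name__ == "__main__":
    ends = [100, 100, 101, 105, 300]
    print(saturation(ends, center=102, half_width=3))
```

Under this simplified definition, a window whose reads pile up on a few positions scores low even at high read depth, while a window with reads spread over many distinct positions scores high.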

PMID: 26163319
PMCID: PMC4561497
DOI: 10.1101/gr.189894.115
[Indexed for MEDLINE]
Free PMC Article
