BMC Bioinformatics. 2019 Oct 29;20(1):533. doi: 10.1186/s12859-019-3100-2.

RACS: rapid analysis of ChIP-Seq data for contig based genomes.

Author information

1. Department of Chemistry and Biology, Ryerson University, 350 Victoria St, Toronto, M5B 2K3, Canada.
2. SciNet High Performance Computing Consortium, University of Toronto, 661 University Ave, Toronto, M5G 1M1, Canada.
3. Department of Molecular Genetics, University of Toronto, 1 King's College Cir, Toronto, M5S 1A8, Canada.
4. Department of Chemistry and Biology, Ryerson University, 350 Victoria St, Toronto, M5B 2K3, Canada. jeffrey.fillingham@ryerson.ca.

Abstract

BACKGROUND:

Chromatin immunoprecipitation coupled to next-generation sequencing (ChIP-Seq) is a widely used molecular method for investigating the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that are difficult to process and analyze, particularly for organisms with contig-based genome sequences, whose genes typically carry minimal annotation beyond coordinates predicted primarily by gene-finding programs. Poorly annotated genome sequences make comprehensive analysis of ChIP-Seq data difficult, and as a result, standardized analysis pipelines are lacking.

RESULTS:

We present a one-stop computational pipeline, "Rapid Analysis of ChIP-Seq data" (RACS), that combines traditional High-Performance Computing (HPC) techniques with open-source tools to process and analyze raw ChIP-Seq data. RACS is an open-source pipeline available from either of the following repositories: https://bitbucket.org/mjponce/RACS or https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS . RACS is particularly useful for aiding protein function discovery through ChIP-Seq in organisms with contig-based genomes and poor gene annotation. To test the performance and efficiency of RACS, we analyzed previously published ChIP-Seq data from the model organism Tetrahymena thermophila, which has a contig-based genome. We assessed the generality of RACS by analyzing a previously published data set generated using the model organism Oxytricha trifallax, whose genome sequence is also contig-based and poorly annotated.

CONCLUSIONS:

The RACS computational pipeline presented in this report is an efficient and reliable tool for analyzing genome-wide raw ChIP-Seq data generated in model organisms with poorly annotated, contig-based genome sequences. Because RACS separates the identified read accumulations into genic and intergenic regions, it is particularly efficient for rapid downstream analyses of proteins involved in gene expression.
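The genic/intergenic split described above can be thought of as an interval-overlap test between called peaks (read accumulations) and predicted gene coordinates on the same contig. The Python sketch below illustrates that idea under stated assumptions: peaks and genes are given as hypothetical `(contig, start, end)` tuples with half-open coordinates, and a peak counts as genic if it overlaps any gene on its contig by at least one base. It is not RACS's actual implementation, and the function name `split_genic_intergenic` and the contig names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed representation: half-open interval [start, end) on a named contig.
Interval = Tuple[str, int, int]


def split_genic_intergenic(
    peaks: List[Interval], genes: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Label each peak genic if it overlaps any gene on the same contig."""
    # Group gene coordinates by contig so each peak is only compared
    # against genes on its own contig.
    genes_by_contig: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for contig, start, end in genes:
        genes_by_contig[contig].append((start, end))

    genic: List[Interval] = []
    intergenic: List[Interval] = []
    for contig, start, end in peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        hit = any(
            g_start < end and start < g_end
            for g_start, g_end in genes_by_contig.get(contig, [])
        )
        (genic if hit else intergenic).append((contig, start, end))
    return genic, intergenic


if __name__ == "__main__":
    # Hypothetical contig names and coordinates, for illustration only.
    genes = [("contig_1", 100, 500), ("contig_2", 0, 250)]
    peaks = [("contig_1", 450, 600), ("contig_1", 700, 800), ("contig_3", 10, 50)]
    genic, intergenic = split_genic_intergenic(peaks, genes)
    print("genic:", genic)            # [('contig_1', 450, 600)]
    print("intergenic:", intergenic)  # [('contig_1', 700, 800), ('contig_3', 10, 50)]
```

The linear scan keeps the sketch short but costs O(peaks × genes per contig); for genome-scale data, sorted intervals, an interval tree, or an established tool such as `bedtools intersect` would be the more practical choice.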

KEYWORDS:

Bioinformatics pipeline; Chromatin immunoprecipitation; High-performance computing; Next generation sequencing; Tetrahymena thermophila
