CODOC: efficient access, analysis and compression of depth of coverage signals

Bioinformatics. 2014 Sep 15;30(18):2676-7. doi: 10.1093/bioinformatics/btu362. Epub 2014 May 28.

Abstract

Current data formats for the representation of depth of coverage (DOC) data, a central resource for interpreting, filtering or detecting novel features in high-throughput sequencing datasets, were primarily designed for visualization purposes. This limits their applicability in stand-alone analyses of these data, mainly owing to inaccurate representation or mediocre data compression. CODOC is a novel data format and comprehensive application programming interface for efficient representation, access and analysis of DOC data. By exploiting specific data characteristics, CODOC compresses these data ∼4-32× better than the best current comparable method, while at the same time enabling more exact signal recovery under lossy compression and very fast query answering times.
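To make the trade-off between compression, bounded-error signal recovery and query speed concrete, the following is a minimal sketch of one generic way to store a per-position coverage signal: run-length encoding with an optional absolute error tolerance, queried by binary search. It is an illustration only and is not CODOC's actual algorithm, file format or API; the class name CoverageRle, its methods and the tolerance parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a generic tolerance-bounded run-length encoding of a
// depth-of-coverage signal. This is NOT the CODOC format or API.
public class CoverageRle {

    /** One run: half-open interval [start, end) stored with a single depth value. */
    public static final class Run {
        public final int start;
        public final int end;
        public final int depth;

        Run(int start, int end, int depth) {
            this.start = start;
            this.end = end;
            this.depth = depth;
        }
    }

    /**
     * Encodes the signal as runs. A new run starts whenever a value differs
     * from the current run's first value by more than the tolerance, so every
     * reconstructed value is within the tolerance of the original.
     * A tolerance of 0 gives lossless encoding.
     */
    public static List<Run> encode(int[] depth, int tolerance) {
        List<Run> runs = new ArrayList<>();
        if (depth.length == 0) {
            return runs;
        }
        int start = 0;
        int ref = depth[0]; // representative value of the current run
        for (int i = 1; i < depth.length; i++) {
            if (Math.abs(depth[i] - ref) > tolerance) {
                runs.add(new Run(start, i, ref));
                start = i;
                ref = depth[i];
            }
        }
        runs.add(new Run(start, depth.length, ref));
        return runs;
    }

    /** Returns the stored depth at a position using binary search over the runs. */
    public static int query(List<Run> runs, int pos) {
        int lo = 0;
        int hi = runs.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Run r = runs.get(mid);
            if (pos < r.start) {
                hi = mid - 1;
            } else if (pos >= r.end) {
                lo = mid + 1;
            } else {
                return r.depth;
            }
        }
        throw new IndexOutOfBoundsException("position " + pos + " is not covered");
    }

    public static void main(String[] args) {
        // Toy signal: an uncovered region, two covered regions, then uncovered again.
        int[] depth = {0, 0, 0, 10, 11, 10, 12, 30, 31, 30, 0, 0};
        List<Run> lossless = encode(depth, 0);
        List<Run> lossy = encode(depth, 2);
        System.out.println("runs (lossless): " + lossless.size());
        System.out.println("runs (tolerance 2): " + lossy.size());
        System.out.println("depth at position 6 (tolerance 2): " + query(lossy, 6));
    }
}
```

On this toy signal the lossless encoding needs 9 runs and the tolerance-2 encoding needs 4, while the value recovered at position 6 is 10 against a true depth of 12. The example only shows why near-constant stretches of coverage compress well and how an explicit error bound keeps lossy recovery predictable; the published method should be consulted for how CODOC itself exploits these characteristics.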

Availability and implementation: Java source code and binaries are freely available for non-commercial use at http://purl.org/bgraph/codoc.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Data Compression / methods*
  • Data Mining
  • High-Throughput Nucleotide Sequencing*
  • Statistics as Topic / methods*
  • Time Factors
  • User-Computer Interface