Current data formats for the representation of depth of coverage (DOC) data, a central resource for interpreting, filtering or detecting novel features in high-throughput sequencing datasets, were primarily designed for visualization purposes. This limits their applicability in stand-alone analyses of these data, mainly owing to inaccurate representation or mediocre data compression. CODOC is a novel data format and comprehensive application programming interface for efficient representation, access and analysis of DOC data. By exploiting specific data characteristics, CODOC compresses these data ∼4-32× better than the best current comparable method, while also enabling more exact signal recovery under lossy compression and very fast query answering.
Availability and implementation: Java source code and binaries are freely available for non-commercial use at http://purl.org/bgraph/codoc.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.