Proc Data Compress Conf. 2013;2013:371-380. Epub 2013 Mar 22.

An Adaptive Difference Distribution-based Coding with Hierarchical Tree Structure for DNA Sequence Compression.

Author information

1 Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai 200240, China, daiwenrui@sjtu.edu.cn.
2 Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai 200240, China, xionghongkai@sjtu.edu.cn.
3 Division of Biomedical Informatics, University of California, San Diego, San Diego, CA 92093, USA, x1jiang@ucsd.edu.
4 Division of Biomedical Informatics, University of California, San Diego, San Diego, CA 92093, USA, lohnomachado@ucsd.edu.

Abstract

Previous reference-based compression methods for DNA sequences do not fully exploit the intrinsic statistics of the data, because they consider only approximate matches. In this paper, an adaptive difference distribution-based coding framework is proposed that organizes the nucleotide fragments in a hierarchical tree structure. To keep the distribution of the difference sequence between the reference and target sequences concentrated, the sub-fragment size and the matching offset used for prediction are chosen flexibly within a stepped-size structure. Approximate repeats in the reference are matched using a Hamming-like weighted distance measure within a local region close to the current fragment, so that matching accuracy can be balanced against the overhead of describing the matching offset. A well-designed coding scheme compactly encodes both the difference sequence and the additional parameters, e.g., the sub-fragment size and the matching offset. Experimental results show that the proposed scheme achieves a 150% compression improvement compared with GReEn, the best reference-based compressor.
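The matching and differencing steps described above can be illustrated with a minimal Python sketch. It is not the paper's method, and it rests on several assumptions:

- nucleotides are mapped to 0-3;
- each difference symbol is (target - reference) mod 4;
- fragments have one fixed size rather than the paper's stepped, hierarchical sizes;
- the per-position mismatch weights are a hypothetical placeholder, because the abstract does not define the weighting. By default they are uniform, which reduces to plain Hamming distance.

The names `best_offset`, `difference_sequence` and `empirical_entropy` are illustrative. The empirical entropy of the difference symbols serves as a rough proxy for how concentrated their distribution is.

```python
import math
from collections import Counter

# Assumed 2-bit nucleotide alphabet (illustrative; not specified in the abstract).
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def weighted_hamming(ref_frag, tgt_frag, weights):
    """Hamming-like distance: sum of the per-position weights where bases differ."""
    return sum(w for r, t, w in zip(ref_frag, tgt_frag, weights) if r != t)


def best_offset(reference, target, start, size, window, weights):
    """Return the offset in [-window, window] whose reference fragment is closest
    to target[start:start + size]; ties go to the smaller |offset| (cheaper to code)."""
    tgt_frag = target[start:start + size]
    n = len(tgt_frag)
    best, best_dist = 0, math.inf
    for off in range(-window, window + 1):
        pos = start + off
        if pos < 0 or pos + n > len(reference):
            continue  # candidate fragment would fall outside the reference
        dist = weighted_hamming(reference[pos:pos + n], tgt_frag, weights)
        if dist < best_dist or (dist == best_dist and abs(off) < abs(best)):
            best, best_dist = off, dist
    return best


def difference_sequence(reference, target, size=8, window=4, weights=None):
    """Split the target into fixed-size fragments, match each one in a local window
    of the reference, and emit (target - reference) mod 4 for every base."""
    if len(reference) < len(target):
        raise ValueError("sketch assumes the reference is at least as long as the target")
    if weights is None:
        weights = [1.0] * size  # hypothetical placeholder: uniform weights = plain Hamming
    offsets, diffs = [], []
    for start in range(0, len(target), size):
        off = best_offset(reference, target, start, size, window, weights)
        tgt_frag = target[start:start + size]
        ref_frag = reference[start + off:start + off + len(tgt_frag)]
        offsets.append(off)
        diffs.extend((BASE_TO_INT[t] - BASE_TO_INT[r]) % 4 for t, r in zip(tgt_frag, ref_frag))
    return offsets, diffs


def empirical_entropy(symbols):
    """Zeroth-order empirical entropy in bits/symbol; lower means more concentrated."""
    if not symbols:
        return 0.0
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in Counter(symbols).values())
```

Consider `difference_sequence("TTACGTTGCAACGTAGCT", "ACGTTGCAACGTAGCT")`. It finds offset 2 for both fragments and produces an all-zero difference sequence, whose entropy is 0. A real coder must also pay to describe each offset. For that reason the search stays within a small window, and ties are broken toward the smaller |offset|.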
