    IEEE/ACM Trans Comput Biol Bioinform. 2007 Jul-Sep;4(3):447-57.

    Compression of annotated nucleotide sequences.

    Source

    Institute of Signal Processing, Tampere University of Technology, Tampere, Finland. gergely.korodi@tut.fi

    Abstract

    This article introduces an algorithm for the lossless compression of DNA files that contain annotation text in addition to the nucleotide sequence. First, a grammar is designed specifically to capture the regularities of the annotation text. A reversible transformation then uses the grammar rules to represent the original file equivalently as a collection of parsed segments and a sequence of decisions made by the grammar parser. This decomposition allows state-of-the-art encoders to be applied efficiently to the parsed segments. The encoded size of the grammar's decision sequence is optimized by extending the grammar's states to account for high-order Markovian dependencies. The practical implementation of the algorithm achieves a significant improvement over the general-purpose methods currently used for DNA files.
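    To make the decomposition idea concrete, here is a toy sketch in Python. It is not the authors' grammar or encoder. The two-way rule (a line is either a nucleotide line or an annotation line), the function names `decompose`, `recompose` and `adaptive_cost_bits`, and the add-one (Laplace) probability estimator are all hypothetical choices made for illustration. The sketch shows two things: recording the parser's decisions makes the split into segments losslessly invertible, and conditioning each decision on the previous k decisions (an order-k Markov context) can shrink the decisions' ideal code length.

```python
import math
from collections import defaultdict

NUCLEOTIDES = set("ACGTN")


def decompose(lines):
    """Hypothetical two-rule 'grammar': split lines into two streams.

    Decision 0 = nucleotide-sequence line, 1 = annotation line.
    The decision list records which stream each line went to, so the
    split is reversible regardless of how lines are classified.
    """
    decisions, seq_segments, ann_segments = [], [], []
    for line in lines:
        if line and set(line) <= NUCLEOTIDES:
            decisions.append(0)
            seq_segments.append(line)
        else:
            decisions.append(1)
            ann_segments.append(line)
    return decisions, seq_segments, ann_segments


def recompose(decisions, seq_segments, ann_segments):
    """Inverse of decompose: replay the decisions to interleave the streams."""
    seq_it, ann_it = iter(seq_segments), iter(ann_segments)
    return [next(seq_it) if d == 0 else next(ann_it) for d in decisions]


def adaptive_cost_bits(symbols, order, alphabet_size=2):
    """Ideal code length (bits) of `symbols` under an adaptive order-k model.

    Each symbol is predicted from counts gathered in its context (the
    previous `order` symbols), using the add-one (Laplace) estimator;
    counts are updated after coding, as an adaptive encoder would.
    """
    counts = defaultdict(lambda: [0] * alphabet_size)
    bits = 0.0
    for i, s in enumerate(symbols):
        ctx = tuple(symbols[max(0, i - order):i])
        c = counts[ctx]
        p = (c[s] + 1) / (sum(c) + alphabet_size)
        bits -= math.log2(p)
        c[s] += 1
    return bits


if __name__ == "__main__":
    lines = ["LOCUS  X", "ACGT", "ORIGIN", "GGCC"]
    d, s, a = decompose(lines)
    assert recompose(d, s, a) == lines
    periodic = [1, 0, 0] * 20
    print(adaptive_cost_bits(periodic, 0), adaptive_cost_bits(periodic, 2))
```

    On a strictly periodic decision stream such as `[1, 0, 0] * 20`, the order-2 estimate comes out well below the order-0 estimate, because each two-symbol context predicts the next decision almost deterministically. Real annotation grammars have far more rules, but the same mechanism applies.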

    PMID: 17666764
    [PubMed - indexed for MEDLINE]
