DNA-Aeon provides flexible arithmetic coding for constraint adherence and error correction in DNA storage

Nat Commun. 2023 Feb 6;14(1):628. doi: 10.1038/s41467-023-36297-3.

Abstract

The extensive information capacity of DNA, coupled with decreasing costs for DNA synthesis and sequencing, makes DNA an attractive alternative to traditional data storage. The processes of writing, storing, and reading DNA exhibit specific error profiles and constraints DNA sequences have to adhere to. We present DNA-Aeon, a concatenated coding scheme for DNA data storage. It supports the generation of variable-sized encoded sequences with a user-defined Guanine-Cytosine (GC) content, homopolymer length limitation, and the avoidance of undesired motifs. It further enables users to provide custom codebooks adhering to further constraints. DNA-Aeon can correct substitution errors, insertions, deletions, and the loss of whole DNA strands. Comparisons with other codes show better error-correction capabilities of DNA-Aeon at similar redundancy levels with decreased DNA synthesis costs. In-vitro tests indicate high reliability of DNA-Aeon even in the case of skewed sequencing read distributions and high read-dropout.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • DNA Replication*
  • DNA* / genetics
  • Reproducibility of Results
  • Sequence Analysis, DNA

Substances

  • DNA