FLR: A Revolutionary Alignment-Free Similarity Analysis Methodology for DNA-Sequences

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1924-1936. doi: 10.1109/TCBB.2020.2967385. Epub 2021 Oct 7.

Abstract

This paper introduces a novel alignment-free sequence analysis methodology. Its main idea is a new representation of the DNA sequence that breaks the dependency between DNA bases that exists in the traditional string representation. We call it the Four-Lists-Representation (FLR). Based on the FLR, a series of revolutionary algorithms for searching, map discovery, similarity-score analysis, and similarity visualization have been developed. They are combined in what we call the FLR Methodology. The paper also studies most of the available similarity analysis techniques in a comprehensive state-of-the-art review. Extensive simulation and theoretical studies confirm that the whole set of FLR-based algorithms outperforms a long list of available similarity analysis algorithms in terms of speed and memory consumption. The ability to provide a similarity map, a similarity score, and a similarity graph as a set of evidence-based rationales gives the results of the proposed methodology a new edge in this field and promises a new area of genome-based research.
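
The abstract does not spell out the internal layout of the FLR, so the following is only an illustrative sketch under a stated assumption: that the four lists hold the sorted positions of A, C, G, and T respectively. The names build_flr, contains, and find_pattern are hypothetical and are not taken from the paper; the search shown is a naive lookup over those lists, not the authors' algorithm.

```python
from bisect import bisect_left

BASES = "ACGT"


def build_flr(sequence):
    """Assumed FLR layout: one sorted list of 0-based positions per base."""
    lists = {base: [] for base in BASES}
    for position, base in enumerate(sequence.upper()):
        if base in lists:  # ambiguous symbols such as N are skipped
            lists[base].append(position)
    return lists


def contains(sorted_positions, value):
    """Binary search for value in a sorted list of positions."""
    i = bisect_left(sorted_positions, value)
    return i < len(sorted_positions) and sorted_positions[i] == value


def find_pattern(flr, pattern):
    """Return start positions of pattern, consulting only the position lists."""
    pattern = pattern.upper()
    if not pattern or any(base not in flr for base in pattern):
        return []
    hits = []
    # Candidate starts are exactly the positions of the pattern's first base.
    for start in flr[pattern[0]]:
        if all(contains(flr[base], start + offset)
               for offset, base in enumerate(pattern)):
            hits.append(start)
    return hits


flr = build_flr("ACGTACGA")
print(flr)
print(find_pattern(flr, "ACG"))
```

Under this assumed layout, each base can be queried independently of its neighbours in the string, which is one plausible reading of "breaks the dependency between DNA bases"; the paper itself should be consulted for the actual definition.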

MeSH terms

  • Algorithms*
  • Animals
  • Computational Biology
  • DNA* / chemistry
  • DNA* / genetics
  • Humans
  • Sequence Analysis, DNA / methods*

Substances

  • DNA