Non-standard bioinformatics characterization of SARS-CoV-2

Comput Biol Med. 2021 Apr:131:104247. doi: 10.1016/j.compbiomed.2021.104247. Epub 2021 Feb 1.

Abstract

A non-standard bioinformatics method, the 4D-Dynamic Representation of DNA/RNA Sequences, has been formulated for analyzing the information available in nucleotide databases. The sequences are represented by sets of "material points" in a 4D space, called 4D-dynamic graphs. The graphs representing the sequences are treated as "rigid bodies" and characterized by quantities analogous to those used in classical dynamics. Projections of the graphs into 2D and 3D spaces serve as graphical representations of the sequences. The method has been applied to the complete genome sequences of the 2019 novel coronavirus, yielding 2D and 3D classification maps. The coordinate axes in the maps correspond to values derived from the exact formulas characterizing the graphs: the coordinates of the centers of mass and the 4D moments of inertia. Each point in a map represents one sequence, and its coordinates are used as classifiers. The main result of this work is derived from the 3D classification maps: the distribution of the clusters of points that emerged in these maps supports the hypothesis that SARS-CoV-2 may have originated in bats and in pangolins. Pilot calculations for Zika virus sequence data show that the proposed approach is also applicable to describing the time evolution of viral genome sequences.
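As an illustration of the kind of descriptors mentioned above, the following is a minimal sketch of how a sequence might be turned into a set of unit-mass points in 4D and summarized by its center of mass and inertia tensor. The abstract does not state how nucleotides are mapped to 4D steps, so the `STEP` mapping (one coordinate axis per base, with the walk accumulating step by step) is an assumption made for illustration only, as are the function names `walk_points`, `center_of_mass`, and `inertia_tensor`. The tensor uses the standard rigid-body definition generalized to four dimensions, which may differ in detail from the formulas in the paper.

```python
# Hypothetical mapping of bases to 4D unit steps (an assumption, not taken
# from the abstract): each base moves the walk along its own axis.
STEP = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
    "U": (0, 0, 0, 1),  # RNA: treat U like T
}


def walk_points(seq):
    """Return the cumulative 4D positions visited by the sequence walk."""
    pos = [0, 0, 0, 0]
    points = []
    for base in seq.upper():
        step = STEP.get(base)
        if step is None:
            continue  # skip ambiguous symbols such as N
        pos = [p + s for p, s in zip(pos, step)]
        points.append(tuple(pos))
    return points


def center_of_mass(points):
    """Center of mass of unit-mass points: the mean of each coordinate."""
    if not points:
        raise ValueError("no points: sequence has no recognized bases")
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(4)]


def inertia_tensor(points):
    """4x4 inertia tensor about the center of mass, for unit masses.

    I[i][j] = sum over points of (r^2 * delta_ij - r_i * r_j),
    where r is the position relative to the center of mass.
    """
    com = center_of_mass(points)
    tensor = [[0.0] * 4 for _ in range(4)]
    for p in points:
        r = [p[k] - com[k] for k in range(4)]
        r2 = sum(x * x for x in r)
        for i in range(4):
            for j in range(4):
                tensor[i][j] += (r2 if i == j else 0.0) - r[i] * r[j]
    return tensor


if __name__ == "__main__":
    pts = walk_points("ACGT")
    print(center_of_mass(pts))
    print(inertia_tensor(pts))
```

Under these assumptions, each sequence reduces to a handful of numbers (center-of-mass coordinates and tensor entries, or the tensor's eigenvalues if a numerical library is available), and any two or three of them could serve as the axes of a classification map like those described above.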

Keywords: Alignment-free methods; Moments of inertia; Similarity/dissimilarity analysis of DNA/RNA sequences.

MeSH terms

  • Algorithms*
  • Animals
  • Base Sequence*
  • COVID-19 / genetics*
  • Chiroptera / virology
  • Computational Biology*
  • Genome, Viral*
  • Humans
  • Pangolins / virology
  • Phylogeny
  • SARS-CoV-2 / genetics*
  • Zika Virus / genetics
  • Zika Virus Infection / genetics