Whole-genome Sequencing of SARS-CoV-2: Using Phylogeny and Structural Modeling to Contextualize Local Viral Evolution

Mil Med. 2022 Jan 4;187(1-2):e130-e137. doi: 10.1093/milmed/usab031.

Abstract

Introduction: The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has created a global pandemic resulting in over 1 million deaths worldwide. In the Department of Defense (DoD), over 129,000 personnel (civilians, dependents, and active duty) have been infected with the virus to date. Rapid estimations of transmission and mutational patterns of virus outbreaks can be accomplished using whole-genome viral sequencing. Deriving interpretable and actionable results from pathogen sequence data is accomplished by the construction of phylogenetic trees (from local and global virus sequences) and by the creation of protein maps, to visualize and predict the effects of structural protein amino acid mutations.

Materials and methods: We developed a sequencing and bioinformatics workflow for molecular epidemiological SARS-CoV-2 surveillance using excess clinical specimens collected under an institutional review board-exempt protocol at Joint Base San Antonio (JBSA), Lackland AFB. This workflow includes viral RNA isolation, viral load quantification, tiling-based next-generation sequencing, bioinformatics analysis of the sequence data, and data visualization via phylogenetic trees and protein mapping.

Results: Sequencing of 37 clinical specimens collected at JBSA/Lackland revealed that by June 2020, SARS-CoV-2 strains carrying the 614G mutation were the predominant cause of local coronavirus disease 2019 infections. We identified 109 nucleotide changes in the coding region of the SARS-CoV-2 genome (leading to 63 unique, non-synonymous amino acid mutations), one mutation in the 5' untranslated region (UTR), and two mutations in the 3' UTR. Furthermore, we identified and mapped six additional spike protein amino acid changes, information that could potentially aid vaccine design.
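
The distinction between nucleotide changes and non-synonymous amino acid mutations can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (classify_substitution, summarize_changes) and the input format are hypothetical, the example codons are chosen for illustration only, and the sketch assumes each change is already expressed as an in-frame reference/alternate codon pair translated with the standard genetic code.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Translate both codons and report whether the amino acid changes."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


def summarize_changes(changes):
    """Count nucleotide differences and collect unique non-synonymous labels.

    `changes` is a hypothetical input format: an iterable of
    (codon_number, ref_codon, alt_codon) tuples.
    """
    nucleotide_changes = 0
    labels = set()
    for codon_number, ref_codon, alt_codon in changes:
        # Each mismatched position within the codon is one nucleotide change.
        nucleotide_changes += sum(
            r != a for r, a in zip(ref_codon.upper(), alt_codon.upper())
        )
        ref_aa, alt_aa, kind = classify_substitution(ref_codon, alt_codon)
        if kind == "non-synonymous":
            labels.add(f"{ref_aa}{codon_number}{alt_aa}")
    return nucleotide_changes, sorted(labels)


if __name__ == "__main__":
    # Illustrative input only, not data from the study.
    example = [(614, "GAT", "GGT"), (10, "CTT", "CTC")]
    print(summarize_changes(example))
```

In this illustrative input, the first pair changes aspartate to glycine and is counted as non-synonymous, while the second pair leaves leucine unchanged, so two nucleotide changes yield a single amino acid mutation. This is why a count of nucleotide changes can exceed the count of unique amino acid mutations.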

Conclusion: The workflow presented here is designed to enable DoD public health officials to track viral evolution and conduct near real-time evaluation of future outbreaks. The generation of molecular epidemiological sequence data is critical for the development of disease intervention strategies, most notably vaccine design. Overall, we present a streamlined sequencing and bioinformatics methodology aimed at improving long-term readiness efforts in the DoD.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • COVID-19*
  • Genome, Viral
  • Humans
  • Phylogeny
  • SARS-CoV-2*
  • Spike Glycoprotein, Coronavirus / genetics
  • United States

Substances

  • Spike Glycoprotein, Coronavirus