Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

J Virol. 2017 Mar 29;91(8):e02275-16. doi: 10.1128/JVI.02275-16. Print 2017 Apr 15.

Abstract

Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds.

IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids.

Keywords: cotranslational protein folding; more sensitive orphan gene annotation; sequence similarity twilight zone; structure-based viral lineages.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Capsid Proteins / chemistry*
  • Capsid Proteins / genetics*
  • Cluster Analysis
  • Protein Conformation
  • Sequence Homology, Amino Acid*
  • Sequence Homology, Nucleic Acid*
  • Viruses / classification*
  • Viruses / genetics
  • Viruses / ultrastructure

Substances

  • Capsid Proteins