Genomic Characterization of a Novel SARS-CoV-2 Lineage from Rio de Janeiro, Brazil

Almost simultaneously, several studies reported the emergence of novel SARS-CoV-2 lineages characterized by their phylogenetic and genetic distinction (1), (2), (3), (4).….

We found a total of 731 single-nucleotide variants (SNVs) across the 180 samples, of which 50.3% were missense, 44.5% synonymous, 5.1% intergenic, and 0.1% nonsense. For the phylogenetic reconstruction, we gathered a global data set containing the 180 new genomes plus 1,197 high-coverage genomes from GISAID (GISAID Initiative; see Table S2), including 116 Brazilian and 1,081 worldwide genomes whose collection dates ranged from May to November (data downloaded 3 December 2020). Genomes were classified using Pangolin (COG-UK).
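The proportions and data set sizes reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the per-class SNV counts are approximations recovered by rounding the published percentages, and all variable names are illustrative.

```python
# Cross-check of the figures reported in the text.
# Assumption: per-class counts are approximated by rounding the published
# percentages; the paper itself reports only the percentages.
TOTAL_SNVS = 731
CLASS_PERCENT = {
    "missense": 50.3,
    "synonymous": 44.5,
    "intergenic": 5.1,
    "nonsense": 0.1,
}

# Approximate number of SNVs in each class.
approx_counts = {
    name: round(TOTAL_SNVS * pct / 100) for name, pct in CLASS_PERCENT.items()
}

# Data set composition as stated in the text.
NEW_GENOMES = 180      # genomes generated in this work
GISAID_BRAZIL = 116    # Brazilian genomes retrieved from GISAID
GISAID_WORLD = 1081    # worldwide genomes retrieved from GISAID

gisaid_total = GISAID_BRAZIL + GISAID_WORLD
dataset_total = NEW_GENOMES + gisaid_total

if __name__ == "__main__":
    print(approx_counts)
    print(gisaid_total, dataset_total)
```

Under these assumptions the rounded class counts add up to the 731 reported SNVs, the two GISAID subsets add up to the stated 1,197 genomes, and the full data set contains 1,377 genomes.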
The phylogenetic reconstruction indicates that the vast majority (86.8%) of Brazilian genomes fall within three clades (Fig. 1). Clade I comprises B.1.1.33 strains, while clades II and III are composed of B.1.1.28 strains. We inferred that P.2 falls within clade II (Fig. 1) and emerged in July 2020.
An essential feature of the new strains recently described in the United Kingdom (B.1.1.7) (1), South Africa (B.1.351) (2), and Brazil (P.1) (3, 4) is that they carry a unique set of multiple spike mutations, including N501Y, which is associated with greater infectivity and transmissibility. In contrast, all P.2 genomes exhibit only E484K in addition to a varying collection of either novel or rarely found spike mutations. E484K is also present in P.1 [...] phylogenetic analysis of all their genomes available in GISAID (data downloaded 11 February 2021).
Figure 2 highlights the differences between the two lineages and their origins, showing that, despite sharing E484K, they emerged from independent events. The localities where these two lineages were first detected have had entirely different epidemiological dynamics since the beginning of the coronavirus disease 2019 (COVID-19) epidemic in Brazil (5); therefore, the two variants arose independently in different epidemiological contexts.
The cases of reinfection involving the new variant lineage P.2 (6, 7) and the ability of viruses harboring the E484K mutation to escape from neutralizing antibodies (8, 9) emphasize the importance of monitoring the spread of this new strain. In about 4 months, 342 genomes classified as P.2 have become available. The emergent lineage P.2, initially detected in 37 samples from Rio de Janeiro State, has spread across all Brazilian regions and to 16 other countries. Brazilian viruses are the oldest and constitute the majority (55%) of the P.2 genomes, suggesting that the lineage was exported from Brazil to other countries.

FIG 1 Legend
We randomly sampled one genome/week/country from available samples in GISAID with collection dates ranging from May to November. Brazilian sequences with collection dates from May to November were all added to the data set (see Table S2). We ran IQTree 2.0.3 (10) under a general time-reversible (GTR) model of nucleotide substitution (11) with empirical base frequencies and invariant sites. Tips are colored according to their origin. Genomes generated in this work are red, other Brazilian genomes are blue, and genomes from other countries are not colored. Gray areas represent the three clades where Brazilian viruses are concentrated. The emergent lineage identified in this work, P.2, is in a dark-gray box and highlighted with red branches. The approximate likelihood ratio test (aLRT) support value for the branch holding P.2 monophyly is shown. (B) A time-scaled tree of clade II was estimated under a strict molecular clock in BEAST v.1.10.4 (12) using the GTR+I (11) nucleotide substitution model and assuming an exponential growth tree prior and a normal prior for the clock rate (mean = 8 × 10⁻⁴ and standard deviation = 0.1 × 10⁻⁵). The convergence of the Markov chain Monte Carlo chains, which were run for at least 50 million generations and sampled every 1,000th step, was inspected using Tracer v.1.7.1 (13). Maximum clade credibility (MCC) summary trees were generated using TreeAnnotator v.1.10.4 (12). Red circles in the tip nodes represent genomes generated in this work, and blue circles indicate other Brazilian samples. P.2 is identified by the red branches and highlighted in a gray box. The posterior probability of the branch holding P.2 monophyly is shown.
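
The subsampling scheme described in the Fig. 1 legend (one randomly chosen genome per week per country, with all Brazilian sequences retained) could be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the record layout, the function name `subsample_genomes`, the use of ISO calendar weeks, and the fixed random seed are all hypothetical choices.

```python
import random
from collections import defaultdict
from datetime import date


def subsample_genomes(records, keep_all=("Brazil",), seed=0):
    """Keep one random genome per (country, week); keep every genome
    from the countries listed in keep_all.

    records: iterable of (genome_id, country, collection_date) tuples.
    Assumption: weeks are ISO calendar weeks; the paper does not say
    how weeks were delimited.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = []
    groups = defaultdict(list)
    for genome_id, country, collected in records:
        if country in keep_all:
            kept.append(genome_id)  # e.g. all Brazilian sequences are retained
        else:
            iso = collected.isocalendar()  # (ISO year, ISO week, weekday)
            groups[(country, iso[0], iso[1])].append(genome_id)
    # Draw exactly one genome from every country/week bin.
    for key in sorted(groups):
        kept.append(rng.choice(groups[key]))
    return kept


# Hypothetical records used only to exercise the function.
example_records = [
    ("BR-1", "Brazil", date(2020, 6, 1)),
    ("BR-2", "Brazil", date(2020, 6, 3)),
    ("UK-1", "United Kingdom", date(2020, 6, 1)),
    ("UK-2", "United Kingdom", date(2020, 6, 2)),
    ("UK-3", "United Kingdom", date(2020, 6, 3)),
    ("UK-4", "United Kingdom", date(2020, 6, 10)),
]
selected = subsample_genomes(example_records)
```

In this toy input both Brazilian genomes are kept, and the four United Kingdom genomes collapse to two, one for each of the two weeks represented.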