Epidemiological characteristics of the B.1.526 SARS-CoV-2 variant

To characterize the epidemiological properties of the B.1.526 SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) variant of interest, here we used nine epidemiological and population datasets and model-inference methods to reconstruct SARS-CoV-2 transmission dynamics in New York City, where B.1.526 emerged. We estimated that B.1.526 had a moderate increase (15 to 25%) in transmissibility, could escape immunity in 0 to 10% of previously infected individuals, and substantially increased the infection fatality risk (IFR) among adults 65 or older by >60% during November 2020 to April 2021, compared to estimates for preexisting variants. Overall, findings suggest that new variants like B.1.526 likely spread in the population weeks before detection and that partial immune escape (e.g., resistance to therapeutic antibodies) could offset prior medical advances and increase IFR. Early preparedness for and close monitoring of SARS-CoV-2 variants, their epidemiological characteristics, and disease severity are thus crucial to COVID-19 (coronavirus disease 2019) response.


INTRODUCTION
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus spread quickly worldwide in early 2020, causing the coronavirus disease 2019 (COVID- 19) pandemic. As the virus spread, it also diversified, and multiple novel SARS-CoV-2 variants emerged in different populations, producing both local and global waves of infection. Several variants have been characterized as variants of concern (VOCs) or of interest (VOIs), based on evidence regarding their ability to increase transmissibility, evade immunity conferred by either prior infection or vaccination, or cause more severe disease. Accurately estimating the epidemiological characteristics and impact of these variants is thus important for informing public health response, such as monitoring effectiveness of vaccines and therapeutic antibodies. More broadly, such findings can also provide insights into the long-term trajectory of SARS-CoV-2 beyond the pandemic phase.
The B.1.526 variant [World Health Organization (WHO) designation: Iota] (1), a SARS-CoV-2 VOI, was identified during November 2020 and quickly became a predominant variant in the New York City (NYC) area (2)(3)(4). It has also been detected in all 52 states/territories in the United States and at least 27 other countries [Global Initiative on Sharing All Influenza Data (5), as of 9 June 2021]. An initial laboratory study (2) suggested that this variant is to some extent resistant to two therapeutic monoclonal antibodies in clinical use and neutralization by convalescent plasma and vaccinee sera. However, another study (4) examined all sequenced B.1.526 cases in NYC identified as of 5 April 2021 (n = 3679) and showed preliminary evidence that this variant did not increase risk for infection after vaccination or reinfection. Both studies may be limited because of the small number of specimens available for analysis as well as delay in observation and reporting. Given these discrepancies, here, we use detailed population epidemiological data collected since the beginning of the COVID-19 pandemic in NYC (1 March 2020 to 30 April 2021) and multiple model-inference methods to estimate the transmissibility, immune escape potential, and disease severity of B.1.526. Of note, we include the combination of B.1.526-S:E484K and B.1.526-S:S477N as B.1.526.
As shown in Fig. 1 (overall study design), we first apply a network model-inference system to reconstruct underlying SARS-CoV-2 transmission dynamics in NYC, accounting for underdetection of infection. This analysis allows estimation of key population variables and parameters (e.g., the infection rate including those not detected as cases and transmission rate) at the neighborhood level as well as citywide. As such, we are able to examine in detail the epidemiological dynamics in Washington Heights-Inwood (WHts), the neighborhood where B.1.526 was initially identified (2). Using these estimates and additional variant prevalence data, we further apply a city-level multivariant, age-structured model to estimate the changes in transmissibility and immune escape potential for B.1.526. Last, we use findings from the first two model systems to estimate variant-specific infection fatality risk (IFR; i.e., the fraction of all persons with SARS-CoV-2 infection who died from the disease), for B.1.526 and B.1.1.7, separately. Overall, our study documents the emergence and impact of B.1.526. We close with a discussion on lessons learned from this SARS-CoV-2 VOI and implications for future COVID-19 pandemic response.
workers. However, infection resurged beginning in the fall of 2020 and the city experienced a second pandemic wave around November 2020 to April 2021 (Fig. 2). Following the second wave, an estimated total of 41.7% (95% CrI, 35.4 to 49.3%; or 3.0 to 4.1 million people) had been infected by the end of April 2021, including all those infected during the first wave. Note that these estimates accounted for underdetection of infections (fig. S1), for which the overall infectiondetection rate increased to 37.1% (95% CrI, 33.3 to 43.0%) during the second wave from 15.1% (95% CrI, 11.7 to 18.5%) during the first wave. This large number of infections occurred despite the nonpharmaceutical interventions implemented throughout the pandemic and rollout of mass vaccination starting mid-December 2020. In addition, unlike the first wave that predominantly affected older age groups, the second pandemic wave affected all age groups (figs. S2 and S3).

Transmission rate increased earliest in the neighborhood where B.1.526 was initially identified
The emergence and rapid increase of B.1.526 coincided with the second pandemic wave in NYC. While first reported in February 2021 (2), testing initially identified the B.1.526 variant in patient samples dated back to early November 2020 from the city's WHts neighborhood (2). As such, we first examined potential changes in the transmission rate there. Before the identification of B.1.526, estimated neighborhood relative transmission rate (b i in Eq. 1; see Materials and Methods) in WHts gradually increased, remained at high levels during November 2020 to February 2021, and decreased to the baseline level afterward when B.1.526 became a predominant variant citywide (~40% of all cases sequenced by end of February 2021). In comparison, the estimates were relatively stable for other neighborhoods (Fig. 3A), suggesting that the changes in WHts were likely due to the early spread of B.1.526. Averaging over this period, we estimate that the relative transmission rate in WHts increased by 8.4% [95% confidence interval (CI), −5.8 to 22.5%]. Concurrently, the citywide transmission rate increased by 13.3% (95% CI, −21.1 to 47.8%; Fig. 3B). These two preliminary estimates in combination suggest that the transmission rate of B.1.526 is likely 22.8% (95% CI, −12.4 to 58.0%) higher than preexisting non-VOC/VOI variants, without accounting for potential change due to immune evasion.

B.1.526 likely causes a moderate increase in transmissibility (15 to 25%) and slight immune evasion (0 to 10%)
We further examine model estimations under a wide range of transmissibility and immune escape settings for B.1.526. Under all three possible scenarios of initial prevalence (i.e., 0.5 to 2.5%, 1.5 to 3.5%, and 0.5 to 3.5%), model simulations consistently show that B.1.526 likely increases transmissibility by 15 to 30% and can escape immunity in 0 to 10% of previously infected persons (Fig. 4, A to C). Overall, a higher initial prevalence (1.5 to 3.5% at the beginning of November 2020) combined with a 15 to 25% increase in transmissibility and 0 to 10% immune escape (Fig. 4, A to C, middle column, and D) generated the most accurate estimates of cases, hospitalizations, and deaths, as well as variant percentages during the second wave. Model simulations show that, with this moderate increase in transmissibility and small immune escape, B.1.526 was able to outcompete preexisting variants and gradually increase its percentage from November 2020 to March 2021; however, afterward,

B.1.526 likely increases disease severity substantially
During the second wave, estimated IFR increased gradually in later months, particularly among older age groups (Fig. 5). During this period (November 2020 to April 2021), 16 to 35% of hospital beds and 18 to 32% of intensive care unit beds in NYC were available, suggesting that lack of access to health care was not a reason behind the IFR increases. In addition, the number of COVID-19-related deaths declined following mass vaccination in early 2021. Modeling accounting for infections and deaths due to B.  Table 1). Analysis of a more restricted period, November 2020 to January 2021, suggests similar IFR increases (table S3). These estimated IFR increases were lower than for B.1.1.7 but comparable. Of note, the IFRs for B.1.1.7 estimated here were higher than, but in line with, those reported in the United Kingdom [e.g., overall increase, 100% (75 to 130%) versus 61% (42 to 82%) in the United Kingdom (7)].

DISCUSSION
The B.1.526 variant is one of the SARS-CoV-2 variants designated as a VOI by both the WHO (1) and the U.S. Centers for Disease Control and Prevention (8). However, because of a lack of extensive genomic sequencing and contact tracing data, particularly during the early phase of its emergence, its key epidemiological properties have not been well characterized. Using multiple epidemiological datasets and comprehensive modeling, here we have estimated the changes Model estimates using the overall best-performing parameter combination (i.e., 1.5 to 3.5% initial prevalence, 15 to 25% higher transmissibility, and 0 to 10% immune escape). Lines and surrounding areas show model-simulated median estimates and interquartile range; dots show corresponding observations; colors indicate different variants as specified in the legend. Note that these model simulations used the same infection-detection rate, hospitalization rate, and IFR (i.e., average during November 2020 to April 2021); that is, they did not account for changes in case ascertainment or disease severity by week during this period, due to, e.g., increases in disease severity by the new variants. As such, there were larger deviations from the observations during later months of the simulation with more infections by the new variants. in transmissibility, immune escape potential, and disease severity for B.1.526. Results suggest that, compared to preexisting non-VOC/VOI variants, B.1.526 causes a moderate increase in transmissibility and minimal immune evasion; however, it might substantially increase IFR in older adults. As such, continued monitoring of the circulation of this variant is warranted.
Our study offers several lessons for future outbreak response. First, before the emergence of B.1.526, the estimated transmission rate in WHts, where it likely emerged, was consistently higher than other neighborhoods in NYC throughout the pandemic. Population characteristics (e.g., household structure) that may contribute to this higher transmission rate need further investigation; however, the higher transmission rate may have facilitated the spread of new mutants between hosts and its emergence population-wide. It is thus important to closely monitor populations with sustained higher transmission rates for new variants, particularly in areas lacking robust and timely sequencing of samples from newly identified cases. In addition, the estimated transmission rate in WHts further increased in conjunction with the emergence of B.1.526; such changes may thus serve as an early indicator for in-depth epidemiological investigation (e.g., to assess changes in circulating variants and transmissibility). A similar approach has been applied in the United Kingdom, where subregions with higher estimated growth rates were prospectively investigated, leading to identification of B.1.1.7 as a VOC (9)(10)(11)(12).
Second, we did not find a higher B.1.526-related IFR among younger age groups (those under 45 years); this finding is consistent with the findings of Thompson et al. (4) based on analysis of all sequenced cases, the majority of whom (67%) were under 45 years. However, for older ages, we found substantially higher B.1.526related IFRs (e.g., >60% higher for those above 65 years). This latter finding appears to be consistent with the report by Annavajhala et al. (2) showing resistance of B.1.526 to therapeutic antibodies. Over the course of the pandemic, SARS-CoV-2 IFR has decreased substantially (about a threefold difference between the two pandemic waves), likely due to improved medical treatments (e.g., therapeutic antibodies), better patient management, and earlier diagnosis. As older adults are more likely to suffer from severe COVID-19 and thus receive therapeutic antibodies (13,14), the resistance of B.1.526 may render these treatments ineffective despite their prior success against other variants, leading to increases in IFR among older adults. These findings highlight the importance of monitoring the efficacy of therapeutics against different variants and timely update of treatments. In addition, a better understanding of factors contributing to the higher IFRs in certain variants is warranted to inform countermeasures (15,16).
Last, our analyses suggest that both B.1.526 and B.1.1.7 had likely been spreading in the population for weeks or months before detection by the surveillance system (2,3,17). Expanding genomic sequencing programs for SARS-CoV-2 and improving linkage to epidemiologic data can improve detection of new VOIs/VOCs. Such efforts are underway (e.g., in the United States) but more efforts and resources are urgently needed globally. In addition, to support more timely detection and control, targeted screening of key subpopulations (e.g., those prospectively identified from modeling as having high transmission rates) and viral traits [e.g., mutations linked to increased transmissibility and/or immune evasion as done in Annavajhala et al. (2)] is needed, as well as timely sharing of key information globally. The documentation of new VOIs/VOCs anywhere in the world should then prompt preparedness measures to detect and rapidly respond to the introduction of those variants into local areas. More fundamentally, to limit emergence of new VOIs/VOCs and end the COVID-19 pandemic, all populations worldwide should have timely access to vaccination, and multiple layers of mitigation efforts are needed until a sufficient portion of the population is protected by vaccination.
Our study also has several limitations. First, most of our analyses are based on population-level data without variant-specific information, given limited variant testing during most of the study period. We circumvented this data deficiency by analyzing estimates of a key subpopulation (e.g., the WHts neighborhood where B.1.526 was initially detected) and leveraging prior knowledge (e.g., estimated IFR before B.1.526 emergence). Second, our study did not distinguish the two subclades within the B.1.526 lineage, one containing the E484K mutation and the other containing the S477N mutation. Both the E484K and S477N mutations have been shown to mediate immune escape (16,(18)(19)(20); in addition, the percentages of these two subclades were similar during our study period, suggesting that they likely have similar epidemiological characteristics. Third, while it is likely that the emergence of B.1.526 led to the increase in transmission rate in WHts at the time, we cannot rule out the possibility that other factors contributed to this increase and in turn the emergence of B.1.526. Last, there is a likely larger uncertainty in B.1.526-related and B.1.1.7-related IFR estimates for younger ages (those under 45), due to the smaller number of deaths and larger uncertainty in baseline IFR estimates. Future investigation addressing these issues is warranted should a large sample of variant-specific data become available. In summary, our study has reconstructed the early epidemic trajectory and subsequent rise of B.1.526 in NYC and estimated its key epidemiological properties. Findings highlight the importance of monitoring the viral diversity of SARS-CoV-2, epidemiological characteristics of new variants, and disease severity, as COVID-19 remains a global public health threat.

Study design and data
This study included three interconnected modeling analyses, synthesizing nine epidemiological and population datasets (Fig. 1). The first analysis applied a network model-inference system to reconstruct underlying SARS-CoV-2 transmission dynamics in NYC, accounting for underdetection of infection; it also enabled estimation of key population variables and parameters (e.g., the infection rate including those not detected as cases and transmission rate). The second analysis applied a city-level multivariant, age-structured model to simulate and estimate the changes in transmissibility and immune escape potential for B.1.526 based on the network model-inference estimates and additional data (e.g., variant prevalence data). The last analysis used estimates from the first two model systems to estimate variant-specific IFR (i.e., the fraction of all persons with SARS-CoV-2 infection who died from the disease), for B.1.526 and B.1.1.7, separately. This study was classified as public health surveillance and exempt from ethical review and informed consent by the Institutional Review Boards of both Columbia University and NYC Department of Health and Mental Hygiene (DOHMH).
For the network model-inference system, we used multiple sources of epidemiological data, including confirmed and probable COVID-19 cases, emergency department (ED) visits, and deaths, as well as vaccination data. As done previously (6), we aggregated all COVID-19 confirmed and probable cases (21,22) and deaths (22) reported to the NYC DOHMH by age group (<1-, 1-to 4-, 5-to 14-, 15-to 24-, 25-to 44-, 45-to 64-, 65-to 74-, and 75+-year-olds), neighborhood of residence [42 United Hospital Fund neighborhoods in NYC (23)], and week of occurrence (i.e., week of diagnosis for cases or week of death for decedents). COVID-19-related ED visit data were obtained from the NYC syndromic surveillance system, composed of all 53 hospital EDs in the city (24). This system identified individuals presenting at the EDs with COVID-19-like illness (CLI; defined as having a fever and cough or sore throat or respiratory illness, or pneumonia, or a COVID-19 discharge diagnosis code, excluding those with a discharge diagnosis code of influenza only); in addition, patients with CLI were matched to electronic laboratory reports of SARS-CoV-2 tests with diagnosis date within ±7 days of ED visit. We estimated the number of COVID-19-related ED visits as the number classified as CLI multiplied by the percentage of those who tested positive for SARS-CoV-2 RNA, stratified by the same age and neighborhood groups in weekly intervals. To account for the impact of vaccination, we also included COVID-19 vaccination data (partially and fully vaccinated, separately), aggregated to the same age/neighborhood strata by week.
In addition, as in our previous study (6), we used mobility data from SafeGraph (25) to model changes in SARS-CoV-2 transmission rate due to nonpharmaceutical interventions. These data were aggregated to the neighborhood level by week without age stratification. For the multivariant model analysis, we additionally used four city-level, weekly datasets: (i) COVID-19 confirmed and probable cases, (ii) hospitalizations (26), (iii) deaths, and (iv) the percentage of different variants in NYC based on genomic sequencing of samples submitted to the NYC DOHMH Public Health Laboratory and Pandemic Response Laboratory (4,27). The additional hospitalization and variant percentage data were published by the NYC DOHMH (26,27) and accessed on 22 June 2021. We used the variant data from the week starting 31 Jan 2021 to the week starting 25 April 2021 in this analysis, because earlier weeks had very low sample sizes (<200 samples sequenced per week).

Network model-inference system
The network model-inference system used here is similar to the approach described in Yang et al. (6); however, here we further accounted for waning immunity and vaccination and additionally used COVID-19-related ED visit data for model optimization. Briefly, the model-inference system uses an epidemic model (Eq. 1) to simulate the transmission of SARS-CoV-2 by age group and neighborhood, under implemented public health interventions and mass vaccination when vaccines became available starting 14 December 2020 where S i , E i , I i , R i , and N i are the number of susceptible, exposed (but not yet infectious), infectious, and removed (i.e., recovered or immune) individuals and the total population, respectively, from a given age group in neighborhood i.  city is the average citywide transmission rate; b s is the estimated seasonal trend (6). The term b i represents the neighborhood-level transmission rate relative to the city average. The term m ij represents the changes in contact rate in each neighborhood (for i = j) or spatial transmission from neighborhood j to i (for i ≠ j) and was computed on the basis of the mobility data (6). Z, D, and L are the latency period, infectious period, and immunity period, respectively. The term v i,1 represents the number of individuals in neighborhood i successfully immunized after the first dose of the vaccine and is computed using vaccination data and vaccine efficacy (VE) for first dose; v i,2 is the additional number of individuals successfully immunized after the second vaccine dose (excluding those successfully immunized after the first dose). Because 97% of vaccine doses administered in NYC during our study period (through 30 April 2021) were the Pfizer-BioNTech or Moderna vaccines, we assumed a VE of 85% 14 days after the first dose and 95% 7 days after the second dose based on clinical trials and real-word data (28)(29)(30).
Using the model-simulated number of infections occurring each day, we further computed the number of cases, ED visits, and deaths each week to match with the observations (6). Similar to the procedure for cases and deaths described in Yang et al. (6), to compute the number of ED visits, we multiplied the model-simulated number of new infections per day by the ED consultation rate (i.e., the fraction of model-simulated persons with new SARS-CoV-2 infections presenting at the EDs) and further distribute these estimates in time per a distribution of time from infection to ED consultation (table S1); we then aggregated the daily lagged, simulated estimates to weekly totals for model inference.
Each week, the system uses the ensemble adjustment Kalman filter (EAKF) (31) to compute the posterior estimates of model state variables and parameters based on the model (prior) estimates and observed case, ED visit, and mortality data per the Bayes' rule (6). In particular, using this model inference, we estimated the citywide transmission rate ( city ), neighborhood relative transmission rate (b i ), and IFR by age group for each week, from the week starting 1 March 2020 (i.e., the beginning of the COVID-19 pandemic in NYC) to the week starting 25 April 2021.

Multivariant, age-structured model
Because of model complexity, the model-inference system described above does not account for the circulation of different variants. To model variants, we used a city-level multivariant, age-structured model (32), per Eq. 2: Model variables and parameters in Eq. 2 are similar to those in Eq. 1 with the same symbols. For instance,  i is the transmission rate for variant i. However, instead of modeling the spatial structure, Eq. 2 focuses on the interactions among different variants (indicated by the subscript i) and age structure (indicated by the superscript a or A). For age structure, infection in age group A by variant i comes from all age groups, per the summation b s m ∑ a N a (see the second line in Eq. 2). For variant interactions via cross-immunity, we use a status-based construct similar to Yang et al. (33) and Gog and Grenfell (34). Specifically, c ij measures the strength of cross-immunity to variant i conferred by infection of variant j (e.g., close to 0 if it is weak and c ii = 1 for infection by the same variant). To compute the depletion of susceptibility to variant i due to infection of variant j (i ≠ j), we multiply that infection by c ij ; that is, nonspecific immunity is scaled by the strength of cross-immunity c ij . As such, the double summation ∑ j b s mc ij ∑ a  j Aa S j A I j a _ N a in line 1 of Eq. 2 represents the depletion of susceptibility due to variant-specific infection (when i = j) and nonspecific infections (for all i ≠ j). The vaccination model components v i, 1 A and v i,2 A are also variant specific and can additionally account for the reduction in VE against the new variants if needed; however, here we used the same VE estimates for all variants included (i.e., B.1.526, B.1.1.7, B.1.427, and B.1.429) based on observations (4,30,35,36). In addition, the term  i represents travel-related importation of infections of variant i (see table S2).
We restricted this simulation to November 2020 to April 2021 (i.e., from the initial identification of B.1.526 to before the further detection and increase of other variants such as Gamma and Delta). In addition to preexisting variants of SARS-CoV-2 before November 2020 (i.e., "non-VOC/VOI variants" for simplicity) and B.  (27).
Initial analysis based on the model-inference estimates suggested that B.1.526 was around 20% more infectious than non-VOC/VOI variants, without accounting for changes in immunity due to potential immune escape (see details in Results). Therefore, in this analysis, we tested combinations of change in transmissibility ranging from 10 to 30% increases and immune escape ranging from 0 to 30%, both with a 5% increment and ±5% intervals (35 combinations in total). For instance, for the combination centering at 10% transmissibility increase and 0% immune escape, the model is initialized using values in the range of 5 to 15% (i.e., 10 ± 5%) transmissibility increase and 0 to 5% (i.e., 0 ± 5% and setting negatives to 0) immune escape. In addition, because of uncertainty on the initial prevalence, we tested three different levels of initial seeding for the week starting 1 November 2020, i.e., low (0.5 to 2.5%), high (1.5 to 3.5%), and wider range (0.5 to 3.5%). For reference, WHts, which is the neighborhood where the first patients identified with B.1.526 resided and sought care, constituted 3.2% of the NYC population in 2018. We initialized the model using the model-inference estimates (e.g., population susceptibility and transmission rates by age group; table S2) and ran the model for each parameter combination 10 times, each with 1000 realizations to account for model stochasticity. Results are summarized from the 10,000 model realizations.
To identify the most plausible combination of transmissibility and immune escape properties for B.1.526, we compared the modelestimated weekly number of cases, hospitalization, and deaths, as well as the percentage of the variants to available data. Evaluation was made based on (i) accuracy, i.e., if the observation falls within the model-estimated interquartile range, it is deemed accurate; (ii) relative root mean square error (RMSE) between the observed and the model estimated; and (iii) Pearson correlation between the two time series. Because results show that model accuracy and relative RMSE had a wider spread among the combinations tested (i.e., more distinctive), we first subset those having accuracy within the highest 25th percentile and relative RMSE within the lowest 25th percentile (2 to 4 of 35 combinations remained for each setting of initial prevalence); we then selected the one with the highest correlation in the subset as the best-performing and most plausible combination.

Estimating the changes in IFR due to B.1.526
The network model-inference system enables estimation of the IFR by age group over time. These estimates are made combining all variants and do not distinguish by variant. However, we reasoned that the combined IFR is a weighted average of individual, variant-specific estimates given the relative prevalence of each variant. Accordingly, we built two linear regression models to estimate the variant-specific IFR. Model 1 restricted the analysis to November 2020 to January 2021 (when the relative prevalence of B.  (4) In both models, IFR combined is the overall IFR for each week, estimated using the model-inference system; i B.1.526 , i B.1.1.7 , and i others are the percentage of infection by the corresponding variant for each week, estimated using the multivariant, age-structured model with the most plausible parameter combination as data are not available. IFR baseline is the baseline IFR for the preexisting variants, set to the average of model-inference estimates over the period of October and November 2020 (i.e., before the increase of the new variants). The variant-specific IFRs, IFR B.1.526 and IFR B.1.1.7 , are then estimated using the regression models (n = 14 weekly data points for model 1, and n = 26 weekly data points for model 2). For either model, the change in IFR due to a given variant is then computed as ∆ IFR = IFR variant − IFR baseline ───────────── IFR baseline × 100% Both model analyses were performed for each age group or all ages combined, separately; we also combined all those aged under 25 as the IFRs were similarly low for the four sub-age groups (i.e., <1, 1-to 4-, 5-to 14-, and 15-to 24-year-olds).