Distinguishing Severe Acute Respiratory Syndrome Coronavirus 2 Persistence and Reinfection: A Retrospective Cohort Study

Abstract
Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) reinfection is poorly understood, partly because few studies have systematically applied genomic analysis to distinguish reinfection from persistent RNA detection related to initial infection. We aimed to evaluate the characteristics of SARS-CoV-2 reinfection and persistent RNA detection using independent genomic, clinical, and laboratory assessments.
Methods: All individuals at a large academic medical center who underwent a SARS-CoV-2 nucleic acid amplification test (NAAT) ≥45 days after an initial positive test, with both tests between 14 March and 30 December 2020, were analyzed for potential reinfection. Inclusion required ≥2 positive NAATs collected ≥45 days apart, with a cycle threshold (Ct) value <35 at repeat testing. For each included subject, the likelihood of reinfection was assessed by viral genomic analysis of all available specimens with a Ct value <35, by structured Ct trajectory criteria, and by case-by-case review by infectious diseases physicians.
Results: Among 1569 individuals with repeat SARS-CoV-2 testing ≥45 days after an initial positive NAAT, 65 (4%) met cohort inclusion criteria. Viral genomic analysis, which also characterized the mutations present, was successful for 14/65 (22%) subjects. Six subjects had genomically supported reinfection, and 8 had genomically supported persistent RNA detection. Compared with viral genomic analysis, clinical and laboratory assessments correctly distinguished reinfection from persistent RNA detection in 12/14 (86%) subjects but missed 2/6 (33%) genomically supported reinfections.
Conclusions: Despite good overall concordance with viral genomic analysis, clinical and Ct value-based assessments failed to identify 33% of genomically supported reinfections. Scaling up genomic analysis for clinical use would improve detection of SARS-CoV-2 reinfections.

Nucleic acid amplification tests (NAATs) can detect severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA weeks to months after initial infection [1,2]. Therefore, positive NAATs obtained several weeks after an initial diagnosis pose a clinical dilemma: these results may reflect delayed clearance of nonviable virus, chronic active infection with viable virus, or SARS-CoV-2 reinfection, each of which carries different implications for treatment, contact tracing, and transmission-based precautions [3–6]. In addition to aiding clinical decision making, distinguishing reinfection from persistent RNA detection related to the initial infection is of public health importance, because reinfections may indicate viral escape due to waning immunity or signal the emergence of novel variants [5,7]. However, without viral genomic analysis, which is not yet widely available for clinical use, it is challenging to distinguish SARS-CoV-2 reinfection from persistent RNA detection using routinely available clinical and laboratory data alone [3–5].
Previous attempts to identify SARS-CoV-2 reinfection incidence and associated clinical characteristics have mostly relied on time-based definitions of reinfection (ie, a repeat positive test at some interval after diagnosis) rather than genomic analysis, leading to potential misclassification of reinfection cases [2, 8–10]. The Centers for Disease Control and Prevention (CDC) investigative criteria for suspected cases of SARS-CoV-2 reinfection prioritize genomic evaluation of viral specimens from individuals with a repeat positive NAAT ≥45 days after initial diagnosis to distinguish reinfection from persistent RNA detection (Supplementary Table 1) [11]. These criteria have been applied to many individual cases [12–14], but they have not been used to systematically identify SARS-CoV-2 reinfection in large, well-characterized cohorts.
We report clinical, laboratory, and genomic characterization of a large cohort of individuals with repeat positive NAATs ≥45 days after initial coronavirus disease 2019 (COVID-19) diagnosis at an academic medical center in Boston. We also describe the strengths and limitations of clinical, laboratory, and genomics-based assessment for identifying SARS-CoV-2 reinfection in the clinical setting.

METHODS
Data Sources and Inclusion Criteria
We queried the Massachusetts General Hospital (MGH) electronic health record to identify subjects with at least 1 SARS-CoV-2 NAAT collected at an MGH facility ≥45 days after any prior positive NAAT (Supplementary Methods). Subjects were included in the study cohort if they had a subsequent positive NAAT, collected ≥45 days after a prior positive NAAT, that (1) was from a nasopharyngeal (NP) or anterior nares (AN) specimen, (2) had a cycle threshold (Ct) value <35 from any target, and (3) had residual specimen available for genomic analysis. These criteria were adapted from the CDC SARS-CoV-2 reinfection investigation criteria (Supplementary Table 1). This study was approved by the Mass General Brigham Institutional Review Board (2019P003305).
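The inclusion rule can be read as a predicate over a pair of tests. The Python sketch below is illustrative only: the `NaatResult` record, its field names, and the site codes "NP" and "AN" are hypothetical stand-ins for whatever the electronic health record query actually returns (described in the Supplementary Methods); only the 45-day gap and the Ct <35 cutoff come from the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class NaatResult:
    """Hypothetical test record; field names are illustrative, not an EHR schema."""
    collected: date
    positive: bool
    specimen_site: str                      # e.g. "NP", "AN", or a lower respiratory site
    ct_values: Dict[str, float] = field(default_factory=dict)  # assay target -> Ct
    residual_specimen: bool = False         # leftover material available for sequencing


def lowest_ct(result: NaatResult) -> Optional[float]:
    """Lowest Ct across targets (equivalent to 'Ct < 35 from any target'), or None."""
    return min(result.ct_values.values()) if result.ct_values else None


def qualifying_repeat(prior: NaatResult, repeat: NaatResult,
                      min_gap_days: int = 45, ct_cutoff: float = 35.0) -> bool:
    """True if `repeat` meets the cohort inclusion criteria relative to `prior`."""
    if not (prior.positive and repeat.positive):
        return False
    if (repeat.collected - prior.collected).days < min_gap_days:
        return False
    ct = lowest_ct(repeat)
    return (
        repeat.specimen_site in {"NP", "AN"}    # (1) NP or AN specimen
        and ct is not None and ct < ct_cutoff    # (2) Ct below cutoff on any target
        and repeat.residual_specimen             # (3) residual specimen available
    )
```

Taking the minimum Ct across targets is one simple way to express "from any target"; a repeat test with no reported Ct is treated as not qualifying, matching the exclusion of tests whose Ct was unavailable.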

Clinical Review
For subjects hospitalized at MGH with COVID-19 during the first surge (11 March 2020 to 3 June 2020), we obtained demographic and clinical data on the initial infection episode from the MGH COVID-19 Registry [15]. For these subjects' subsequent testing encounters, and for all data on subjects who were not hospitalized at MGH with COVID-19 during the first surge, we extracted demographic, clinical, and laboratory data from the electronic health record into a standardized data collection form.
To assess clinical suspicion for SARS-CoV-2 reinfection, two infectious diseases physician reviewers conducted detailed chart reviews and categorized each subject as having either "low" or "moderate to high" clinical suspicion for reinfection. Reviewers were blinded to subjects' Ct values, genomic data, and each other's categorizations. Features favoring "moderate to high" suspicion included new or recurrent upper respiratory infection symptoms, no clear alternative diagnosis, a reported COVID-19 contact, and/or imaging and laboratory findings consistent with acute COVID-19. All subjects also underwent BioFire™ extended respiratory viral testing (Supplementary Methods); however, these results were not available to reviewers unless obtained as part of clinical care. Initial clinical assessment categorizations were discordant for 6/65 (9%) subjects; these were adjudicated by a third, blinded reviewer.
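The two-reviewer design with third-reviewer adjudication can be sketched as follows. This is a hypothetical illustration of the workflow described above, not study software: the rating strings mirror the two categories in the text, and `third_reviewer` is an assumed callback standing in for the adjudicating physician.

```python
from typing import Callable

LOW = "low"
MODERATE_TO_HIGH = "moderate to high"
VALID_RATINGS = {LOW, MODERATE_TO_HIGH}


def consensus_suspicion(first: str, second: str,
                        third_reviewer: Callable[[], str]) -> str:
    """Return the agreed rating; consult the third blinded reviewer only on disagreement."""
    for rating in (first, second):
        if rating not in VALID_RATINGS:
            raise ValueError(f"unexpected rating: {rating!r}")
    if first == second:
        return first                      # concordant reviews need no adjudication
    adjudicated = third_reviewer()        # invoked only for discordant pairs
    if adjudicated not in VALID_RATINGS:
        raise ValueError(f"unexpected rating: {adjudicated!r}")
    return adjudicated
```

Passing the third review as a callable makes explicit that adjudication is requested only when the first two blinded ratings disagree, which in this cohort occurred for 6 of 65 subjects.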

Ct Value Assessments
For each included subject, we reviewed all available SARS-CoV-2 NAAT results and associated Ct values. Ct values obtained from the TaqPath® COVID-19 Combo Kit were standardized to account for variability across testing platforms (Supplementary Methods and Supplementary Figure 1) [16]. Ct value assessments of reinfection were assigned to one of four categories (strongly supportive, supportive, not supportive, or inconclusive) using predefined criteria based on established viral kinetics of SARS-CoV-2 infection (Supplementary Table 2) [2]. We compared the yield of genomic data from specimens with Ct values <30 versus ≥30 using a Fisher exact test, considering a 2-sided P value <.05 to be statistically significant.
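For readers reproducing a comparison of this kind, a two-sided Fisher exact test on a 2×2 table (for example, rows Ct <30 versus ≥30, columns high-quality genome obtained versus not) can be computed directly from the hypergeometric distribution. This pure-Python sketch uses the common convention of summing the probabilities of all tables, with the observed margins, that are no more likely than the observed table (the convention used by SciPy's `scipy.stats.fisher_exact` for `alternative='two-sided'`). It is not the authors' analysis code, and the counts in the usage note are textbook toy values, not study data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table with the observed row and
    column totals whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell == x) given fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are not
    # dropped because of floating-point rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For the toy table [[8, 2], [1, 5]], `fisher_exact_two_sided(8, 2, 1, 5)` returns 400/11440, about 0.035.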

Genomic Analysis
We performed either randomly primed metagenomic sequencing using NexteraXT (41 samples) or tiled-amplicon sequencing with ARTICv3 primers (156 samples) on each thawed sample, sequencing the libraries as members of multiplexed pools (Supplementary Methods). Samples from most subjects did not yield complete viral genomes at all time points, which limited our ability to determine reinfection from a change in PANGO lineage designation alone. Therefore, consistent with CDC guidance [11], reinfection was supported using initial and subsequent infection-period genomes and the following criteria: (1) well-supported phylogenetic separation (ie, a change in PANGO lineage or a Shimodaira-Hasegawa likelihood ratio test node support metric >80%), (2) a substitution rate higher than the population average (>2 substitutions/30 days), or (3) presence during the subsequent episode of a PANGO lineage that was not in local circulation (ie, <1% prevalence) during the initial infection (Supplementary Methods) [11,13,17,18]. Conversely, subjects whose later viral genomes descended from earlier genomes and who had substitution rates below the population average were deemed to have persistent RNA detection (ie, no reinfection) [11].
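The three corroborating criteria, and the converse rule for persistence, can be combined into a single decision function. This is a minimal sketch under stated assumptions: the `GenomeComparison` fields are hypothetical summaries that upstream tools (lineage assignment, phylogenetic node support, pairwise substitution counts, local lineage prevalence) would have to supply; the thresholds are those given in the text; and the order in which criteria are checked is an illustrative choice rather than the authors' documented procedure.

```python
from dataclasses import dataclass
from typing import Optional

POPULATION_RATE_PER_30_DAYS = 2.0   # stated threshold: >2 substitutions/30 days
SUPPORT_THRESHOLD_PCT = 80.0        # stated node support threshold: >80%
RARE_LINEAGE_PREVALENCE = 0.01      # "not in local circulation": <1% prevalence


@dataclass
class GenomeComparison:
    """Hypothetical per-subject summary of initial vs subsequent genomes."""
    lineage_subsequent: Optional[str] = None
    lineage_initial: Optional[str] = None
    node_support_pct: Optional[float] = None   # support for separating the two genomes
    substitutions: Optional[int] = None        # differences between paired genomes
    days_apart: Optional[int] = None
    later_descends_from_earlier: Optional[bool] = None
    subsequent_lineage_local_prevalence: Optional[float] = None  # during initial episode


def rate_per_30_days(substitutions: int, days_apart: int) -> float:
    """Apparent substitution rate scaled to 30 days."""
    if days_apart <= 0:
        raise ValueError("days_apart must be positive")
    return substitutions * 30 / days_apart


def genomic_call(g: GenomeComparison) -> str:
    """Return 'reinfection', 'persistent RNA detection', or 'indeterminate'."""
    # (3) later lineage absent from local circulation during the initial
    # infection; usable even when no initial-episode genome is available.
    prev = g.subsequent_lineage_local_prevalence
    if prev is not None and prev < RARE_LINEAGE_PREVALENCE:
        return "reinfection"
    if g.substitutions is None or g.days_apart is None:
        return "indeterminate"                 # no paired genomes to compare
    rate = rate_per_30_days(g.substitutions, g.days_apart)
    # (1) well-supported phylogenetic separation
    lineage_change = (g.lineage_initial is not None and g.lineage_subsequent is not None
                      and g.lineage_initial != g.lineage_subsequent)
    strong_split = g.node_support_pct is not None and g.node_support_pct > SUPPORT_THRESHOLD_PCT
    # (2) apparent substitution rate above the population average
    if lineage_change or strong_split or rate > POPULATION_RATE_PER_30_DAYS:
        return "reinfection"
    # Converse rule: later genome descends from earlier one at a sub-average rate
    if g.later_descends_from_earlier and rate < POPULATION_RATE_PER_30_DAYS:
        return "persistent RNA detection"
    return "indeterminate"
```

Checking criterion (3) first reflects the situation described in the Results, where a later genome alone could support reinfection when no time-series data were available.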

Determining the Likelihood of Reinfection
We stratified included subjects into 1 of 5 categories. Regardless of clinical or Ct value assessments, subjects with genomic demonstration of reinfection were categorized as genomically supported reinfection. Similarly, subjects with genomic evidence of persistent RNA detection were classified as genomically supported persistent RNA detection (no reinfection). If genomic analysis was indeterminate, categorization was based on clinical and Ct value assessment criteria into probable reinfection, probable persistent RNA detection (unlikely reinfection), or inconclusive (Supplementary Table 3).
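The full fallback criteria are given in Supplementary Table 3, which is not reproduced here. The sketch below encodes only the combinations described in the Results for subjects without genomic classification (probable reinfection: "moderate to high" suspicion with a supportive Ct assessment, or a strongly supportive Ct assessment alone; probable persistent RNA detection: "low" suspicion with a Ct assessment not supportive of reinfection; otherwise inconclusive). The function and argument names are hypothetical.

```python
from typing import Optional

STRONGLY_SUPPORTIVE = "strongly supportive"
SUPPORTIVE = "supportive"
NOT_SUPPORTIVE = "not supportive"


def likelihood_category(genomic: Optional[str], suspicion: str, ct_assessment: str) -> str:
    """Assign one of the five study categories.

    genomic: 'reinfection', 'persistent RNA detection', or None/'indeterminate'.
    suspicion: 'low' or 'moderate to high' (clinical review).
    ct_assessment: one of the four Ct value assessment categories.
    """
    # Genomic evidence takes precedence over clinical and Ct value assessments.
    if genomic == "reinfection":
        return "genomically supported reinfection"
    if genomic == "persistent RNA detection":
        return "genomically supported persistent RNA detection"
    # Genomics indeterminate or unavailable: fall back to clinical + Ct assessment.
    if ct_assessment == STRONGLY_SUPPORTIVE:
        return "probable reinfection"
    if suspicion == "moderate to high" and ct_assessment == SUPPORTIVE:
        return "probable reinfection"
    if suspicion == "low" and ct_assessment == NOT_SUPPORTIVE:
        return "probable persistent RNA detection"
    return "inconclusive"   # insufficient data or discordant assessments
```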

RESULTS
Identifying Included Subjects Using Time-based Criteria
Between 14 March 2020 and 30 December 2020, 1569 individuals had at least 1 SARS-CoV-2 NAAT collected at MGH ≥45 days after a prior positive test (Figure 1A; Supplementary Methods). Among these individuals, 195/1569 (12%) had a positive repeat test (Figure 1B). Of these 195 subjects, 130 (67%) were excluded because their repeat tests had a Ct value that was either ≥35 (120 subjects) or not available (6 subjects), were collected from the lower respiratory tract (3 subjects), or were negative when repeated to assess study eligibility (1 subject). In total, 65/1569 (4%) subjects with repeat testing were included in the cohort.
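The counts in this paragraph reconcile exactly. The short tally below, using only numbers stated above, reproduces the reported percentages (rounded to whole numbers as in the text).

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


repeat_tested = 1569     # individuals with a NAAT >=45 days after a prior positive
repeat_positive = 195    # of those, with a positive repeat test
exclusions = {
    "Ct >= 35": 120,
    "Ct not available": 6,
    "lower respiratory tract specimen": 3,
    "negative on repeat eligibility testing": 1,
}

excluded = sum(exclusions.values())       # 130
included = repeat_positive - excluded     # 65

print(f"repeat positive: {pct(repeat_positive, repeat_tested)}%")   # 12%
print(f"excluded: {excluded} ({pct(excluded, repeat_positive)}%)")  # 130 (67%)
print(f"included: {included} ({pct(included, repeat_tested)}%)")    # 65 (4%)
```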

Yield of Genomic Analysis
SARS-CoV-2 sequencing was attempted on all available positive NP or AN specimens with a Ct <35 (n = 197) from the 65 subjects in the cohort, producing 170 partial or complete viral genome assemblies (Supplementary Figure 2; Supplementary Table 6). For 14/65 (22%) subjects, the genomic data were of sufficient quality to assess reinfection (Figure 1B). The remaining 51 subjects could not be genomically classified because of specimen unavailability (16/51, 31%) or insufficient genome coverage (35/51, 69%). Specimens with Ct values >30 were less likely to yield high-quality genomic data than those with Ct values ≤30 (P < .0001; Supplementary Figures 2 and 3), consistent with previous reports [19,20].

Genomics-based Classification and Analysis
For the 14 subjects with sufficient genomic data at multiple time points, we identified 6 cases of genomically supported reinfection (Figure 2) and 8 cases of genomically supported persistent RNA detection (Figure 3). Of the 6 genomically supported reinfection cases, 3 (Subjects R1, R4, and R5) were classified as reinfection with moderate genomic support: time-series genome data were unavailable, but the PANGO lineage detected at the later time point had not been detected globally (R1 and R5) or was not yet circulating in Massachusetts (R4) during the initial period of positivity (Figure 2), making initial infection with these lineages highly improbable. The remaining 3 cases were classified as reinfections with strong genomic support. Two of these (Subjects R3 and R6) were identified on the basis of phylogenetic separation of early and later genomes (ie, later genomes descended from other cases rather than from the earlier viral genomes); both also exhibited a greater-than-average nucleotide substitution rate, consistent with reinfection [11]. In both cases, later time points showed a return to the ancestral allele at several polymorphic positions. The third (Subject R2) was identified on the basis of phylogenetic separation as well as a subtle lineage change from B.1 to B.1.1. In contrast, the 8 subjects categorized as persistent RNA detection showed no genomic evidence of reinfection.
Among genomically supported reinfection cases, we identified mutations in the genomes from second infections relative to the ancestral reference genome (Supplementary Figure 4, Supplementary Table 4). Several of these were nonsynonymous substitutions of potential functional relevance. Subject R1 had the Mustelidae-associated spike substitution S:N501T [21,22], which has also been reported in a prior reinfection case [13]. Subject R6 had the substitution S:Q677H near the furin cleavage site of the spike protein [23,24], which has emerged recurrently in several distantly related lineages [25]. Subject R2 had the nucleocapsid substitutions N:R203K and N:G204R; viruses carrying these substitutions have been observed to replicate well without cytopathy, and the substitutions were subsequently seen in the Alpha, Gamma, and Omicron lineages and found to increase viral particle production by elevating nucleocapsid expression [26,27].
We characterized viral genetic changes that occurred during infection in the 8 subjects with genomically supported persistent RNA detection (Subjects P1–P8; Table 4) to examine within-host viral evolution. Subjects P1, P3, P4, P6, P7, and P8 had viral substitutions at intermediate time points that were not seen at the first or final time points, consistent with mixed viral populations and the emergence and disappearance of subclones, as previously reported [28]. The average apparent substitution rate among all subjects with genomically supported persistent RNA detection (0.9 substitutions per 30 days) was below the average rate reported for the virus across the broader population [17].
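Operationally, the pattern described above (substitutions that appear at intermediate time points but are absent at both the first and final time points) reduces to a set comparison over per-time-point mutation calls. The sketch below assumes consensus-level calls are already available as sets of substitution labels; it does not model within-sample allele frequencies, which a fuller analysis of mixed viral populations would need. The labels in the usage note are hypothetical.

```python
from typing import Sequence, Set


def transient_substitutions(series: Sequence[Set[str]]) -> Set[str]:
    """Substitutions present at some intermediate time point but absent from both
    the first and the final time points of one subject's genome series."""
    if len(series) < 3:
        return set()                       # no intermediate time points to inspect
    first, last = series[0], series[-1]
    intermediate = set().union(*series[1:-1])
    return intermediate - first - last
```

For a hypothetical series `[{"C100T"}, {"C100T", "G200A"}, {"C100T", "A300G"}, {"C100T"}]`, the function returns `{"G200A", "A300G"}`: both appear only in the middle of the series.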

Concordance between Genomic Findings and Clinical and Ct Value Assessments
Among the 6 subjects with genomically supported reinfection, the median age was 34 years (interquartile range [IQR]: 28, 45), 5 (83%) were male, and none were immunocompromised (Figures 2 and 5). Three of 6 (50%) cases satisfied both the clinical and the Ct value criteria for reinfection. Subjects R1, R4, and R6 had mild, typical COVID-19 symptoms during the second illness episode. Subject R3 had respiratory symptoms, which were attributed to postobstructive pneumonia, and he subsequently died of complications of lung cancer. Subjects R2 and R5 were asymptomatic, but Ct values were strongly suggestive of reinfection for Subject R5. For Subjects R2 and R3, clinical and Ct value assessment alone was insufficient to identify reinfection.

Clinical and Ct Value Assessments Among Subjects Not Genomically Classified
When subjects could not be genomically classified (51/65 subjects, 78%), we used clinical and Ct value assessments to classify them as probable reinfection, probable persistent RNA detection (unlikely reinfection), or inconclusive (Supplementary Tables 3 and 5). Eight (12%) subjects were categorized as probable reinfection on the basis of "moderate to high" clinical suspicion with a supportive Ct value assessment, or of a strongly supportive Ct value assessment alone (Supplementary Figure 6). Subjects with probable reinfection had a median of 87 days (IQR: 77, 111) between the first and repeat qualifying positive NAATs and lower median Ct values at repeat testing than at initial testing (Figure 4). Conversely, we classified 18 subjects (28%) as probable persistent RNA detection on the basis of "low" clinical suspicion for reinfection and a Ct value assessment that was not supportive of reinfection (Supplementary Figure 7). These subjects had a median of 55 days (IQR: 50, 65) between the first and repeat qualifying positive NAATs and higher median Ct values at repeat testing than at initial testing (Figure 4). The remaining 25 subjects (38%) had inconclusive results and could not be categorized because of insufficient data or discordant clinical and Ct value assessments (Supplementary Figure 8).
[Partially recovered table; column headers lost. Recoverable rows: Other, non-Hispanic; healthcare worker with or without known exposure, n (%); immunocompromised, n (%)^a: 7 (11), 0 (0), 1 (13), 2 (11), 4 (50), 0 (0); treatment received during initial episode, n (%); non-COVID respiratory pathogen identified, n (%)^d. Recoverable footnotes: patients received one or more of hydroxychloroquine or inhaled nitric oxide; ^c two subjects received corticosteroids for non-COVID-19-related indications; no other COVID-19-directed therapies were administered to subjects during their subsequent episode.]

DISCUSSION
As SARS-CoV-2 reinfections increase in the era of Omicron and other variants of concern [7,30], so does the importance of promptly and accurately distinguishing SARS-CoV-2 reinfection from persistent RNA detection after initial infection. Failing to recognize reinfection may result in missed opportunities to initiate COVID-19 treatment, onward transmission, and delayed detection of novel SARS-CoV-2 variants [3–5,7]. Conversely, managing all individuals with positive NAATs long after initial infection as reinfected places increased pressure on strained healthcare systems [31,32] by risking misallocation of therapeutics, private hospital rooms, and personal protective equipment. Although the stakes are high, standardized approaches to distinguish reinfection from persistent RNA detection are lacking.
We systematically used clinical, laboratory, and genomic information to distinguish SARS-CoV-2 reinfection from persistent RNA detection among individuals who underwent serial SARS-CoV-2 NAATs as part of routine care. Cases of genomically supported reinfection were often mild or asymptomatic despite generally low Ct values on repeat testing, which suggest potential transmissibility [33,34]. We found that a third (2/6) of individuals with genomically supported reinfection would have been misclassified with clinical and Ct value assessment alone, underscoring the challenge of distinguishing SARS-CoV-2 reinfection from persistence using routinely available clinical and laboratory data. Conversely, clinical and Ct value assessment accurately identified all (8/8) individuals with genomically supported persistent RNA detection (ie, no reinfection).
We found that structured clinical and Ct value assessments performed reasonably well in distinguishing reinfection from persistent RNA detection, demonstrating 86% (12/14 subjects) concordance with viral genomic analysis, but these real-time assessments have some limitations. SARS-CoV-2 reinfection is frequently asymptomatic and may be missed by clinical assessment [35,36]. Low or decreasing Ct values can raise suspicion for reinfection. However, Ct values vary by time from symptom onset, testing platform, and sample collection, transport, and storage conditions [37], complicating Ct value interpretation. Despite these limitations, our findings suggest that clinicians can distinguish reinfection from persistent RNA detection with reasonable accuracy by integrating clinical and Ct value assessments with established time-based definitions of reinfection.
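Treating genomic analysis as the reference standard, the counts reported in this section (4 of 6 genomically supported reinfections identified and 2 missed; 8 of 8 genomically supported persistence cases identified) give the 86% concordance directly and also imply a sensitivity of about 67% and a specificity of 100% for clinical and Ct value assessment in this small subset. The helper below is a generic sketch, not the authors' analysis code; with only 14 subjects these estimates are imprecise.

```python
def agreement_with_reference(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Agreement of an index assessment with a reference standard.

    tp/fn: reference-positive cases (here, genomically supported reinfection)
           detected / missed by the index assessment.
    tn/fp: reference-negative cases (genomically supported persistence)
           correctly / incorrectly classified.
    """
    return {
        "concordance": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


metrics = agreement_with_reference(tp=4, fn=2, tn=8, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
# concordance: 86%
# sensitivity: 67%
# specificity: 100%
```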
This study demonstrates that viral genomic analysis can improve the sensitivity and specificity of reinfection detection compared with clinical and Ct value assessment when high-quality genomic data are available from multiple SARS-CoV-2 NAATs across illness episodes. Sequencing can also inform both patient- and population-level decision making [5]. Viral genomic analysis can detect amino acid substitutions that may alter treatment guidelines (eg, suspension of bamlanivimab based on S:E484K prevalence [38,39]). It can detect mutations that evolve over time within immunocompromised individuals with uncontrolled and potentially transmissible infection, as in Subject P1 in our study [29], and prompt consideration of SARS-CoV-2 retreatment. Population-level genomic surveillance may signal the emergence of new variants with phenotypic differences in disease severity or transmissibility. In our study, 3 subjects with genomically supported reinfection had notable substitutions in the spike or nucleocapsid proteins [21–24,26]. Ongoing genomic surveillance of individuals with suspected SARS-CoV-2 reinfection is important to help determine whether mutations like these promote immune escape.
Despite the potential benefits of viral genomic analysis, this study highlighted ongoing obstacles to its use for routine differentiation of SARS-CoV-2 reinfection and persistent RNA detection. First, paired specimens from multiple infection episodes suitable for genomic analysis may be unavailable. Second, recovery of full SARS-CoV-2 genomes from specimens with lower viral RNA concentrations is difficult using conventional sequencing methods [19]. We recovered sufficient genomes in less than 25% of included subjects; most subjects with successful genome recovery had Ct values <30. Third, there is currently no automated method to perform genomic analysis of potential reinfections on a timeframe fast enough to guide clinical decision making.
Overcoming these obstacles to mobilize viral genomic analysis for clinical care may be possible with substantial upfront investment. Access to sequencing and reduction in turnaround time could be achieved by embedding sequencing platforms and analyst personnel into clinical laboratories. Viral genomic yields could be improved by employing laboratory methods that reduce viral degradation (eg, minimizing dwell time and freeze-thaw cycles) or by deploying shorter-amplicon sequencing [40]. Independent replicates from source material may be employed to provide additional data to enhance genome coverage and sequencing depth.
This analysis has several important limitations. First, although this is among the largest SARS-CoV-2 reinfection cohort studies to combine detailed clinical and laboratory data with viral genomic evaluation, genomic analysis distinguished potential reinfection among only a modest number of subjects. Second, our cohort represents a convenience sample of individuals undergoing repeat SARS-CoV-2 NAAT as part of routine care. As such, we cannot provide a reinfection rate; however, our results demonstrate that reinfection can occur relatively soon after initial infection. Finally, this analysis was conducted before the emergence of SARS-CoV-2 variants of concern and the deployment of COVID-19 vaccines and therapies. Although these factors may influence reinfection risk and associated characteristics, this study provides a framework for using viral genomic analysis alongside clinical and Ct value assessment that could be adapted as the COVID-19 pandemic evolves.

CONCLUSION
Although clinical and Ct value assessments were highly concordant with viral genomic analysis in identifying persistent RNA detection, they failed to identify a third of genomically supported SARS-CoV-2 reinfections. Scaling up viral genomic analysis for real-time clinical use would likely improve detection of SARS-CoV-2 reinfections, allow treatment of eligible patients, and provide much-needed insights into the evolution of variants of concern.

Supplementary Data
Supplementary materials are available at Clinical Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.