Excess burden of respiratory and abdominal conditions following COVID-19 infections during the ancestral and Delta variant periods in the United States: An EHR-based cohort study from the RECOVER Program

Importance: The frequency and characteristics of post-acute sequelae of SARS-CoV-2 infection (PASC) may vary by SARS-CoV-2 variant.
Objective: To characterize PASC-related conditions among individuals likely infected by the ancestral strain in 2020 and individuals likely infected by the Delta variant in 2021.
Design: Retrospective cohort study of electronic medical record data for approximately 27 million patients from March 1, 2020, to November 30, 2021.
Setting: Healthcare facilities in New York and Florida.
Participants: Patients who were at least 20 years old, had at least one diagnosis code during a baseline period, and had at least one SARS-CoV-2 viral test during the study period.
Exposure: Laboratory-confirmed COVID-19 infection, classified by the most common variant prevalent in those regions at the time.
Main Outcome(s) and Measure(s): Relative risk (estimated by adjusted hazard ratio [aHR]) and absolute risk difference (estimated by adjusted excess burden) of new conditions, defined as new documentation of symptoms or diagnoses, in persons 31–180 days after a positive COVID-19 test compared with persons with only negative tests during the 31–180 days after the last negative test.
Results: We analyzed data from 560,752 patients. The median age was 57 years; 60.3% were female, 20.0% non-Hispanic Black, and 19.6% Hispanic. During the study period, 57,616 patients had a positive SARS-CoV-2 test; 503,136 did not. For infections during the ancestral strain period, pulmonary fibrosis, edema (excess fluid), and inflammation had the largest aHR comparing those with a positive test to those with a negative test (aHR 2.32 [95% CI 2.09, 2.57]), and dyspnea (shortness of breath) carried the largest excess burden (47.6 more cases per 1,000 persons). For infections during the Delta period, pulmonary embolism had the largest aHR comparing those with a positive test to those with a negative test (aHR 2.18 [95% CI 1.57, 3.01]), and abdominal pain carried the largest excess burden (85.3 more cases per 1,000 persons).
Conclusions and Relevance: We documented a substantial relative risk of pulmonary embolism and a large absolute risk difference of abdomen-related symptoms after SARS-CoV-2 infection during the Delta variant period. As new SARS-CoV-2 variants emerge, researchers and clinicians should monitor patients for changing symptoms and conditions that develop after infection.

BACKGROUND
SARS-CoV-2 may cause persistent symptoms, exacerbations of existing conditions, or onset of new diseases in the weeks to months after initial infection. 1 These symptoms and conditions are generally referred to as "post-acute sequelae of SARS-CoV-2 infection" (PASC) in the medical literature and "long COVID" in the lay press, and are defined as ongoing, relapsing, or new symptoms, or other health effects occurring after the acute phase of SARS-CoV-2 infection (i.e., present four or more weeks after the acute infection). Studies have produced markedly varying estimates of PASC frequency, which may reflect differences in methods or true differences in the populations studied, severity of illness, viral genotype, or dose of virus that caused infection. One meta-analysis estimated that, globally, 49% of people report persistent symptoms 120 days after infection, with an increased frequency in persons who are female or required hospitalization. 2
Since the emergence of SARS-CoV-2 in 2019, mutations in the viral nucleic acid sequence have changed the transmissibility, virulence, and immunogenicity of the virus. 3 The World Health Organization (WHO) determines whether a new genotype has phenotypic characteristics that impact public health sufficiently to be classified as a variant of concern (VOC). In 2020, the US experienced a COVID-19 "wave" (generally considered a marked increase in infections, hospitalizations, and deaths) due to the ancestral strain, a subsequent wave at the end of 2020 due to a mix of the ancestral strain and the Alpha variant, and another wave in 2021 due to the Delta variant. Analyses suggest the Delta variant was more transmissible than the ancestral strain, but not necessarily more likely to cause severe illness and death. 4
PASC may result from some combination of persistent viral infection, an exaggerated immune response to initial infection, and tissue damage from the combination of initial infection and immune response. 5 It is, therefore, possible that the frequency or characteristics of PASC vary depending on the infecting VOC.
We analyzed data from a large database of electronic health records (EHR) in the United States to evaluate differences in PASC among those infected during the COVID-19 wave caused by the ancestral lineage in 2020 and those infected during the COVID-19 wave caused by the Delta variant in 2021.

Cohort Enrollment and Follow-Up
We conducted a retrospective cohort study of approximately 27 million people receiving medical care in New York and Florida in the United States from March 1, 2020, to November 30, 2021. Patient records were obtained from two large clinical research networks within PCORnet, the National Patient-Centered Clinical Research Network 6: INSIGHT, which contains records from approximately 12 million persons who received services across five health systems in the New York City (NYC) metropolitan area, 7 and OneFlorida+, which contains records from approximately 15 million persons receiving services across 13 health systems in Florida. 8 Data elements from these databases are maintained in the PCORnet Common Data Model and mapped to the Observational Medical Outcomes Partnership Common Data Model for interoperability. Both networks receive data monthly from each in-network facility and create an integrated dataset of all encounters, diagnoses, procedures, medications, vitals, and social history.
In this analysis, patients were included if they were at least 20 years old, had at least one diagnosis code during a baseline period, and had at least one SARS-CoV-2 viral diagnostic test (antigen or molecular). The baseline period was defined as three years to seven days before the date of the first documented positive or negative SARS-CoV-2 test (referred to as the index date) for the infected group or the negative group, respectively. We required that patients in the negative group had only negative SARS-CoV-2 viral diagnostic tests and no COVID-19-related diagnoses. Requiring that patients had at least one diagnosis during this baseline period helped ensure that they were connected to the healthcare system and could have been diagnosed during baseline for relevant symptoms and conditions. The follow-up period was defined as 31 to 180 days after the index date. 17
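The window definitions above lend themselves to a small worked example. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and "three years" is assumed to mean 1,095 days, which the text does not specify.

```python
from datetime import date, timedelta

# Window offsets, in days relative to the index date (first documented test).
BASELINE_START_DAYS = 3 * 365  # assumption: "three years" taken as 1,095 days
BASELINE_END_DAYS = 7
FOLLOW_UP_START_DAYS = 31
FOLLOW_UP_END_DAYS = 180
MIN_AGE_YEARS = 20


def study_windows(index_date):
    """Return inclusive (start, end) dates for the baseline and follow-up windows."""
    return {
        "baseline": (
            index_date - timedelta(days=BASELINE_START_DAYS),
            index_date - timedelta(days=BASELINE_END_DAYS),
        ),
        "follow_up": (
            index_date + timedelta(days=FOLLOW_UP_START_DAYS),
            index_date + timedelta(days=FOLLOW_UP_END_DAYS),
        ),
    }


def is_eligible(age_years, diagnosis_dates, index_date):
    """Age criterion plus at least one diagnosis code inside the baseline window."""
    start, end = study_windows(index_date)["baseline"]
    has_baseline_dx = any(start <= d <= end for d in diagnosis_dates)
    return age_years >= MIN_AGE_YEARS and has_baseline_dx
```

For an index date of April 1, 2020, this yields a follow-up window of May 2 to September 28, 2020, and a diagnosis recorded two days before the index date does not count toward the baseline requirement.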

COVID-19 Variants
Our dataset did not include data about the genotype of the virus that infected each person with COVID-19. Instead, we defined the beginning and end of each variant wave using COVID-19 test data from the networks [Supplementary Figure S1] and assumed that all infections occurring during those waves were attributable to the most common genotype prevalent in those regions at the time according to data from CDC. 9 We defined the ancestral strain period as March 1-September 30, 2020, and the Delta variant period as June 1-November 30, 2021. For this analysis, we did not analyze the winter 2020-2021 period, because multiple genotypes were circulating. 10

Data Analysis
We investigated the following likely PASC categories based on prior analyses: 17 anemia, thromboembolism, pulmonary embolism, dementia, pulmonary fibrosis, edema, and inflammation, pressure ulcer, diabetes mellitus, malnutrition, fluid disorders, ICD-10 codes U09.9/B94.8, encephalopathy, abnormal heartbeat, chest pain, abdominal pain, constipation, joint pain, cognitive problems, headache, sleep disorders, dyspnea, acute pharyngitis, hair loss, edema, fever, malaise, and fatigue. Each condition was defined based on the Clinical Classifications Software Refined (CCSR) v2022.1. Codes that could not be attributed to COVID-19 were removed by clinicians.
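The date-based variant assignment described under COVID-19 Variants can be sketched as follows. This is an illustration only: the function and dictionary names are hypothetical, and only the period boundaries come from the text.

```python
from datetime import date
from typing import Optional

# Period boundaries as stated in the text (inclusive on both ends).
VARIANT_PERIODS = {
    "ancestral": (date(2020, 3, 1), date(2020, 9, 30)),
    "delta": (date(2021, 6, 1), date(2021, 11, 30)),
}


def assign_variant_period(index_date: date) -> Optional[str]:
    """Label an index date by variant period, or None if it falls outside both.

    Dates in the winter 2020-2021 period return None because that period
    was not analyzed (multiple genotypes were circulating).
    """
    for label, (start, end) in VARIANT_PERIODS.items():
        if start <= index_date <= end:
            return label
    return None
```

Because the label depends only on the test date, any infection by a minority genotype during a wave is attributed to the dominant one, which is the misclassification the authors acknowledge in the limitations.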
We analyzed the risks of newly incident conditions, defined as new documentation of the above-mentioned PASC categories in the follow-up period that were not present in the baseline period. Specifically, we compared adjusted hazard ratios (aHR) and adjusted excess burdens of these events occurring 31-180 days after the index date between the SARS-CoV-2 positive group and negative group. For each potential PASC condition, aHR was estimated by a Cox proportional hazards model, and excess burden was defined as the difference in cumulative incidence per 1,000 patients between the positive group and negative group over the follow-up period. For example, an excess burden of 40 for symptom X indicates there were 40 more people per 1,000 with symptom X after COVID-19 infection compared with people not infected with COVID-19. We estimated cumulative incidence by the Aalen-Johansen estimator, 11 considering death to be a competing risk for target outcomes. We adjusted for a wide range of baseline covariates by stabilized inverse propensity score re-weighting. 12 The standardized mean difference (SMD) was used to quantify the goodness-of-balance of covariates after re-weighting. We considered SMD < 0.1 as being balanced in terms of each covariate and required all covariates to be balanced after re-weighting. Both the aHR and excess burden calculations used the same covariates for adjustment.
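To make the excess burden definition concrete, the sketch below computes a cumulative incidence with death as a competing risk and takes the difference per 1,000 patients. It is a minimal standard-library illustration, not the authors' implementation (which used the lifelines package). The event coding (0 = censored, 1 = target condition, 2 = death) and all function names are assumptions.

```python
def cumulative_incidence(times, events, horizon, weights=None):
    """Aalen-Johansen estimate of the cumulative incidence of event type 1.

    times:   days from the index date to event or censoring
    events:  0 = censored, 1 = target condition, 2 = death (competing risk)
    weights: optional per-patient weights (e.g. stabilized propensity weights)
    """
    if weights is None:
        weights = [1.0] * len(times)

    # Weighted counts of [censored, target, death] at each distinct time.
    by_time = {}
    for t, e, w in zip(times, events, weights):
        by_time.setdefault(t, [0.0, 0.0, 0.0])[e] += w

    at_risk = float(sum(weights))
    event_free = 1.0  # probability of no event of either type before time t
    incidence = 0.0
    for t in sorted(by_time):
        if t > horizon:
            break
        censored, target, death = by_time[t]
        if at_risk > 0:
            incidence += event_free * target / at_risk
            event_free *= 1.0 - (target + death) / at_risk
        at_risk -= censored + target + death
    return incidence


def excess_burden_per_1000(incidence_positive, incidence_negative):
    """Difference in cumulative incidence, expressed per 1,000 patients."""
    return 1000.0 * (incidence_positive - incidence_negative)
```

With a toy positive group of ten patients (three target events, one death, six censored at day 180) and a negative group of ten with one target event, the incidences are 0.3 and 0.1, giving an excess burden of 200 per 1,000.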
Baseline covariates included age, gender, race, and ethnicity. The national-level area deprivation index (ADI) was used to assess socioeconomic disadvantage of patients. 13 We imputed missing ADI values with the median ADI per site. Healthcare utilization was measured as the number of inpatient, outpatient, and emergency encounters (0, 1-2, 3-4, 5 or more visits for each encounter type). Body mass index (BMI) was categorized according to WHO guidelines. We adopted a tailored list of the Elixhauser comorbidities and related drug categories (e.g., corticosteroid and immunosuppressant prescriptions) to capture comorbidities. 14 Patients were defined as having a comorbidity if they had at least two corresponding diagnoses documented during the baseline period.
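The covariates listed above enter the propensity model used for re-weighting, and balance is then checked with the SMD. The sketch below illustrates stabilized weights and a weighted SMD under stated assumptions: propensity scores are taken as already estimated (the text does not specify the model), the pooled standard deviation uses the simple average of the two group variances, and all names are hypothetical.

```python
from math import sqrt


def stabilized_weights(treated, propensity):
    """Stabilized inverse propensity weights.

    treated:    1 for SARS-CoV-2 positive, 0 for negative
    propensity: estimated P(positive | covariates) for each patient
    """
    p_treated = sum(treated) / len(treated)
    weights = []
    for t, ps in zip(treated, propensity):
        if t:
            weights.append(p_treated / ps)
        else:
            weights.append((1.0 - p_treated) / (1.0 - ps))
    return weights


def _weighted_mean_var(values, weights):
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def standardized_mean_difference(values, treated, weights=None):
    """Absolute (weighted) SMD of one covariate between the two groups."""
    if weights is None:
        weights = [1.0] * len(values)
    groups = {0: ([], []), 1: ([], [])}
    for v, t, w in zip(values, treated, weights):
        key = 1 if t else 0
        groups[key][0].append(v)
        groups[key][1].append(w)
    mean_1, var_1 = _weighted_mean_var(*groups[1])
    mean_0, var_0 = _weighted_mean_var(*groups[0])
    pooled_sd = sqrt((var_1 + var_0) / 2.0)
    return abs(mean_1 - mean_0) / pooled_sd if pooled_sd > 0 else 0.0
```

A covariate would be considered balanced under the paper's criterion when the weighted SMD is below 0.1. Stabilizing by the marginal probability of each group keeps the weights summing to roughly the original sample size.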
We reported PASC conditions if they had: adjusted hazard ratio > 1; P-value < 3.6 × 10^-4 (Bonferroni-corrected to account for multiple testing); and at least 100 patients with the condition. We reported adjusted hazard ratios with a 95% confidence interval. We used Python 3.9, python package lifelines-0.2666 for survival analysis and scikit-learn-0.2318 for machine learning models. Code is available at https://github.com/calvin-zcx/pasc_phenotype.
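The three reporting criteria above amount to a simple filter. The sketch below is illustrative and is not taken from the authors' repository. The function names are hypothetical, and because the text does not state the number of tests behind the 3.6 × 10^-4 threshold, the generic Bonferroni helper is shown with an arbitrary count.

```python
P_VALUE_THRESHOLD = 3.6e-4  # Bonferroni-corrected threshold stated in the text
MIN_PATIENTS = 100          # minimum number of patients with the condition


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def is_reportable(adjusted_hr, p_value, n_patients):
    """Apply the three reporting criteria: aHR > 1, small P-value, enough cases."""
    return (
        adjusted_hr > 1.0
        and p_value < P_VALUE_THRESHOLD
        and n_patients >= MIN_PATIENTS
    )
```

For example, a condition with aHR 2.18, P-value 1e-5, and 150 affected patients passes, while the same condition with only 99 affected patients does not.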

Ethical Review
The use of the INSIGHT data was approved by the Institutional Review Boards (IRB) of Weill Cornell Medicine and University of Florida.
This version posted February 23, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Demographics
Our final dataset included data from 560,752 patients [Figure 1]. Further information about comorbid conditions of both cases and controls over the different periods is in Table 1.

PASC associated with ancestral strain period
We found an increased risk of multiple symptoms and conditions from ancestral strain infections. As shown in Figure 2, the largest adjusted hazard ratios, comparing patients with positive vs. negative tests, were for: pulmonary fibrosis, edema, and inflammation (aHR 2.32 [95% CI 2.09, 2.57]) [...]. As shown in Figure 3, the largest excess burdens were for: dyspnea (47.6 more cases per 1,000 persons), pulmonary fibrosis, edema, and inflammation (21.5), malaise and fatigue (18.2), edema (17.6), chest pain (16.9), abnormal heartbeat (15.4), cognitive problems (12.8), and joint pain (11.5).

PASC associated with Delta variant period
The spectrum of PASC-related symptoms and conditions from Delta infections differed from that of ancestral strain infections. For the Delta variant period, the largest adjusted hazard ratios, comparing patients with positive vs. negative tests [Figure 2], were for: pulmonary embolism (aHR 2.18 [95% CI 1.57, 3.01]) [...]. The largest excess burden was for abdominal pain (85.3 more cases per 1,000 persons) [...].

DISCUSSION
In this study of more than half a million patients, we documented an increased burden of abdomen-related conditions during the Delta variant period and a substantial burden of post-acute pulmonary-related conditions in people infected with COVID-19 during both the ancestral strain and Delta variant periods.
Our statistical approach considered both the relative risk of a new condition, expressed as a hazard ratio, and the risk difference, expressed as an excess burden. The risk difference provides a measure of public health impact, because it estimates how much each symptom or condition could potentially be reduced if patients had not been infected with COVID-19 ancestral or Delta variants. We estimate that, for every 1,000 patients, there were 85 more persons with abdominal pain after Delta variant infection than among those without documentation of Delta variant infection. Notably, this excess burden was not found during the ancestral strain period. Since the beginning of the pandemic, clinicians have noted that patients may present with pronounced gastrointestinal symptoms, possibly due to direct infection, alteration of the gut microbiome, or enhanced immune response. 15,16 Whether these same mechanisms explain post-acute sequelae is unclear. Further research is needed to understand the long-term prognosis of PASC-related abdominal pain and to quantify its excess burden, given that international experts did not include gastrointestinal conditions in their core outcome set for evaluating patients. 17
The excess burden of pulmonary-related conditions was large and markedly increased for the Delta variant period. We found a 78% increase in the excess burden of post-acute dyspnea from the ancestral strain period to the Delta variant period (47.6 vs. 84.8 additional persons per 1,000), even though the excess burden of diagnosed conditions that could explain dyspnea (pulmonary fibrosis, edema, and inflammation) was similar (21.5 vs. 21.0) during both periods. We do not know whether the increase in burden during the Delta variant period represents a specific change in the virus or some other factor, such as the types of persons infected, the dose, duration, or route of exposure, or increased awareness and documentation by providers.
In the absence of hypoxemia, no current treatment exists for persistent dyspnea, although novel strategies to help patients, such as breathlessness training, are being evaluated. 18 Notably, the largest relative risk was associated with pulmonary embolism, a well-documented COVID-19 complication that could also cause persistent dyspnea. 19
Our study is subject to important limitations. First, we may have misclassified patients as not infected with COVID-19 because a test was never performed or was not recorded. This may have been more likely during follow-up after the Delta wave, as the follow-up period overlapped with increasing availability of home testing. Such misclassification would likely lead us to underestimate the prevalence of PASC, particularly during the first wave when diagnostic testing was less widely available. It is also possible that persons who did not test positive had other infections, which could produce illnesses similar to PASC. 20 Further misclassification could have occurred, because we assumed all infections during a wave were attributable to the most prevalent variant circulating at that time. Second, we were not able to obtain vaccination information, which is important for further research given our finding that the burden of PASC was high during the Delta wave, despite widespread availability of vaccines at that time. 21 Third, the number of cases during the Delta variant period was substantially higher in Florida, and, for several conditions, the magnitude of excess burden varied between New York and Florida. While we adjusted for many characteristics, other unmeasured factors could explain differences. Finally, some conditions, such as pressure ulcers, may be attributable to prolonged hospitalization, rather than infection itself.
In conclusion, we found that conditions associated with PASC vary by viral variant. As the virus continues to evolve new variants rapidly, researchers and clinicians need to monitor for changing symptoms and conditions associated with COVID-19 infection. Physicians need to be aware that PASC may present differently in the future as new variants emerge and that treatment efficacy may vary by the variant that initially caused PASC.

ACKNOWLEDGEMENT
This study is part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, which seeks to understand, treat, and prevent the post-acute sequelae of SARS-CoV-2 infection (PASC). This research was funded by the National Institutes of Health (NIH) Agreement OTA OT2HL161847 as part of the Researching COVID to Enhance Recovery (RECOVER) research program.

FIGURES AND TABLES

Fig. 2. Adjusted hazard ratios of likely incident PASC conditions for the entire study period (March 2020 to November 2021) versus the ancestral strain period (March 2020 to September 2020) versus the Delta variant period (June 2021 to November 2021). Sequelae outcomes were ascertained from day 30 after SARS-CoV-2 infection, and the adjusted hazard ratios were computed at 180 days after SARS-CoV-2 infection.