Wastewater and seroprevalence for pandemic preparedness: variant analysis, vaccination effect, and hospitalization forecasting for SARS-CoV-2, in Jefferson County, Kentucky

Despite wide-scale assessments, it remains unclear how large-scale SARS-CoV-2 vaccination affected the wastewater concentration of the virus or the overall disease burden as measured by hospitalization rates. We used weekly SARS-CoV-2 wastewater concentration data together with stratified random seroprevalence sampling, linked to vaccination and hospitalization data, from April to August 2021 in Jefferson County, Kentucky (USA). Our SVI2RT model, with susceptible (S), vaccinated (V), variant-specific infected (I1 and I2), recovered (R), and seropositive (T) compartments, tracked prevalence longitudinally, and the estimated prevalence was then related to wastewater concentration. The 64% county vaccination rate translated into an approximately 61% decrease in SARS-CoV-2 incidence. The estimated effect of the emergence of the SARS-CoV-2 Delta variant was a 24-fold increase in infection counts, which corresponded to an over 9-fold increase in wastewater concentration. Hospitalization burden and wastewater concentration had the strongest correlation (r = 0.95) at a 1-week lag. Our study underscores the importance of continued environmental surveillance post-vaccine and provides a proof of concept for environmental epidemiology monitoring of infectious disease for future pandemic preparedness.


Introduction
There is an increasing realization that the current methods of disease monitoring based on individual testing may be insufficient to effectively combat new, possibly much more infectious, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants. This leaves public health researchers and policy makers in search of more reliable methods of measuring SARS-CoV-2 prevalence in communities, especially methods that do not involve the expensive process of collecting individual-level data. Wastewater concentration, when properly calibrated, can be a surrogate for the community prevalence of the virus. 1,2 Wastewater epidemiology promises an exciting opportunity to estimate community disease prevalence even for asymptomatic, vaccine-preventable disease. 2,3 However, the handful of recent studies considering a relationship between SARS-CoV-2 wastewater concentration and the COVID-19 vaccine have relied almost exclusively on statistical models calibrated with publicly available COVID-19 clinical case data. 4-8 These data run the risk of underrepresenting asymptomatic individuals who may not seek testing, or individuals testing in settings where reporting is low or not required. 9 In this study we consider this question in the context of randomized seroprevalence surveillance, combining mechanistic and statistical frameworks to obtain a more robust and realistic answer.

We used repeated cross-sectional community-wide stratified randomized sampling to measure SARS-CoV-2 nucleocapsid (N) specific IgG antibody-based seroprevalence in Jefferson County, Kentucky (USA), from April through August 2021 to determine post-vaccine community prevalence at a sub-population scale. We then related this, through a statistical linear model, to the available sub-population weekly wastewater surveillance data, which yielded an explicit estimate of the impact of vaccination and seroimmunity on SARS-CoV-2 wastewater concentration while controlling for prevalence in different epidemic phases. The latter may be easily translated into other important public health indicators such as the patterns of hospitalization.

2. Methods
2.1 Seroprevalence
Community-wide stratified randomized seroprevalence sampling was conducted in four waves from April to August 2021 in Jefferson County, Kentucky (USA), whose government is consolidated with that of the city of Louisville. Seroprevalence sampling was conducted both before and during vaccination, but this analysis only considers the period after COVID-19 vaccines were made widely available to the public (N=3436). An address-based sampling frame was used to build four geographic zones. Invitations (~30,000 per wave) were mailed to sampled households and one random adult was selected to join the study. Participants completed an online consent form and survey and scheduled an in-person appointment for testing at a mobile site. In some cases, due to the timing of sampling waves, respondents may have had only the first dose of a two-dose vaccine series. Owing to the elevated level of vaccinated respondents in our study (~90%), seroprevalence was measured by response to IgG N antibodies. 10 It was assumed that over the study period vaccination-induced antibodies did not decay below detection. The nucleocapsid (N) IgG test sensitivity was 65% and the specificity was 85%. The seroprevalence sampling by geographical zone is described in more detail in the Supplemental Material section S1.
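With a sensitivity of 65% and a specificity of 85%, the raw fraction of positive tests is a biased estimate of true seroprevalence. One standard adjustment is the Rogan-Gladen estimator, sketched below. This is an illustration only: the source does not state that this estimator was used (the study embeds the serology data in a model-based likelihood), and the function name and the 20% input are hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust a raw positive fraction for imperfect test accuracy.

    Rogan-Gladen estimator: (apparent + Sp - 1) / (Se + Sp - 1).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance (Se + Sp > 1)")
    corrected = (apparent_prevalence + specificity - 1.0) / denom
    # Clamp to [0, 1]: sampling noise can push the raw estimate out of range.
    return min(1.0, max(0.0, corrected))


# Hypothetical raw positive fraction of 20%, with the test accuracy stated above.
adjusted = rogan_gladen(0.20, sensitivity=0.65, specificity=0.85)
print(f"adjusted seroprevalence: {adjusted:.3f}")
```

Because the denominator is only 0·5 for this test, small changes in the raw positive fraction are doubled in the adjusted estimate, which is one reason a model-based treatment of test error can be preferable to a pointwise correction.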

Wastewater SARS-CoV-2 concentration
Wastewater samples were collected twice per week from five wastewater treatment plants (N=520; Supplemental Material section S2) from April to August 2021. From an influent 24-hour composite sampler, a 125 ml subsample was collected and analyzed for SARS-CoV-2. In a few cases, due to an equipment malfunction, a grab sample was collected instead. The geographic area and population serviced by a wastewater treatment plant comprise a sewershed, the zone we consider in our model analysis; the sewersheds span a range of population sizes, income levels, and racial and ethnic diversity. 2 Analysis used polyethylene glycol (PEG) precipitation with quantification in triplicate by reverse transcription quantitative polymerase chain reaction (RT-qPCR). 11

the non-informative Cauchy distribution was assigned to regression coefficients, and the non-informative gamma prior was assigned to the dispersion parameter of the error term.

adjusting for the Delta variant emergence (Figure 1). The peak and the overall temporal dynamics are different under the two scenarios across each location. To better quantify these differences, we calculated the location-specific vaccination effects as the ratios of the areas under the two scenario curves (with-vaccination area over without-vaccination area). The value obtained for the aggregated data was 0·429 (CI = (0·405, 1)), with the sewershed-specific ratios (Figure 1; panels B-D) being 0·532 (CI = (0·515, 1)), 0·367 (CI = (0·366, 0·785)), and 0·555 (CI = (0·555, 1)), respectively. Converting these ratios to excess incidence, we conclude that without vaccination we would expect incidence about 133% above the observed level in Jefferson County (panel A) and about 88%, 172%, and 80% above the observed level in the respective sewershed areas (Figure 1; panels B-D; see also Supplemental Material section S3).

To obtain estimates of the vaccination effects on the wastewater concentrations, we developed a hybrid inferential model combining the wastewater regression equation (see Sec 3.1) with the SVI2RT estimated prevalence, under two different vaccination scenarios (factual 64% rate and counterfactual 0% rate) (Figure 2). Note that the use of SVI2RT (which accounts for the different virulence of the two SARS-CoV-2 strains) automatically adjusted our analysis for the Delta variant emergence. As the estimated prevalence from the SVI2RT model and the normalized wastewater concentration are highly correlated (see Sec 3.1), the hybrid model is seen to fit the data well. As before, to quantify the location-specific vaccination effects, we calculated the ratios of the areas under the two curves, under the factual (vaccinated) and counterfactual (unvaccinated) scenarios, in the same way as for the vaccination effect on disease incidence. The Jefferson County ratio (Figure 2; panel A) was 0·358 (CI = (0·333, 0·381)), and the remaining sewershed area ratios (Figure 2; panels B-D) were, respectively, 0·457 (CI = (0·388, 0·537)), 0·276 (CI = (0·260, 0·296)), and 0·426 (CI = (0·407, 0·446)). The excess wastewater virus concentration without vaccination is estimated as 179%, 119%, 262%, and 135%, respectively (Supplemental Material section S3).
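The conversion from an area ratio to an excess percentage can be sketched as follows. The sketch assumes the areas are computed with the trapezoidal rule on weekly values (the source does not state the integration method), and the function names and toy curves are hypothetical. The excess without vaccination is taken as 1/ratio − 1, which reproduces the percentages quoted above.

```python
def trapezoid_area(times, values):
    """Area under a sampled curve by the trapezoidal rule (assumed method)."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time/value samples")
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def excess_percent(ratio):
    """Excess of the counterfactual total over the factual total, in percent.

    ratio = (area with vaccination) / (area without vaccination).
    """
    return 100.0 * (1.0 / ratio - 1.0)


def vaccination_effect(times, with_vaccine, without_vaccine):
    """Return (area ratio, excess percent) for two scenario curves."""
    ratio = trapezoid_area(times, with_vaccine) / trapezoid_area(times, without_vaccine)
    return ratio, excess_percent(ratio)


# Toy weekly curves (hypothetical values, not study data).
weeks = [0, 1, 2, 3]
ratio, excess = vaccination_effect(weeks, [1, 2, 3, 2], [2, 4, 6, 4])
print(ratio, excess)

# Reported incidence ratios (Figure 1, panels A-D) mapped to excess percentages.
for reported in (0.429, 0.532, 0.367, 0.555):
    print(reported, round(excess_percent(reported)))
```

Applied to the reported wastewater ratios (0·358, 0·457, 0·276, 0·426), the same conversion gives the 179%, 119%, 262%, and 135% figures.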
medRxiv preprint doi: https://doi.org/10.1101/2023.01.06.23284260; this version posted January 7, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Effects of virus mutation on disease incidence and wastewater concentration
The time periods during which the Alpha and Delta variants were dominant in each sewershed are shown in

both Alpha- and Delta-variants present), where the incidence (dark curve) was seen to rise rapidly (Figure 3). As in the previous section, to quantify the difference between the two curves, which we interpret as measuring the effect of introducing the Delta mutation, we calculated the ratio of the areas under the two curves in each panel, obtaining the values of 7·32 (CI = (7·05, 20·13)), 4·40 (CI = (4·33, 7·64)), 8·58 (CI = (1, 8·60)), and 6·15 (CI = (1, 6·16)) for the aggregate, MSD1, MSD2, and MSD3-5 regions, respectively (corresponding to panels A-D). The corresponding decrease in total incidence without the mutation is estimated as 86%, 77%, 88%, and 84%, respectively.

To identify the effect of the Delta variant emergence on the observed wastewater concentration, we again applied the hybrid model discussed in the previous section. In the current analysis, the regression model was applied to predict the longitudinal wastewater concentrations from both factual (both variants present) and counterfactual (no Delta variant) prevalence data. The results are depicted in the panels of Figure 4 for both the aggregated and the sewershed-specific analyses. As with the analysis of the vaccination effects, we considered the ratios of the areas under the corresponding curves as measures of the Delta variant effect in specific locations. Based on the location-specific ratio values of 12·569 (CI = (11·487, 13·914)), 6·235 (CI = (5·290, 7·891)), 14·932 (CI = (13·351, 16·898)), and 8·413 (CI = (7·654, 9·351)), corresponding to the aggregated and sewershed-specific curves, the excess wastewater virus due to the Delta mutation is estimated as 92%, 84%, 93%, and 88%, respectively. Further analysis is provided in Supplemental Material Table S3.3.
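The percentages in this section follow from the area ratios by a one-line conversion, worked here for the two aggregate values. The symbols $\rho$, $A_{\text{factual}}$ (both variants present), and $A_{\text{counterfactual}}$ (no Delta) are introduced only for this illustration and are not the authors' notation.

```latex
\[
\rho = \frac{A_{\text{factual}}}{A_{\text{counterfactual}}},
\qquad
\text{share due to Delta}
  = \frac{A_{\text{factual}} - A_{\text{counterfactual}}}{A_{\text{factual}}}
  = 1 - \frac{1}{\rho}.
\]
\[
\rho = 7.32:\quad 1 - \frac{1}{7.32} \approx 0.863 \ (86\%),
\qquad
\rho = 12.569:\quad 1 - \frac{1}{12.569} \approx 0.920 \ (92\%).
\]
```

The same conversion applied to the sewershed-specific ratios reproduces the remaining incidence (77%, 88%, 84%) and wastewater (84%, 93%, 88%) percentages.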

hospitalization counts on the wastewater concentrations (Figure 5). As hospitalization is likely to occur some time after symptom onset, we used a 1-week lagged regression model, where the length of the lag was chosen based on overall model fit criteria. The fitted intercept and slope coefficients, used for both the vaccinated and unvaccinated scenarios, were 1·284 × 10⁻⁴ (std = 2·279 × 10⁻⁵) and 0·176 (std = 0·0119), respectively, with an R-squared of 0·928. Under the vaccination scenario, the maximum observed weekly average of daily hospitalizations was 110·4 (the actual daily maximum was 122·0), at the end of August. Without vaccination, the maximum predicted weekly average of hospitalizations increased to 150·3. The ratio of the areas under the prediction curves with and without vaccination was 0·368 (CI = (0·413, 0·458)), indicating a 170% increase in the number of hospitalizations had no vaccine been available. In a comparable way, we obtained the hospitalization estimate without the Delta variant mutation. The ratio of the areas under the two curves (with and without the Delta variant mutation) is 2·632 (CI = (2·382, 5·573)), indicating a 62% decrease in the hospitalization rate.
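A lagged regression of this kind, with the lag chosen by model fit, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses ordinary least squares and R-squared as the fit criterion, whereas the source says only that the lag was based on "overall model fit criteria" and uses Bayesian regression. The function names and the synthetic series are hypothetical.

```python
def lagged_fit(wastewater, hospitalizations, lag):
    """Least-squares fit of hospitalizations[t] on wastewater[t - lag].

    Returns (intercept, slope, r_squared).
    """
    if len(wastewater) != len(hospitalizations):
        raise ValueError("series must have the same length")
    n = len(wastewater) - lag
    if lag < 0 or n < 3:
        raise ValueError("lag leaves too few paired observations")
    x = [float(v) for v in wastewater[:n]]         # weeks 0 .. n-1
    y = [float(v) for v in hospitalizations[lag:]]  # weeks lag .. end
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1.0 - ss_res / syy


def best_lag(wastewater, hospitalizations, lags=(1, 2, 3, 4)):
    """Pick the lag (in weeks) with the highest R-squared (assumed criterion)."""
    return max(lags, key=lambda d: lagged_fit(wastewater, hospitalizations, d)[2])


# Synthetic weekly series in which hospitalizations trail wastewater by one week.
ww = [1, 3, 2, 5, 4, 8, 6, 9, 7, 12, 10, 15]
hosp = [0.0] + [2.0 + 10.0 * v for v in ww[:-1]]
print(best_lag(ww, hosp), lagged_fit(ww, hosp, 1))
```

Note that longer lags leave fewer paired weeks, so R-squared values at different lags are computed on slightly different samples; with a short series this should be kept in mind when comparing them.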

helped to control Alpha, while an increase in a third booster was found to lead to a decline in Delta. When vaccination levels increase to higher coverage, overall reported incidence may

Our study used five sub-community scales based on the existing wastewater infrastructure, allowing observation of regional trends but also the aggregation of data for a countywide picture. We

The seroprevalence, wastewater concentration, and hospitalization data used in the study can be accessed at https://github.com/cbskust/DSA_Seroprevalence. The computer code that implemented our model-based analysis will be made available immediately after publication.


S3. Population vaccination model (SVI2RT)
The equation shown in (1) describes the time-evolution of the proportions of individuals who are susceptible (S), vaccinated (V), infected with the Alpha variant (I1), infected with the Delta variant (I2), removed (R), and seropositive (T). We assume the total initial population of susceptibles is large with a small initial fraction of infected. The model equations are

having positive results, the corresponding log-likelihood function is

where the prior distribution on Θ is determined from our previous work. 3 Hence, we seek the values of Θ that maximize our posterior log-likelihood function (3). Note the entire system (1)


0·643 (0·551, 0·711)

Relationship among observed wastewater concentration, the hospitalization rate, and estimated prevalence. The dark brown line represents the estimated prevalence, and the shaded area is the 95% credible interval from the MCMC simulation. The green line is the weekly average of the daily hospitalization rate of Jefferson County, and the blue dots represent the weekly averages of wastewater concentration. The Pearson correlation coefficient between estimated prevalence and wastewater concentration is 0·916 (95% CI = (0·764, 0·976)), and that between hospitalization rate and wastewater concentration is 0·720 (95% CI = (0·224, 0·953)).

To obtain the linear regressions, the procedure was as follows. Let $\tilde{I}_t$ be the model-estimated percentage prevalence corresponding to the same week and sewershed area; $W_t$ was defined in Eq. (5). The linear and NB regression models are given by:

In the Bayesian linear regression models, non-informative priors were assigned. Specifically, the non-informative Cauchy distribution was assigned to the regression coefficients, and the non-informative gamma prior was assigned to the dispersion parameter of the error term. The summary of the posterior estimates of all regression parameters is presented in Table S3.4, and fitting and prediction using the regression model are represented in Figure 2

In this model, we varied the time lag d from 1 to 4 weeks, so that the maximum period between evidence of community spread of COVID-19 appearing in wastewater and the resulting hospitalization burden is about a month. Of note, hospitalization data are available daily while wastewater data are weekly.

Additionally, we performed a simulation study using this regression model to check how much the hospitalization rate changes according to the vaccination rate. We set the vaccination rate of the community to 0% and predicted the serial estimates $\mathrm{Pred}_t$ in Eq. (4). We then predicted the wastewater concentration using the linear regression model and used the predictions as the predictor in the regression model (5).
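The two-stage simulation described above chains two regressions: counterfactual prevalence is first mapped to wastewater concentration, and the predicted wastewater series is then fed, with a lag, into the hospitalization regression. The sketch below shows only this chaining. All coefficients, the prevalence series, and the function names are hypothetical placeholders, and plain linear forms are assumed in both stages; none of the values are the fitted ones from Table S3.4.

```python
def predict_wastewater(prevalence, intercept, slope):
    """Stage 1: linear map from estimated prevalence to wastewater concentration."""
    return [intercept + slope * p for p in prevalence]


def predict_hospitalizations(wastewater, intercept, slope, lag=1):
    """Stage 2: hospitalization in week t from wastewater in week t - lag.

    The first `lag` weeks have no lagged predictor, so they are returned as None.
    """
    usable = wastewater[: len(wastewater) - lag]
    return [None] * lag + [intercept + slope * w for w in usable]


# Hypothetical weekly prevalence (%) under the factual and 0%-vaccination scenarios.
factual_prev = [0.5, 1.0, 2.0]
counterfactual_prev = [1.0, 2.5, 5.0]

# Hypothetical regression coefficients, for illustration only.
ww_factual = predict_wastewater(factual_prev, intercept=0.0, slope=2.0)
ww_counter = predict_wastewater(counterfactual_prev, intercept=0.0, slope=2.0)

hosp_factual = predict_hospitalizations(ww_factual, intercept=1.0, slope=3.0, lag=1)
hosp_counter = predict_hospitalizations(ww_counter, intercept=1.0, slope=3.0, lag=1)
print(hosp_factual, hosp_counter)
```

In a Bayesian setting the same chain would be run once per posterior draw of the regression parameters, so that uncertainty from both stages carries through to the hospitalization predictions.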