Using publicly available data to identify priority communities for a SARS-CoV-2 testing intervention in a southern U.S. state

Background: The U.S. Southeast has a high burden of SARS-CoV-2 infections and COVID-19 disease. We used public data sources and community engagement to prioritize county selections for a precision population health intervention to promote a SARS-CoV-2 testing intervention in rural Alabama during October 2020 and March 2021. Methods: We modeled factors associated with county-level SARS-CoV-2 percent positivity using covariates thought to associate with SARS-CoV-2 acquisition risk, disease severity, and risk mitigation practices. Descriptive epidemiologic data were presented to scientific and community advisory boards to prioritize counties for a testing intervention. Results: In October 2020, SARS-CoV-2 percent positivity was not associated with any modeled factors. In March 2021, premature death rate (aRR 1.16, 95% CI 1.07, 1.25), percent Black residents (aRR 1.00, 95% CI 1.00, 1.01), preventable hospitalizations (aRR 1.03, 95% CI 1.00, 1.06), and proportion of smokers (aRR 0.231, 95% CI 0.10, 0.55) were associated with average SARS-CoV-2 percent positivity. We then ranked counties based on percent positivity, case fatality, case rates, and number of testing sites using individual variables and factor scores. Top ranking counties identified through factor analysis and univariate associations were provided to community partners who considered ongoing efforts and strength of community partnerships to promote testing to inform intervention. Conclusions: The dynamic nature of SARS-CoV-2 proved challenging for a modelling approach to inform a precision population health intervention at the county level. Epidemiological data allowed for engagement of community stakeholders implementing testing. As data sources and analytic capacities expand, engaging communities in data interpretation is vital to address diseases locally.


medRxiv preprint, this version posted February 1, 2023 (not certified by peer review). doi: https://doi.org/10.1101/2023.01.31.23285248. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

From the outset of the global pandemic, the burden of COVID-19 disease has been dynamic

SARS-CoV-2 testing data
Data were downloaded periodically (approximately quarterly) from publicly available websites that collated and created visual representations of data reported by the Alabama Department of Public Health (ADPH). For the October 2020 county testing selections, data were downloaded from bamatracker.com [17], a website that collated and displayed data from ADPH through May 2021. In the March 2021 round of selections, data were downloaded from bamatracker.com [13] as well as the New York Times website [18].

Summary models reviewed
We reviewed existing models summarizing, modeling, and predicting patterns in SARS-CoV-2 testing data provided by Johns Hopkins University [19], University of Washington (Institute for Health Metrics and Evaluation, IHME) [20], and the Pandemic Vulnerability Index (PVI) from National Institute of Environmental Health Sciences [21]. We reviewed the COVID Health Equity interactive dashboard summarized by Emory University [22]. None of these were able to distinguish testing intervention need in Alabama at the county level as the entire state was in highest risk category for COVID-19 disease with widespread need for increased SARS-CoV-2 testing in both rounds of selections. However, elements of data sources used in these models and collated on their websites were incorporated as described in model covariates below.

Model building using local data
Outcome
Percent positivity, or the number of positive tests per 100 tests performed, has been widely used as a measure of SARS-CoV-2 disease burden since the beginning of the pandemic, providing insights into transmission within specific geographic areas [23]. The CDC recommends percent positivity as a measure of disease surveillance for public health decision-making [23]. We modeled factors associated with SARS-CoV-2 percent positivity with the goal to identify factors associated with higher percent positivity to inform intervention county selections. At the time of these selections, most Alabama counties reported high percent positivity rates and were uniformly identified as highest risk by the available models. We therefore intended to identify factors that drove that risk and focus selections on counties based on the distributions of factors associated with higher SARS-CoV-2 percent positivity. Our models differed slightly with the evolution in the pandemic, knowledge of SARS-CoV-2 risk factors, and features of the ADPH testing data between October 2020 and March 2021.
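
The outcome definition above (positive tests per 100 tests performed) can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names and the daily counts are hypothetical, and pooling counts over the window is an assumption, since the text does not state how the 7-day and 14-day summary values were computed in the state data.

```python
def percent_positivity(positive_tests: int, total_tests: int) -> float:
    """Positive tests per 100 tests performed."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return 100.0 * positive_tests / total_tests


def windowed_percent_positivity(daily_positive, daily_total, window=7):
    """Percent positivity pooled over the most recent `window` days.

    Pooling (sum of positives / sum of tests) is an assumed convention.
    """
    positives = sum(daily_positive[-window:])
    tests = sum(daily_total[-window:])
    return percent_positivity(positives, tests)


# Hypothetical daily counts for one county (not study data).
daily_positive = [12, 9, 15, 11, 8, 14, 10]
daily_total = [100, 90, 120, 110, 80, 100, 100]

seven_day = windowed_percent_positivity(daily_positive, daily_total, window=7)
print(f"7-day percent positivity: {seven_day:.1f}")
```

A mean of daily percentages would give a slightly different value on days with unequal testing volume, which is one reason the pooling convention is flagged as an assumption.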

October 2020
We used linear regression models with dispersion with the primary outcome of average 14-day SARS-CoV-2 percent positivity summarized at the county level from October 2 -

March 2021
We used negative binomial regression modeling with dispersion to evaluate factors associated with average 7-day SARS-CoV-2 percent positivity at the county level from 06/17/2020 - 3/31/2021. Between October and March, we learned that for some months, the 14-day percent positivity result produced in the state data eliminated all individuals with prior SARS-CoV-2 testing data from the denominator, thus inflating the 14-day percent positivity. This did not occur with the 7-day summary value. Model covariates included factors associated with increased risk of acquiring SARS-CoV-2, increased risk of COVID-19 disease severity, and factors associated with less risk mitigation behavior as above.
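
A small worked example may make the denominator problem described above concrete. The counts below are hypothetical, and the sketch assumes the numerator was left unchanged while previously tested individuals were dropped from the denominator; the text states only the denominator exclusion.

```python
def pct_positive(positive: int, denominator: int) -> float:
    """Positive tests per 100 tests in the denominator."""
    return 100.0 * positive / denominator


# Hypothetical 14-day counts for one county (illustrative, not study data).
positives = 150
people_tested = 1500
previously_tested = 500  # people in the window with an earlier test on record

# Full denominator: everyone tested in the window.
full = pct_positive(positives, people_tested)

# Reduced denominator: previously tested people removed (assumed mechanism).
reduced = pct_positive(positives, people_tested - previously_tested)

inflation = reduced / full
print(f"full: {full:.1f}%, reduced denominator: {reduced:.1f}%, ratio: {inflation:.2f}")
```

With these made-up numbers the same 150 positives read as 10% or 15% depending only on the denominator, which is why the 7-day value was preferred in the March 2021 round.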

In October 2020, we summarized past 14-day average SARS-CoV-2 percent positivity, 14-day case fatality, and 14-day cases/100k population. We also included a cross-sectional description of the number of testing sites per county based on collated data provided by ADPH. We described the counties in the top quartile for percent positivity, case fatality, and case rates and below the median for number of testing sites. We also conducted a factor analysis including 3-month percent SARS-CoV-2 positivity, SARS-CoV-2 case fatality rate, and COVID-testing sites per 100k population. One factor explained 41% of the variability and was used to provide an additional ranking of counties for selection.

In March 2021, we summarized first quarter 2021 (01/01/2021-03/31/2021) 90-day average case fatality, case rate, and overall mortality rate by county. We identified the counties in the top quartile for percent positivity, case fatality, case rates, and mortality.

In both rounds of selections, we also described race [24,25], poverty [24,25], and rurality [21,24] to inform selections. In the March 2021 selections, we also described vaccine uptake, reported by ADPH.

Preliminary county selections
For both the October 2020 and March 2021 selections, the counties with the greatest COVID-19 burden as described above were shared with our Scientific and Community Advisory Boards to further hone selections and avoid overlapping outreach with other efforts (e.g., ADPH focus areas, other RADx-UP projects in the region) and to prioritize counties in need with known partners to promote community outreach efforts. Final selections were made by the Scientific and Community Advisory Boards in collaboration with the scientific leadership for this project.

In October 2020, we modeled past 14-day percent SARS-CoV-2 positivity against the most recent county-level estimates for clinical factors, social determinants of health, and other measures of behavioral risk mitigation strategies we expected to inform testing and case rates. No covariates were significantly associated with county-level percent positivity (Table 1).

In March 2021, we modeled predictors of average 7-day case positivity from 6/18/2020 - 3/31/2021 using similar covariates (Table 1). In this adjusted model, premature death rate was associated with 16% higher SARS-CoV-2 percent positivity (aRR 1.16, 95% CI 1.07, 1.25) per 1,000 deaths, a 10% greater proportion of county residents identifying as Black was associated with <1% higher SARS-CoV-2 percent positivity (aRR 1.003, 95% CI 1.00, 1.01), and number of preventable hospitalizations was associated with slightly higher SARS-CoV-2 percent positivity (aRR 1.03, 95% CI 1.00, 1.06) per 500 hospitalizations. A 10% higher proportion of the county population identifying as smokers was associated with a 77% lower percent positivity (aRR 0.231, 95% CI 0.10, 0.55).
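
The percent differences quoted with the adjusted rate ratios above follow from (aRR - 1) x 100. The sketch below reproduces that arithmetic from the reported point estimates; the function names are hypothetical, and the rescaling helper assumes the usual log link, under which rate ratios multiply across increments of a covariate.

```python
def percent_change(rate_ratio: float) -> float:
    """Percent difference in the outcome per modeled increment of a covariate."""
    return (rate_ratio - 1.0) * 100.0


def rate_ratio_for_increment(rate_ratio: float, n_increments: float) -> float:
    """Rate ratio for n modeled increments, assuming a log-link model."""
    return rate_ratio ** n_increments


# Point estimates as reported in the text (March 2021 model).
reported_arr = {
    "premature death rate": 1.16,
    "percent Black residents": 1.003,
    "preventable hospitalizations": 1.03,
    "proportion of smokers": 0.231,
}

changes = {name: round(percent_change(rr), 1) for name, rr in reported_arr.items()}
for name, change in changes.items():
    direction = "higher" if change > 0 else "lower"
    print(f"{name}: {abs(change):.1f}% {direction} percent positivity")
```

For example, under the log-link assumption, two increments of the premature death rate covariate would correspond to `rate_ratio_for_increment(1.16, 2)`, roughly a 35% higher percent positivity rather than 32%.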

Given the limited predictive power of the existing models and the models we built to identify counties with the greatest need, our county selections were determined based on descriptive epidemiology and input from scientific and community advisory boards. We described the counties in the top quartile for percent positivity, case fatality and case rates and below the median for number of testing sites and organized counties based on the number of parameters for which they were in the top quarter of the 67 counties in the state (Table 2). Table 3 presents the October 2020 summary of past 14-day SARS-CoV-2 percent positivity, 14-day case fatality, and 14-day cases/100k population. We included a cross-sectional description of the number of testing sites per county based on collated data provided by ADPH.
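
The criteria count described above can be sketched as follows. This is a minimal illustration, not the authors' code: the county names and values are made up, the field names are hypothetical, and the quartile cut point uses the default (exclusive) method of Python's `statistics.quantiles`, which may differ from the convention used in the study.

```python
from statistics import median, quantiles

HIGH_IS_WORSE = ("pct_positivity", "case_fatality", "case_rate")


def count_criteria_met(counties: dict) -> dict:
    """Count, per county, how many of the four severity criteria are met."""
    # Third quartile cut point for each "higher is worse" indicator.
    q3 = {
        key: quantiles([c[key] for c in counties.values()], n=4)[2]
        for key in HIGH_IS_WORSE
    }
    site_median = median([c["testing_sites"] for c in counties.values()])

    counts = {}
    for name, c in counties.items():
        met = sum(c[key] > q3[key] for key in HIGH_IS_WORSE)
        met += c["testing_sites"] < site_median  # fewer sites than the median
        counts[name] = met
    return counts


# Hypothetical counties and values (not study data).
counties = {
    "County A": {"pct_positivity": 22.0, "case_fatality": 3.1, "case_rate": 510, "testing_sites": 1},
    "County B": {"pct_positivity": 9.5, "case_fatality": 1.2, "case_rate": 240, "testing_sites": 4},
    "County C": {"pct_positivity": 18.0, "case_fatality": 2.8, "case_rate": 480, "testing_sites": 2},
    "County D": {"pct_positivity": 7.0, "case_fatality": 0.9, "case_rate": 200, "testing_sites": 6},
    "County E": {"pct_positivity": 12.0, "case_fatality": 1.5, "case_rate": 300, "testing_sites": 3},
    "County F": {"pct_positivity": 25.0, "case_fatality": 1.1, "case_rate": 530, "testing_sites": 5},
    "County G": {"pct_positivity": 10.0, "case_fatality": 3.4, "case_rate": 260, "testing_sites": 2},
    "County H": {"pct_positivity": 11.0, "case_fatality": 1.4, "case_rate": 280, "testing_sites": 7},
}

counts = count_criteria_met(counties)
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for name, met in ranked:
    print(f"{name}: {met} of 4 criteria met")
```

Counties meeting the most criteria sort to the top, which mirrors how the tabular summaries were organized for the advisory boards.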

Factor scores provided a ranking of counties with respect to 3-month SARS-CoV-2 positivity, case fatality rate, and SARS-CoV-2 testing sites/100k population. The factor rankings indicated in Table 3 can be interpreted as how much each county differs from the average county on the composite measure of SARS-CoV-2 test positivity, case fatality, and testing sites. For example, Franklin County was estimated to be about 1.8 standard deviations worse compared to the average. The factor analysis ranking is described in Table 3 and identified the top 10 counties with the highest ranking based on the combined factor analysis.
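
To illustrate how a composite score can be read in standard deviation units, the sketch below uses a deliberately simplified stand-in for the factor analysis: an equal-weight average of z-scores, re-standardized so that a score of 1.8 means 1.8 standard deviations worse than the average county. It does not estimate factor loadings, so it is not the method used in the study. The county names, values, and field names are hypothetical, and flipping the sign of testing sites (fewer sites treated as worse) is an assumption.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def composite_scores(rows, higher_is_worse, lower_is_worse):
    """Equal-weight composite of z-scores, re-standardized (higher = worse)."""
    names = list(rows)
    columns = []
    for key in higher_is_worse:
        columns.append(zscores([rows[n][key] for n in names]))
    for key in lower_is_worse:
        # Flip the sign so that fewer testing sites counts as worse.
        columns.append([-z for z in zscores([rows[n][key] for n in names])])
    raw = [mean([col[i] for col in columns]) for i in range(len(names))]
    return dict(zip(names, zscores(raw)))


# Hypothetical county values (not study data).
rows = {
    "County A": {"pct_positivity": 25.0, "case_fatality": 3.5, "sites_per_100k": 5.0},
    "County B": {"pct_positivity": 10.0, "case_fatality": 1.0, "sites_per_100k": 30.0},
    "County C": {"pct_positivity": 12.0, "case_fatality": 1.5, "sites_per_100k": 20.0},
    "County D": {"pct_positivity": 8.0, "case_fatality": 0.8, "sites_per_100k": 40.0},
    "County E": {"pct_positivity": 15.0, "case_fatality": 2.0, "sites_per_100k": 15.0},
}

scores = composite_scores(
    rows,
    higher_is_worse=("pct_positivity", "case_fatality"),
    lower_is_worse=("sites_per_100k",),
)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:+.2f} SD relative to the average county")
```

A fitted one-factor model would weight the indicators by their loadings instead of equally, so rankings from this sketch should not be expected to match Table 3.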

Priority counties selected in both phases of the project based on the descriptive epidemiology and mapped by the service areas of our community partners (Area Health Education Centers, AHEC) who conducted the testing.

Given the ongoing lack of discrimination between counties in the models, we again used descriptive epidemiology and input from scientific and community advisory boards for the March 2021 selections. Table 4

Here we present our approach to pragmatic identification of rural counties in need of a SARS-CoV-2 testing intervention in a Southern U.S. state at the peak of the 2019 to first quarter 2021 phase of the U.S. SARS-CoV-2 epidemic. In order to select counties for rapid deployment of a testing intervention in a state with high testing need, we were tasked with quickly identifying priority counties using publicly available data. Based on global and national guidance, we focused on SARS-CoV-2 percent positivity [17,22]. As models evaluating factors associated with percent positivity did not discriminate counties for selection, we pivoted to analyses of descriptive epidemiological data to inform county selection. This approach allowed us to categorize counties based upon the count of four severity criteria, with four and 12 of Alabama's 67 counties meeting at least three of these criteria in October 2020 and March 2021, respectively. The presentation of these data in tabular and geospatial fashion was easily interpreted by the community and scientific advisory boards and by community testing implementation partners, with prioritization of counties for testing further benefitting from an understanding of the local community context in identified high-severity counties. Equipped with this information, as depicted in Fig 1, community testing partners were able to prioritize rural counties for outreach and the identification of venues and local partners for testing delivery [35]. As our data sources and analytic capacities expand in scope and sophistication, leveraging traditional epidemiological data and analysis retains a vital role in engaging communities to address communicable and non-communicable conditions in their local context, particularly in the response to emerging diseases, when time for sophisticated analyses is limited.

In October 2020, no factors were associated with percent positivity in our models. This speaks to the fact that in these first waves of COVID-19 no population was immune, and testing was roughly similar across the counties, so it is not surprising that specific factors driving COVID transmission and positivity were not identified. In addition, more localized outbreaks and "super-spreader" events were randomly distributed in time and geographic area, and therefore the time periods included in the models represent variable features of a dynamic epidemic.
In March 2021, factors associated with increases in percent positivity were county-level rates of premature death [24] and preventable hospitalizations [25,31,32]. In addition, a higher proportion of the population who identified as Black was associated with higher case positivity. These associations may reflect that those in poorer resourced counties were at higher risk of acquiring SARS-CoV-2 and also less likely to have access to or seek testing in the absence of infection. In addition, an increase in the proportion of county residents who self-described as smokers was associated with a decrease in the SARS-CoV-2 percent positivity. This may reflect more screening on the part of smokers and those living with smokers who are at higher risk for respiratory disease, in general, and possibly for SARS-CoV-2 complications [36-38]. There are limits to this model given county-level population-level associations, which are broad and subject to ecological fallacy. In both selection rounds, the models were not helpful in informing key areas to focus outreach efforts and therefore we pivoted to descriptive epidemiology to identify areas most in need of testing, with the assumption that the testing intervention might improve testing and contact tracing and thus reduce disease spread, case fatality and morbidity.

Identifying the appropriate metric to track in order to identify areas of testing demand during a rapidly evolving pandemic proved to be challenging. Case positivity was originally thought to be the best indicator of the next outbreak, which would therefore inform aggressive testing strategies [39]. This did not prove to be the case due to the often-asymptomatic nature of SARS-CoV-2 and the overwhelming task of contact tracing in our strained public health setting [40,41]. In addition, (a) errors in reporting, (b)

This team is led by HIV researchers who routinely wait years for public health data to be cleaned and available, in part due to the sensitive nature of HIV testing and related data [44,45]. The excitement of real-time data releases that are key to managing the COVID-19 pandemic was tempered by frustrations when rapid data availability was associated with less reliable data. For instance, Alabama datasets, like many other datasets, had noise related to large data dumps due to institutions reporting testing data en masse (e.g., 2352 new cases reported on 3 March 2021 reflecting data from months prior), changing denominators, and, at times, incomplete data. Our process highlights the need to proceed with caution and engage with community stakeholders when using real-time, inherently messy data to inform public health interventions.

In recent years there has been a push for precision population health, leveraging big data to identify those at greatest risk to optimize delivery of the right intervention, to the right population, at the right time. Unlike diseases with more stable epidemiological and risk profiles, the dynamic nature of SARS-CoV-2, over geography and time, proved challenging for a model-based approach to inform precision population health tailored SARS-CoV-2 testing. A simplified approach based upon descriptive epidemiological data proved informative while allowing for meaningful engagement of essential community stakeholders and implementation partners ultimately charged with implementing testing in their local communities. As data sources and analytic capacities expand in scope and sophistication, leveraging traditional epidemiological data and analysis retains a vital role in engaging communities and building trust in data in order to optimally address communicable and non-communicable conditions in their local context.

Limitations
This is a methods paper to share lessons learned in selecting counties for a SARS-CoV-2 testing intervention but not an in-depth exploration of SARS-CoV-2 testing patterns and epidemiology in Alabama. The findings are unlikely to generalize to all settings and will be most applicable to other counties or states with strained public health systems in the U.S. healthcare context. Details regarding our testing intervention and findings are published elsewhere [35].

We present a pragmatic approach to inform an evolving, pandemic community-level intervention. We also share lessons learned regarding the limits of test positivity in an outbreak where testing uptake is poor and where case positivity rates generally exceeded any threshold level that could discern need, as nearly all counties had high need. Our current and future work highlights the importance of community partnerships and local knowledge to inform testing outreach, and perhaps the need for more centralized pandemic preparedness.

Hall, PhD 2,3,6; Larry Hearld, PhD 2,6; Bertha Hidalgo, PhD, MPH 1,2,3; Dione King, PhD, MSW 1,3,7; Emily Levitan, PhD, MS 1,2,3; Max Michael, MD 3; Trisha Parekh, DO 2; Donna Porter, PhD 1,2; David Redden, PhD, MS 3; Michael Saag, MD 2; Bisaka "Pia" Sen, PhD, MA 2,3,6; Barbara Van Der Pol, PhD, MPH 2; Jeffery Walker, PhD, MA 3,7

COVID COMET RADXUP Team Affiliations: