Estimating unobserved SARS-CoV-2 infections in the United States

Significance
In early 2020, delays in availability of diagnostic testing for COVID-19 prompted questions about the extent of unobserved community transmission in the United States. We quantified unobserved infections in the United States during this time using a stochastic transmission model. Although precision of our estimates is limited, we conclude that many more thousands of people were infected than were reported as cases by the time a national emergency was declared and that fewer than 10% of locally acquired, symptomatic infections in the United States may have been detected over a period of a month. This gap in surveillance during a critical phase of the epidemic resulted in a large, unobserved reservoir of infection in the United States by early March.


Sensitivity analysis of unknown parameters
Estimates of the proportion of imported symptomatic infections that were detected, ρtravel, and the infectiousness of asymptomatic infections relative to symptomatic infections, α, varied based on the values of the other parameters. In general, parameter values expected to increase or accelerate transmission (e.g., a shorter serial interval) were associated with higher estimates of ρtravel (Table S1). For a shorter mean serial interval of 4.7 days, the estimate was ρtravel = 0.50 (95% PPI: 0.05-0.97), and with a longer mean serial interval of 7.5 days, the estimate was ρtravel = 0.13 (95% PPI: 0.01-0.79). The estimated value of ρtravel was also lower if the CFR was low (ρtravel = 0.40, 95% PPI: 0.03-0.96), compared to the scenario with a higher CFR (ρtravel = 0.55, 95% PPI: 0.06-0.98). Higher ρtravel estimates correspond to fewer undetected imported infections; therefore, fewer undetected importations are required to account for the observed number of local deaths through March 12 if the CFR is high or the serial interval is short. In addition, when we based the timing of importations on international incidence (excluding China after travel restrictions were implemented on February 3), the estimate of ρtravel was 0.56 (95% PPI: 0.10-0.97), due to the increased probability of early importations, and more time for local infections to increase, under this scenario. In most scenarios, the estimates of ρtravel and α were positively correlated (Fig. S4).
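The link between ρtravel and the number of undetected importations can be illustrated with a back-of-the-envelope calculation. The sketch below is not the stochastic model used in the analysis: it assumes the simple expected-value relation observed = ρtravel × total for symptomatic importations, ignores asymptomatic infections, and uses a hypothetical count of 100 detected imported cases. The function name and constant are introduced here for illustration only.

```python
def undetected_imported_symptomatic(observed_cases: float, rho_travel: float) -> float:
    """Expected undetected imported symptomatic infections.

    Assumes observed = rho_travel * total (an illustrative simplification,
    not the stochastic model described in the text).
    """
    if not 0.0 < rho_travel <= 1.0:
        raise ValueError("rho_travel must be in (0, 1]")
    total = observed_cases / rho_travel
    return total - observed_cases


# Hypothetical number of detected imported cases; not a figure from the study.
HYPOTHETICAL_OBSERVED = 100

# Point estimates of rho_travel quoted in the text for each scenario.
scenarios = [
    ("long serial interval (7.5 d)", 0.13),
    ("low CFR", 0.40),
    ("short serial interval (4.7 d)", 0.50),
    ("high CFR", 0.55),
    ("importation timing from international incidence", 0.56),
]

for label, rho in scenarios:
    undetected = undetected_imported_symptomatic(HYPOTHETICAL_OBSERVED, rho)
    print(f"{label}: rho_travel = {rho:.2f}, undetected = {undetected:.0f}")
```

Under this simplification, the number of undetected importations per detected case falls as ρtravel rises, which is the direction of the relationship described above.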

Sensitivity analysis of cumulative infections
Because ρtravel and α were estimated for each parameter-sensitivity scenario, cumulative infections were relatively similar under the low, baseline, and high scenarios for nearly all parameters. Cumulative infections were most sensitive to assumptions about the serial interval (Fig. S5, Table S2), which affects how quickly local infections increase. Cumulative infections were also somewhat sensitive to assumptions about the case fatality risk, because assumptions about that parameter influenced the estimates of ρtravel and α, which were based on reported deaths.
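To see why the serial interval has such a strong effect on how quickly local infections increase, consider a deliberately crude deterministic calculation. The sketch below assumes that each generation of infection takes exactly one mean serial interval and that each infection causes R new infections; neither assumption is part of the stochastic model in the analysis. The value R = 2.7 is one of the fixed values shown in Figure S6, and the 30-day window is hypothetical.

```python
def growth_factor(r: float, serial_interval_days: float, window_days: float) -> float:
    """Infections in the latest generation per seed infection after window_days.

    Treats the serial interval as a fixed generation time (an illustrative
    assumption), so the number of generations is window / serial interval.
    """
    if serial_interval_days <= 0:
        raise ValueError("serial_interval_days must be positive")
    generations = window_days / serial_interval_days
    return r ** generations


R_ILLUSTRATIVE = 2.7   # one of the fixed R values shown in Figure S6
WINDOW_DAYS = 30.0     # hypothetical time window, chosen for illustration

# Short and long mean serial intervals from the sensitivity analysis.
for serial_interval in (4.7, 7.5):
    factor = growth_factor(R_ILLUSTRATIVE, serial_interval, WINDOW_DAYS)
    print(f"serial interval {serial_interval} d: growth factor ~ {factor:.0f}")
```

With the same R, a shorter serial interval packs more generations into the same window, so the growth factor is much larger.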

Sensitivity analysis of local case detection probability
The proportion of symptomatic infections detected over time followed a similar pattern under all parameter sensitivity scenarios, with low values of ρlocal(t) throughout late February followed by increases in March (Fig. S7).

Sensitivity analysis of the ratio of deaths after and before March 12
The ratio of deaths expected on March 13 and after, relative to before then, was higher with changes in parameters that resulted in faster growth in local infections (e.g., shorter serial intervals) (Fig. S8, Table S3). The proportion of deaths expected to occur after March 12 also increased with shorter reporting delays (Table S3). Overdispersion (lower k) did not drastically alter our estimates of ρtravel or α (Table S1) or the number of cumulative infections (Table S2), but it did extend the lower and upper bounds on the range of the ratio of deaths after and before March 12 (Table S3).

Figure S1. Distribution of the delay between symptom onset and reporting for 26 US cases. The curve shows the maximum-likelihood fit of a gamma distribution (shape = 3.43, rate = 0.572) to those data.

Figure S2. Comparison of daily estimates of ρlocal(t) with and without smoothing in the baseline analysis. The two panels compare A) raw estimates with no smoothing and B) smoothed estimates with splines. We used the smoothed estimates in our analysis given that they are more indicative of general trends in case detection. The black line shows the median, dark gray shading shows the interquartile range, and light gray shading shows the 95% posterior predictive interval.

Figure S3. Assumptions about timing of imported infections. Imported cases that have been reported are shown in gray, and the red line shows the baseline distribution of timing of imported infections that we based on a Gaussian kernel smooth of those data. The blue line shows an alternative distribution of timing of imported infections based on patterns of international incidence.

Figure S4. Samples (10^4) from the joint posterior distribution of the proportion of imported symptomatic infections detected (ρtravel) and the relative infectiousness of asymptomatic infections (α) under different parameter-sensitivity scenarios. Parameter values for these scenarios are shown in Table S1 (for ρtravel and α) and Table 1 (for all other parameters).

Figure S5. Posterior predictive distributions of cumulative infections by March 12 under different parameter-sensitivity scenarios. Unlike other parameters, importation timing was not described in terms of simple numerical values; in that case, "mid" refers to our baseline assumption that the timing of unobserved imported infections followed the timing of observed imported cases, and "high" refers to the alternative scenario that their timing followed international incidence patterns. Parameter values for these scenarios are shown in Table S1 (for ρtravel and α) and Table 1 (for all other parameters).

Figure S6. A-C) Posterior predictive distributions of cumulative infections by March 12 from the baseline analysis, with each panel reflecting a fixed value of R: A) 1.9, B) 2.7, C) 3.9. D-F) Median (black), interquartile range (dark gray), and 95% posterior predictive interval (light gray) of the probability of detecting a local symptomatic infection, ρlocal(t), from the baseline analysis, with each panel reflecting a fixed value of R: D) 1.9, E) 2.7, F) 3.9. For the other parameters, 1,000 draws were made from the uncertainty distributions corresponding to the baseline analysis in Table S1 (for ρtravel and α) and Table 1 (for all other parameters).

Figure S7. Median (black) and 95% posterior predictive interval (gray) of the probability of detecting a local symptomatic infection, ρlocal(t), after accounting for delays in reporting. Each panel represents a different parameter-sensitivity scenario. Parameter values for these scenarios are shown in Table S1 (for ρtravel and α) and Table 1 (for all other parameters).

Figure S8. Posterior predictive distributions of the ratio of deaths after and before March 12 under different parameter-sensitivity scenarios. Unlike other parameters, importation timing was not described in terms of simple numerical values; in that case, "mid" refers to our baseline assumption that the timing of unobserved imported infections followed the timing of observed imported cases, and "high" refers to the alternative scenario that their timing followed international incidence patterns. Parameter values for these scenarios are shown in Table S1 (for ρtravel and α) and Table 1 (for all other parameters).
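
As a reading aid for the gamma fit reported in the Figure S1 caption (shape = 3.43, rate = 0.572), the sketch below converts those two parameters into the implied mean and standard deviation of the delay between symptom onset and reporting. It uses only the standard moment formulas for a gamma distribution in the shape-rate parameterization (mean = shape / rate, variance = shape / rate²); the function names are introduced here for illustration.

```python
import math

# Maximum-likelihood gamma parameters quoted in the Figure S1 caption.
SHAPE = 3.43
RATE = 0.572  # per day


def gamma_mean(shape: float, rate: float) -> float:
    """Mean of a gamma distribution in the shape-rate parameterization."""
    return shape / rate


def gamma_sd(shape: float, rate: float) -> float:
    """Standard deviation of a gamma distribution (shape-rate form)."""
    return math.sqrt(shape) / rate


mean_delay = gamma_mean(SHAPE, RATE)
sd_delay = gamma_sd(SHAPE, RATE)
print(f"mean reporting delay: {mean_delay:.2f} days")
print(f"standard deviation:   {sd_delay:.2f} days")
```

The fitted parameters imply a mean delay of roughly six days from symptom onset to reporting.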