The seasonality of cholera in sub-Saharan Africa: a statistical modelling study

Summary Background Cholera remains a major threat in sub-Saharan Africa (SSA), where some of the highest case-fatality rates are reported. Knowing in what months and where cholera tends to occur across the continent could aid in improving efforts to eliminate cholera as a public health concern. However, largely due to the absence of unified large-scale datasets, no continent-wide estimates exist. In this study, we aimed to estimate cholera seasonality across SSA and explore the correlation between hydroclimatic variables and cholera seasonality. Methods Using the global cholera database of the Global Task Force on Cholera Control, we developed statistical models to synthesise data across spatial and temporal scales to infer the seasonality of excess (defined as incidence higher than the 2010–16 mean incidence rate) suspected cholera occurrence in SSA. We developed a Bayesian statistical model to infer the monthly risk of excess cholera at the first and second administrative levels. Seasonality patterns were then grouped into spatial clusters. Finally, we studied the association between seasonality estimates and hydroclimatic variables (mean monthly fraction of area flooded, mean monthly air temperature, and cumulative monthly precipitation). Findings 24 (71%) of the 34 countries studied had seasonal patterns of excess cholera risk, corresponding to approximately 86% of the SSA population. 12 (50%) of these 24 countries also had subnational differences in seasonality patterns, with strong differences in seasonality strength between regions. Seasonality patterns clustered into two macroregions (west Africa and the Sahel vs eastern and southern Africa), which were composed of subregional clusters with varying degrees of seasonality. Exploratory association analysis found most consistent and positive correlations between cholera seasonality and precipitation and, to a lesser extent, between cholera seasonality and temperature and flooding. Interpretation Widespread cholera seasonality in SSA offers opportunities for intervention planning. Further studies are needed to study the association between cholera and climate. Funding US National Aeronautics and Space Administration Applied Sciences Program and the Bill & Melinda Gates Foundation.

: Map of cholera incidence data availability in sub-Saharan Africa. Availability is expressed in terms of the number of distinct years for which at least one observation is available, partitioned by GADM administrative level and obervevation length. Light gray indicate countries that were excluded due to lack of data.

S2.1 Base model
Cholera occurrence is modeled as binary observation process at a monthly temporal resolution. The analysis is done on a per-country basis taking the administrative subdivisions as the spatial units of the disease process. The aim is to draw on information on all administrative levels by exploiting the nested structure of the administrative units, for instance districts within provinces within a country.

S3
The data consists of observations of cholera occurrence per month. Assuming that the administrative level 2 is the lowest administrative level at which cholera is observed, the set of n i,m,y monthly observations for month m in administrative level 2 unit i of year y, Y i,m,y , is modeled as resulting from a binomial distribution: Y i,m,y ∼ Binomial (n i,m,y , p i,m,y where p i,m,y is the combined probability of there being cholera and observing it (the process and measurement models are combined). The monthly probabilities are assumed to follow the BYM model: logit (p i,m,y where β m is the monthly country-level cholera probability of month m, η y is a year-specific random effect, and θ i and φ i are the non-spatial and spatial random effects. We assume that the βs follow a 1-dimensional cyclic random walk, i.e. β m − β m−1 ∼ N (0, σ β ) ∀m ∈ [2, 12] and β 1 − β 12 ∼ N (0, σ β ) to impose the cyclic pattern. To allow the identifiability of the model we further impose 12 m=1 β m = 0. We assume that the spatial random effects follow the ICAR model, and use the reparameterized version of the BYM model (Riebler et al. 2016): where τ is the overall marginal precision (i.e. the total residual relative risk), θ * ∼ N (0, 1), φ ∼ ICAR, s is a scaling factor that depends on the adjacency matrix of the neighborhood network to ensure V ar(φ * ) ≈ 1, and ρ ∈ [0, 1] indicates whether the marginal variance comes from spatial or non-spatial random effects.
At the upper administrative level the observation process at the admin unit level 1 j is given by the probabilities of cholera reporting in all areas at the level 2 contained in it, i ∼ j, as: where p i,m,y is the probability of cholera reported at the lower administrative level modified by a reporting modifier ξ l at administrative level l. For instance, in the case models were run at administrative level 2, a different effect modifier was applied to the first administrative level and country-wide observations. The country-level monthly observations follow the same observation process with the product over all spatial subunits.
We also account for multi-month observations. The probability of observing cholera in admin unit i in year y for period T n = {m 1 , . . . , m n } composed of n months is given by: Similarly multi-month observations at any upper admin level j is obtained by multiplying the individual probabilities:

S2.2 Offset model
The base model considers yearly random effects based on the calendar year, which may be unrealistic in some situations. For instance if cholera has a single peak in December/January, calendar-based random effects will apply only to the first "half" of the epidemic with the second half falling in the following calendar year.
To account for this fact we introduce and offset in the assignment of the yearly random effect based on the month in order to align the yearly random effects to "cholera" years by introducing a latent discrete variable representing the offset. The probability of occurrence becomes: yr, m, z) gives the year corresponding to the month accounting for the offset.
For the purpose of sampling z can be marginalized out of the likelihood as: where Θ is the vector of all parameters defining the occurrence probabilities (β, η, ρ, τ, φ, θ, ξ), and subscript j indicates the observation (combination of admin level, month and year).
And the log-likelihood as: where ll z are the individual sum of marginal log-likelihoods and the log of the prior on z:

S2.3 Mixture model
Assuming the possible coexistence of two distinct groups of seasonalities within the same country, a latent class model for the probabilities of observing cholera can be written in terms of a mixture of two different vectors of seasonality coefficients β g1,g1 : representing the mixture probabilities, with priors λ ∼ beta(0.5, 0.5). To mitigate label switching issues we impose the constraints β g2 1 > β g1 1 and β g2 8 < β g1 8 . We allow for spatial autocorrelation in mixture coefficients by implementing an approach similar to that used for spatial random effects above: with parameter definitions as for the spatial random effects.

S2.4 Priors
We use the following priors: where τ β = 1/σ 2 β is the precision of the cyclic random walk of seasonality coefficients. Priors for the spatial random effects were described above. Penalized-complexity (PC) priors are chosen for on τ and φ following (Riebler et al. 2016 set to represent our belief in the importance of marginal variance. Similarly a PC prior for ρ can be formulated in terms of the probability P (ρ < U ρ ) = α ρ , which does not have a closed-form expression but for which tabulated values can be computed. We set U τ = 1 and α τ = 0.75, and U ρ = .5 and α ρ = 2/3 following Riebler et al. (2016). The same priors were chosen for spatially autocorrelated mixture probabilities for the mixture models.     Figure S2: Impact of priors. For each country the impact of priors on seasonality-related parameters was evaluated in terms of the shrinkage of variance from the prior to the posterior, computed as 1var(posterior)/var(prior). Figure S3: Model evaluation. For each country the performance of the selected model is given as the fraction of observations covered by posterior retrodictions as a function of the credible interval width. The coverage fraction is given for all observations as well as partitioned by administrative level. S10 Figure S4: Country-level cholera sesaonality time series. Excess cholera seasonality is shown in for the final selection of models in terms of the mean (line) and 95% CrI odds ratios. Figure S5: Uncertainty in seasonality coefficients in terms of the width of the 95 % CrI on the logit scale. S12 Figure S6: Uncertainty in seasonality index in terms of the posterior probability that the index is larger than the null hypothesis of equal occurrence probability during the year. Adminstrative units for which the posterior probability is above 0.9 are outlined in black. Figure S7: Seasonality strength and administrative unit characteristics. Each point corresponds to one administrative unit, lines correspond to GAM estimates of the relationship betwen the characteristics and the seasonality indices. S13 Figure S8: Seasonality peak map. Map of the month with largest estimated excess cholera occurrence odds ratio in countries for which a model with seasonality was selected. Figure S9: Offset map. Map of the month determining the start of the 'cholera year' in coutries for which a model with offset was selected. Countries with two colors correspond to models with both mixture and offset. S14 S6 Seasonality grouping Figure S10: Model comparison of number of groupings. Model comparison was done in terms of the estimated log-predictive density (elpd). Pairwise differences in elpd were computed for all model combinations, and results are shown in terms of the mean (points) and 2 standard errors (lines). The model with highest elpd was the model with 5 groups. S15 Figure S11: Clustering results for two to seven clusters. S16 Figure S12: Importance of correlations with hydro-climatic variables. Importance is given in terms of the fraction of area, administrative units, and population with the absolute value of Spearman correlation above thresholds between 0.5 and 0.9. Figure S13: Correlation between excess cholera seasonality and hydro-climatic variables. Correlation is expressed in terms of Spearman's coefficient, with areas with significant correlations outlined in balck.

S8 Alternative cholera occurrence definitions
Figure S14: Comparison of model selection for alternative definitions of cholera occurrence S19