The Estimation and Projection Package Age-Sex Model and the r-hybrid model: new tools for estimating HIV incidence trends in sub-Saharan Africa

Supplemental Digital Content is available in the text


S1. Technical details of the EPP-ASM model
The role of the EPP model in constructing national HIV estimates and projections is to estimate the adult HIV incidence trend from available HIV survey and surveillance data. The estimated HIV incidence trend is provided to the Spectrum model, which uses it to calculate HIV epidemic and impact indicators such as the number of people living with HIV (PLHIV), HIV prevalence, antiretroviral treatment coverage, AIDS deaths, mother-to-child HIV transmission, paediatric HIV outcomes, and AIDS orphanhood. Ensuring that outputs from Spectrum are consistent with the data to which EPP is calibrated requires a consistent model structure and consistent assumptions about the relationships between HIV incidence, prevalence, and AIDS mortality by sex and age. To achieve this, the EPP-ASM model represents the adult population aged 15 years and older by sex, single year of age, and HIV status, and mirrors the model structure and assumptions of the Spectrum model:
• The adult population is stratified by sex and single year of age from 15 to 79 years, with an open-ended 80+ age group.
• The population on antiretroviral treatment is stratified by CD4 stage at treatment initiation and by three treatment duration groups: 0-5 months, 6-11 months, and 12+ months.
A key design principle for EPP-ASM was to balance precise representation of the demographic and epidemiologic processes with computational efficiency, to enable hundreds of thousands of model simulations during Bayesian model calibration. Representing the full stratification of the Spectrum model (single-year age groups, two sexes, eight stages of infection, and four treatment stages) and simulating transitions between all of these states with a 0.1-year time step was not computationally practical. We also found that an alternative approach of approximating ageing as a Markov transition through 5-year age groups did not represent ageing accurately enough.
To address this challenge, we developed a mixed approach in which populations are tracked with different levels of stratification and model processes are simulated on different time steps. This structure implicitly assumes a homogeneous distribution of the HIV population by CD4 category and ART duration within each coarser age group, while the exact size of the total HIV-positive population is tracked by single year of age. We found this provided a very accurate representation of the Spectrum model, with the addition of the stratification of the 15-19 age group into 15-16 and 17-19 because HIV incidence changes rapidly within this age range, resulting in a rapidly changing CD4 distribution amongst those who are HIV-positive.
The following demographic input parameters are taken from Spectrum as annual inputs:
• Initial population size by age and sex in the projection start year, typically 1970;
• Probability of survival from age a to age a + 1 from causes of death other than HIV;
• Age-specific fertility rate;
• Male-to-female sex ratio at birth;
• Number of net migrants by age and sex, or the 'target' mid-year population size by age and sex in each year.
If a target population is specified rather than the number of net migrants, at the end of each annual projection step (after simulating HIV processes), the population is scaled in each age/sex compartment to match the target population. This option is used as default for EPP regional stratifications (urban/rural or other subnational) in which demographic inputs and migration are challenging to estimate, but population size and distribution are available.
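The target-population adjustment described above can be sketched as follows. This is a minimal illustration, not the EPP-ASM implementation: the function name `scale_to_target`, the array layout (age x sex x HIV status), and the choice to apply the same scaling factor to every HIV stratum within an age/sex compartment are assumptions made for the example.

```python
import numpy as np

def scale_to_target(pop, target):
    """Rescale each age/sex compartment so its total matches the target.

    pop    : array of shape (n_age, n_sex, n_hiv_status), hypothetical layout
    target : array of shape (n_age, n_sex), target mid-year population
    Assumption: all HIV strata in a compartment are scaled by the same
    factor, so HIV prevalence within the compartment is unchanged.
    """
    pop = np.asarray(pop, dtype=float)
    target = np.asarray(target, dtype=float)
    total = pop.sum(axis=2)
    # Leave empty compartments untouched (ratio of 1) to avoid dividing by zero.
    ratio = np.divide(target, total, out=np.ones_like(target), where=total > 0)
    return pop * ratio[:, :, None]

# Two ages x two sexes x (HIV-negative, HIV-positive); illustrative numbers only.
pop = np.array([[[90.0, 10.0], [80.0, 20.0]],
                [[45.0, 5.0], [40.0, 10.0]]])
target = np.array([[110.0, 100.0],
                   [40.0, 60.0]])
scaled = scale_to_target(pop, target)
```

Because the factor is shared across HIV strata in this sketch, the adjustment changes population size without changing prevalence in any age/sex cell.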
Since the model represents only the population aged 15 and older, the number of births by sex is calculated each year and stored as a lagged input to the model 15 years later, reduced by the calculated probability of survival from birth to age 15 for each cohort.
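A minimal sketch of this lagged-births bookkeeping is given below. The function names, the dictionary-based storage, and the numeric inputs are hypothetical; the sketch only assumes that births are the sum of women by age multiplied by the age-specific fertility rate and that the male-to-female sex ratio at birth splits them by sex.

```python
def births_by_sex(women_by_age, asfr, sex_ratio_at_birth):
    """Total births split by sex using the male-to-female sex ratio at birth."""
    total = sum(w * f for w, f in zip(women_by_age, asfr))
    male = total * sex_ratio_at_birth / (1.0 + sex_ratio_at_birth)
    return male, total - male

def entrants_at_15(births, survival_to_15, year, lag=15):
    """Cohort entering the model at age 15 in `year`.

    births         : dict mapping birth year -> (male, female) births
    survival_to_15 : dict mapping birth year -> (male, female) probability of
                     surviving from birth to age 15 for that cohort
    """
    birth_year = year - lag
    b_m, b_f = births[birth_year]
    s_m, s_f = survival_to_15[birth_year]
    return b_m * s_m, b_f * s_f

# Illustrative numbers only.
births = {1985: births_by_sex([1000.0, 2000.0], [0.10, 0.05], 1.05)}
survival = {1985: (0.90, 0.92)}
male_15, female_15 = entrants_at_15(births, survival, year=2000)
```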
Modelling paediatric HIV, including mother-to-child HIV transmission, paediatric HIV progression and survival, and the effects of paediatric ART on survival, incurs substantial model complexity but only modestly affects inference about adult HIV. Rather than fully simulating these processes, the model takes the HIV prevalence among entrants at age 15 and their distribution across CD4 and ART stages as fixed inputs from a previous Spectrum simulation.

HIV incidence by age and sex
New HIV infections are calculated at every 0.1-year time step. After the HIV incidence rate has been calculated in each time step, the number of new adult infections is allocated by sex and single year of age. The female-to-male incidence rate ratio and the age-specific incidence rate ratios relative to age 25-29 years, by sex and year, are taken as fixed inputs from Spectrum for ages 15-19 through 75-79, with no new infections assumed in the 80+ age group. Incidence rate ratios by five-year age group are disaggregated to single years of age using Beers' graduation coefficients [1], following Spectrum.
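One way to picture the allocation step is shown in the sketch below. The source does not give the allocation formula, so the rule used here (new infections distributed in proportion to the susceptible population multiplied by the incidence rate ratio in each sex/age cell) is an assumption, as are the function name and the toy inputs. The disaggregation with graduation coefficients is not shown.

```python
import numpy as np

def allocate_infections(new_infections, susceptible, irr):
    """Split a total number of new infections across sex/age cells.

    susceptible : array (n_sex, n_age) of HIV-negative population
    irr         : array (n_sex, n_age) of incidence rate ratios, combining
                  the sex ratio and the age ratio (hypothetical layout)
    Assumption: infections are proportional to susceptible * irr.
    """
    weights = np.asarray(susceptible, dtype=float) * np.asarray(irr, dtype=float)
    return new_infections * weights / weights.sum()

# Illustrative numbers only: two sexes x two ages.
susceptible = np.array([[100.0, 100.0],
                        [100.0, 100.0]])
irr = np.array([[1.0, 2.0],
                [1.5, 3.0]])
infections = allocate_infections(75.0, susceptible, irr)
```

Setting the ratio to zero for a cell (as for the 80+ group in the text) gives that cell no new infections under this rule.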

HIV disease progression, AIDS mortality, and antiretroviral treatment
The EPP-ASM model takes the following inputs from Spectrum for modelling HIV natural history and the impacts of ART on survival:
• Initial CD4 distribution following seroconversion by age and sex;
• Annual rate of progression to the next CD4 stage by age and sex;
• AIDS mortality rate by CD4 stage, age, and sex;
• AIDS mortality rate for those on ART by age, sex, CD4 at ART initiation, treatment duration, and calendar year;
• Number or percentage of ART-eligible adults on treatment at the end of each calendar year;
• CD4 threshold for ART eligibility in each calendar year;
• Percentage eligible for treatment due to other eligibility criteria (e.g. serodiscordant couples, TB infection, key population groups) by sex and year;
• Optionally, the ART drop-out rate; and
• Optionally, the median CD4 count at ART initiation in years for which it is known.
Disease progression, AIDS deaths, and ART initiations are calculated at every 0.1-year time step within the nine coarse age groups. At the end of this calculation, the AIDS deaths must also be removed from the HIV-positive population tracked by single year of age (Figure S1; left) so that the two representations of the total HIV-positive population remain exactly aligned. Within each coarse age group, deaths are removed from the single-year age groups in proportion to the distribution of the single-age HIV-positive population.
Figure S2 and Figure S3 illustrate examples of sex- and age-stratified outputs from the EPP-ASM model applied to Malawi Central Region (shown in the second row of Figure 1). Figure S2 shows time trends in HIV prevalence, HIV incidence rate, and AIDS mortality rate by age groups 15-24, 25-34, 35-49, and 50+ years. This illustrates the characteristic diverging trends in HIV prevalence by age: prevalence declines amongst young adults as incidence declines, while it increases rapidly amongst older adults due to longer survival following ART scale-up. Figure S3 shows estimates of HIV prevalence, HIV incidence rate, and AIDS mortality rate by single year of age for ages 15 to 64 for the years 1995, 2005, and 2015, illustrating the older age profile of HIV among men compared to women and, again, the increasing age of peak HIV prevalence as the epidemic matures.
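The proportional removal of AIDS deaths from the single-year HIV-positive population can be sketched as below. The function name, the index-array mapping from single ages to coarse groups, and the numbers are illustrative assumptions rather than the EPP-ASM code.

```python
import numpy as np

def remove_deaths(hivpop_single, coarse_index, deaths_coarse):
    """Remove coarse-group AIDS deaths from the single-age HIV-positive population.

    hivpop_single : array (n_single_ages,) HIV-positive population by single age
    coarse_index  : int array (n_single_ages,) giving each age's coarse group
    deaths_coarse : array (n_coarse_groups,) AIDS deaths in the time step
    Deaths are spread within each coarse group in proportion to the
    single-age HIV-positive population.
    """
    hivpop_single = np.asarray(hivpop_single, dtype=float)
    deaths_coarse = np.asarray(deaths_coarse, dtype=float)
    coarse_total = np.bincount(coarse_index, weights=hivpop_single,
                               minlength=len(deaths_coarse))
    # Fraction dying in each coarse group; zero where the group is empty.
    frac = np.divide(deaths_coarse, coarse_total,
                     out=np.zeros_like(coarse_total), where=coarse_total > 0)
    return hivpop_single * (1.0 - frac[coarse_index])

# Illustrative numbers: five single ages mapped to two coarse groups.
hivpop = np.array([10.0, 30.0, 20.0, 20.0, 20.0])
groups = np.array([0, 0, 1, 1, 1])
deaths = np.array([4.0, 6.0])
after = remove_deaths(hivpop, groups, deaths)
```

After the update, the single-age totals within each coarse group equal the coarse-group total minus its deaths, which is the alignment property the text requires.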

S2. Technical details of the random walk component of the r-hybrid model
From the mid-2000s, the r-hybrid model uses a piecewise-linear spline with a first-order random-walk (RW1) penalty on the spline coefficients to model changes in log r(t).
The piecewise-linear spline is defined on a sequence of evenly spaced knots with spacing Δ. We found that knot spacings of Δ = 1, 2, 3, or 5 years made negligible difference to posterior estimates, projections, and uncertainty ranges for HIV transmission rate, incidence, or prevalence, but that longer knot spacings substantially improved parameter identifiability and model convergence. Figure S4 illustrates results of the r-hybrid model with Δ = 1, 3, and 5 years fitted to the four EPP regions shown in Figure 1. The posterior mean and 95% CI ranges are nearly indistinguishable for the different knot spacing choices. These results are summarized for all 177 EPP regions in Figure S5, showing posterior mean estimates, and Figure S6, showing posterior standard deviation. Figure S5 shows that posterior mean estimates are virtually identical when using annual knot spacing versus knots every 5 years. In Figure S6, the posterior uncertainty is only very slightly larger when using annual knots compared to knots every 5 years. Figure S7 and Table S1 summarise the median number of iterations required to achieve convergence of the IMIS algorithm. This declines steadily as the knot spacing increases, with the most dramatic reduction from Δ = 1 to Δ = 2. With Δ = 2 to Δ = 5, the number of iterations was lower than required for convergence of the r-spline model, and comparable to the r-trend model.
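The general idea of a piecewise-linear spline with an RW1 penalty can be sketched as follows. This is a simplified illustration under stated assumptions, not the r-hybrid parameterisation: here the coefficients are treated as values of the spline at the knots, successive differences are given independent normal priors with standard deviation `sigma`, and the knot years and coefficient values are invented for the example.

```python
import numpy as np

def piecewise_linear(t, knots, coefs):
    """Evaluate a piecewise-linear spline whose values at `knots` are `coefs`."""
    return np.interp(t, knots, coefs)

def rw1_log_prior(coefs, sigma):
    """Log density of a first-order random walk on the coefficients.

    Assumption: coefs[k] - coefs[k-1] ~ Normal(0, sigma^2), independently.
    """
    d = np.diff(np.asarray(coefs, dtype=float))
    n = len(d)
    return (-0.5 * np.sum(d ** 2) / sigma ** 2
            - n * np.log(sigma) - 0.5 * n * np.log(2.0 * np.pi))

# Hypothetical knots every 5 years starting in 2003.
knots = np.arange(2003, 2029, 5)          # 2003, 2008, ..., 2028
smooth = np.array([0.0, 0.2, 0.3, 0.35, 0.4, 0.4])
rough = np.array([0.0, 0.8, -0.6, 0.9, -0.5, 0.7])

value_mid = piecewise_linear(2005.5, knots, smooth)
lp_smooth = rw1_log_prior(smooth, sigma=0.2)
lp_rough = rw1_log_prior(rough, sigma=0.2)
```

The penalty favours coefficient sequences with small successive differences, so the smooth sequence receives a higher log prior than the rough one. Wider knot spacing reduces the number of coefficients to estimate, which is consistent with the convergence gains reported above.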
Based on these results, we recommended the default Δ = 5, requiring five parameters for a spline spanning a projection from 2003 through 2025. In total, the r-spline model requires eight parameters (seven spline coefficients and one variance parameter) to specify r(t), compared to nine for the r-hybrid model (four logistic function parameters and five random walk coefficients). The current EPP implementation allows a different knot spacing to be specified as data become available to identify more frequent fluctuations in the HIV transmission rate.
Figure S4: Outputs from the r-hybrid model using random walk knot spacings Δ every 5, 3, and 1 years. Results show estimates and 95% credible intervals for trends in HIV prevalence (left), HIV incidence rate per 1000 (center), and log r(t) (right) for the same example regions presented in Figure 1. Estimates and uncertainty range bounds are nearly indistinguishable, indicating that results are insensitive to knot spacings ranging from annual to every 5 years.
The transition from logistic function to random walk begins in year 2003. For HIV prevalence and HIV incidence rate, the relative standard error (standard error divided by mean) is plotted.
Figure S7: Number of IMIS iterations to achieve posterior convergence for random-walk knot spacings ranging from annual (dk = 1) to every 5 years (dk = 5) for model fits to 177 EPP regions. For comparison, the number of IMIS iterations for convergence of the r-spline and r-trend models are presented. All model fits used
For instances in which both HIV prevalence and incidence were measured in the same survey, we updated the previously described statistical model [2] with a new likelihood approximation. The approximation accounts for uncertainty about the characteristics of the test for recent infection (mean duration of recent infection and false recent ratio) and for the complex survey design, and it allows the user to input the final incidence estimate and standard error rather than the full details of the incidence rate calculation.
Let the survey estimates of HIV prevalence and incidence rate have an estimated covariance matrix whose off-diagonal term is the covariance of the prevalence and incidence estimates. This covariance arises because the formula for estimating incidence from recent infection status depends on the prevalence estimate [3] and because of the clustered survey design. We modelled the observed probit-transformed prevalence and log-transformed incidence as a bivariate normal distribution.
[…] Alkema, Raftery, and Clark [4] with estimated additional non-sampling error variance described by Eaton and Bao [5]. The statistical models for ANC prevalence from ANC-RT at the site level or census level are described by Sheng et al. [6]. The key difference in our formulation is that, rather than modelling ANC prevalence as a function of general adult population prevalence at ages 15-49, ANC prevalence is related to the HIV prevalence among pregnant women predicted by the EPP-ASM model, accounting for age-specific fertility, age-specific HIV prevalence among women, and the relative fertility of HIV-positive women by age, CD4 stage, and ART status.
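The calculation of HIV prevalence among pregnant women from age-specific quantities can be sketched as below. This is a simplified illustration: the relative fertility of HIV-positive women is collapsed to a single ratio per age (the model described above also stratifies by CD4 stage and ART status), the age-specific fertility rate is assumed to apply to all women of that age, and the function name and inputs are hypothetical.

```python
import numpy as np

def prevalence_pregnant(women, prev, asfr, frr):
    """HIV prevalence among pregnant women, weighting by fertility.

    women : number of women by age
    prev  : HIV prevalence among women by age
    asfr  : age-specific fertility rate for all women of that age (assumption)
    frr   : fertility of HIV-positive relative to HIV-negative women, by age
            (assumption: one ratio per age, ignoring CD4 stage and ART status)
    """
    women, prev, asfr, frr = (np.asarray(x, dtype=float)
                              for x in (women, prev, asfr, frr))
    # Fertility of HIV-negative women implied by the overall rate.
    f_neg = asfr / (1.0 - prev + prev * frr)
    births_pos = women * prev * frr * f_neg
    births_all = women * asfr
    return births_pos.sum() / births_all.sum()

# Illustrative numbers only: two age groups.
women = [1000.0, 1000.0]
prev = [0.10, 0.20]
asfr = [0.10, 0.10]
p_equal = prevalence_pregnant(women, prev, asfr, frr=[1.0, 1.0])
p_reduced = prevalence_pregnant(women, prev, asfr, frr=[0.5, 0.5])
```

When HIV-positive women have lower fertility than HIV-negative women, prevalence among pregnant women falls below the fertility-weighted prevalence among all women, which is why relating ANC data to this quantity rather than to general population prevalence matters.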