Estimating Local Chlamydia Incidence and Prevalence Using Surveillance Data

Supplemental Digital Content is available in the text.


A model for chlamydia surveillance data
We propose a three-compartment model of chlamydia infection, testing and screening in a closed population, as illustrated below. Uninfected individuals (U) become infected with a constant incidence, and move to either the asymptomatic-infected (A) or symptomatic-infected (S) pool. Asymptomatic-infected individuals may leave A and return to U by spontaneous clearance of their infection or by detection and treatment under a screening programme. Symptomatic individuals may similarly be screened, but will also seek treatment at a rate which is typically much higher than the rates of spontaneous clearance or screening.
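The compartment structure described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name, the parameter names and the numerical values are all assumptions. It supposes that a fixed proportion of new infections is symptomatic and that every rate is constant, per person per year.

```python
def steady_state(incidence, p_symptomatic, clearance, screening, treatment):
    """Steady-state fractions (U, A, S) of a closed population.

    All names are illustrative assumptions; rates are per person per year.
    """
    # Total rate at which individuals leave each infected compartment.
    leave_a = clearance + screening
    leave_s = clearance + screening + treatment
    # At steady state, inflow equals outflow, which fixes A/U and S/U.
    a_per_u = incidence * (1.0 - p_symptomatic) / leave_a
    s_per_u = incidence * p_symptomatic / leave_s
    # The three fractions must sum to one in a closed population.
    u = 1.0 / (1.0 + a_per_u + s_per_u)
    return u, a_per_u * u, s_per_u * u


# Made-up parameter values, chosen only to exercise the function.
u, a, s = steady_state(incidence=0.05, p_symptomatic=0.3,
                       clearance=0.5, screening=0.2, treatment=15.0)
prevalence = a + s
```

Because treatment seeking is much faster than clearance or screening, the symptomatic pool stays small in this sketch and most prevalent infections are asymptomatic.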

Steady-state assumption
In using the model to interpret testing and diagnosis data, we assume the system is at steady state. We investigate to what extent this assumption is valid by perturbing the system and observing the return to equilibrium. First, we use the national coverage and diagnoses per capita in men in the years 2012-2015. The resulting plot shows that prevalence was very close to the steady state, with differences being very small compared to the uncertainty in prevalence estimates illustrated in the Figures in the main text.
To investigate the validity of the steady state assumption at a local level, we identified the local authorities with the largest changes in prevalence between 2012 and 2013. At local level, changes in prevalence can be more pronounced than at national level, but even with the largest changes in prevalence the new steady state is reached after much less than a year.
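The perturbation experiment can be illustrated as follows. This is a sketch under stated assumptions, not the analysis used for the plots: the rate names and values are invented, the perturbation is a hypothetical doubling of the screening rate, and a simple forward-Euler scheme stands in for whatever integrator the original notebook used.

```python
def derivatives(state, incidence, p_symptomatic, clearance, screening, treatment):
    """Rates of change of the fractions (U, A, S) in a closed population."""
    u, a, s = state
    out_a = (clearance + screening) * a
    out_s = (clearance + screening + treatment) * s
    new_infections = incidence * u
    return (out_a + out_s - new_infections,
            (1.0 - p_symptomatic) * new_infections - out_a,
            p_symptomatic * new_infections - out_s)


def simulate(state, years, dt=1e-3, **rates):
    """Forward-Euler integration; returns the state after `years`."""
    for _ in range(int(round(years / dt))):
        change = derivatives(state, **rates)
        state = tuple(x + dt * dx for x, dx in zip(state, change))
    return state


# Made-up rates (per year), chosen only to exercise the code.
old = dict(incidence=0.05, p_symptomatic=0.3, clearance=0.5,
           screening=0.2, treatment=15.0)
new = dict(old, screening=0.4)  # hypothetical perturbation: screening doubles

start = simulate((1.0, 0.0, 0.0), years=50, **old)  # equilibrium under old rates
target = simulate(start, years=50, **new)           # equilibrium under new rates
for months in (1, 3, 6, 12):
    u, a, s = simulate(start, years=months / 12, **new)
    gap = (a + s) - (target[1] + target[2])
    print(f"{months:2d} months: prevalence gap {gap:+.5f}")
```

The printed gap shows how far prevalence still is from its new equilibrium at each time point. How quickly it closes depends entirely on the rates supplied, so the made-up values here say nothing about the timescale in the real data.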

Different testing rates in different populations
We also investigate the sensitivity of the model to different testing rates in subpopulations with different prevalences. This analysis makes use of results reported in Woodhall Sex. Transm. Infect. 92:21-227 (2016) for the proportion of 16-24-year-old men in Natsal-3 reporting different risk behaviours and chlamydia testing and diagnosis in the last year.
Taking each risk factor in turn, we estimate prevalence for each risk level and take the weighted average as an estimate of population prevalence. We also estimate prevalence from the proportion tested and diagnosed in the whole population, for comparison.
Figure 6: Sensitivity of prevalence estimates in 16-24-year-old men to risk-dependent differences in testing. Hollow markers: risk-level-specific estimates. Marker area is proportional to the proportion of the population in each risk category. Solid markers: weighted mean of level-specific estimates. Vertical line: estimate using aggregated proportions tested and diagnosed.
In this figure, hollow markers show risk-level-specific prevalence estimates and their area represents the proportion of the population in each risk group. Solid markers show the weighted average of these level-specific estimates. The vertical line shows prevalence estimated from aggregated testing and diagnosis (i.e. not stratified by risk). It should be emphasised that, due to limitations of the data, the analysis is intended as an illustration of the model's theoretical properties rather than an accurate estimate of prevalence in the different risk categories. The data from Natsal-3 are some of the best available, but nonetheless rely on participants' recall and accurate self-reporting. They were collected at a national level, and equivalent information is not available at a local level for incorporation into local-level prevalence estimates.
Although aggregating across the population does affect prevalence estimates, the differences are small compared with the 1-2% uncertainty which we found in our analyses of the surveillance data.

Estimating local chlamydia incidence and prevalence using surveillance data: eAppendix 2
Data on sexual behaviour from the third National Study of Sexual Attitudes and Lifestyles (Natsal-3) are available from the UK Data Service: https://www.ukdataservice.ac.uk/ (downloaded 23 September 2015). These were used to infer 95% confidence intervals for the proportions of men and women, aged 16-19 and 20-24, who were sexually active (see the accompanying R script; note that no 15-year-olds were recruited to Natsal-3). These 95% confidence intervals were in turn used to derive beta-distribution priors for the proportion sexually active within each sex and age group.
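One simple way to turn a 95% confidence interval into a beta prior is moment matching, sketched below. This is an assumption about how such a conversion could be done, not a description of the accompanying R script: it treats the interval as symmetric and normal-approximated, and the function name and example interval are invented.

```python
def beta_from_ci(lower, upper, z=1.96):
    """Beta(alpha, beta) parameters matching the mean and variance implied by a CI.

    Assumes the interval is mean +/- z standard deviations (normal approximation).
    """
    mean = 0.5 * (lower + upper)
    variance = ((upper - lower) / (2.0 * z)) ** 2
    if not 0.0 < mean < 1.0 or variance >= mean * (1.0 - mean):
        raise ValueError("interval is not compatible with a beta distribution")
    # For a beta distribution, variance = mean * (1 - mean) / (alpha + beta + 1).
    total = mean * (1.0 - mean) / variance - 1.0
    return mean * total, (1.0 - mean) * total


# Hypothetical interval for a proportion sexually active (not a Natsal-3 value).
alpha, beta_param = beta_from_ci(0.60, 0.70)
```

A fit that matches the interval endpoints directly, for example by numerical root finding on the beta quantile function, would handle asymmetric intervals better than this approximation.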

Sampling for testing and diagnosis rates
In [1]:
import numpy as np
from numpy import *
from scipy.stats import beta
from scipy.optimize import fsolve
# parameters of beta distributions representing the proportion of the population sexually active
Next, sample from distributions for the probability of being sexually active, the size of the sexually active population and the testing and diagnosis rates per person per year.
The right-hand panel shows the probability density of the time between onset of symptoms and attending the GUM clinic where patients were surveyed, to 42 days (solid blue line). The blue shaded area and dashed line show the central 95% and median of simulated histograms for waiting times to clinic, with bins corresponding to time windows reported in the data. The last bin contains all times longer than six weeks and has been divided by 10 (as opposed to the width of the window) to make it readable. For comparison, red error bars show the reported proportions of patients with treatment-seeking times within each time window (estimate and 95% CI), normalised to be on the same scale as the predictions (blue). The good predictive properties of the model are indicated by the agreement between the data, in red, and the posterior predictions in blue.

Rate of spontaneous clearance of infection
Rates of spontaneous clearance of infection in men and women were sampled using MCMC and the STAN software (see accompanying R scripts, STAN model files and references), following the model presented by

Estimating national prevalence
The sampled parameter values are now used to infer prevalence in men and women in different age groups. In these plots, step histograms show the sampled values for prevalence in men and women, by age group. The horizontal bars give 95% confidence intervals for prevalence in comparable age groups, estimated from Natsal-3. They show the agreement between our surveillance-based method and the population-based survey.

Symptomatic and asymptomatic diagnoses
Although the data do not report the number of diagnoses that were in symptomatic and asymptomatic cases, we can propose different possible numbers of symptomatic and asymptomatic diagnoses and examine the inferences which would have followed in each case. The dashed lines are intended as a guide to the eye, to indicate scenarios roughly compatible with the Natsal-3 prevalence estimates. The observed chlamydia prevalence in Natsal-3 would be consistent with around 60-80% of diagnoses in men and 45-70% in women being symptomatic.

Local differences in chlamydia prevalence, proportion diagnosed and positivity
In this example, we use local numbers of chlamydia tests and diagnoses recorded during 2012 to investigate local differences in incidence, prevalence and screening in men and women.
In [1]:
# This script also contains the functions linking observed tests, symptomatic/asymptomatic/total diagnoses,
# incidence, prevalence, screening and other model parameters.
# Running it takes a little while because of all the symbolic algebra.
%run -i test_diag_fun.py
# This script provides a function for calculating the likelihood of categorical data.
%run -i multinomial_pmf.py
# This script samples model parameters from prior distributions, following the method in england.ipynb.
%run -i sample_parameters.py
Surveillance data on chlamydia testing and diagnosis rates by English local authority (LA) in 2012 were downloaded from: http://www.chlamydiascreening.nhs.uk/ps/data.asp (downloaded 9 February 2016). Numbers of tests and diagnoses were copied into the csv file included with this notebook.

Testing and diagnosis rates
Samples for the testing and diagnosis rates for 16-24-year-old men and women in each LA were generated from gamma distributions based on the data. We now examine the correlation between local proportions tested and diagnosed, for men and women separately. Plotting the proportion of the sexually active population tested for chlamydia against the proportion diagnosed shows clearly the correlation between the two: as more tests are conducted, more infections are discovered. In these (and all subsequent) plots, markers show the median of the sampled distributions, and error bars the 2.5th and 97.5th centiles.
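The sampling step might look like the sketch below. The text states only that gamma distributions based on the data were used, so the specific choice here (shape equal to the observed count, scale equal to one over the population size) is an assumption, as are the function name and the counts, which belong to no real local authority. A fuller version would also sample the size of the sexually active population rather than fixing it.

```python
import random


def sample_rate(count, population, n_samples, rng):
    """Draw per-capita annual rates from Gamma(shape=count, scale=1/population).

    Assumed parameterisation; `count` must be positive.
    """
    return [rng.gammavariate(count, 1.0 / population) for _ in range(n_samples)]


def summarise(samples):
    """Median and central 95% interval, as plotted by markers and error bars."""
    ordered = sorted(samples)
    n = len(ordered)
    return ordered[n // 2], ordered[int(0.025 * n)], ordered[int(0.975 * n)]


rng = random.Random(2012)
# Made-up counts for one hypothetical local authority.
tests = sample_rate(count=1200, population=15000, n_samples=2000, rng=rng)
diagnoses = sample_rate(count=90, population=15000, n_samples=2000, rng=rng)
median_tests, low_tests, high_tests = summarise(tests)
```

With this parameterisation the sampled rates centre on count/population, and the relative spread shrinks as the count grows, so LAs with few diagnoses get wider error bars.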

Positivity and prevalence
Using the sampled proportions tested and diagnosed, we now calculate prevalence in men and women in each LA and then examine the correlation between observed positivity and our estimated prevalence.  Although there is a positive correlation between prevalence and positivity, positivity is consistently higher because the sample of individuals tested is enriched with infected individuals seeking treatment because of symptoms. There are also a large number of possible pairs of local authorities in which the authority with the lower positivity has the higher prevalence.
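To make the link between observed rates and prevalence concrete, the sketch below inverts a deliberately simplified version of the model. It is not the symbolic solution in test_diag_fun.py. It assumes perfect test accuracy, that screening tests are offered to everyone at one rate, and that every treatment-seeking symptomatic case is tested and diagnosed; all function and parameter names are invented.

```python
def infected_fractions(screening, tests, p_symptomatic, clearance, treatment):
    """Fractions (A, S) implied by a screening rate and the observed testing rate."""
    # Tests = screening of the whole population + treatment seeking in S.
    s = (tests - screening) / treatment
    # Steady-state ratio A/S when both pools share one incidence.
    ratio = ((1.0 - p_symptomatic) / p_symptomatic
             * (clearance + screening + treatment) / (clearance + screening))
    return ratio * s, s


def prevalence_from_rates(tests, diagnoses, p_symptomatic, clearance, treatment):
    """Prevalence consistent with tests and diagnoses per person per year."""
    def mismatch(screening):
        a, s = infected_fractions(screening, tests, p_symptomatic,
                                  clearance, treatment)
        # Negative tests can only arise from screening uninfected people.
        return screening * (1.0 - a - s) - (tests - diagnoses)

    low, high = 0.0, tests  # mismatch is negative at 0 and positive at `tests`
    for _ in range(100):    # plain bisection on the screening rate
        mid = 0.5 * (low + high)
        if mismatch(mid) < 0.0:
            low = mid
        else:
            high = mid
    a, s = infected_fractions(0.5 * (low + high), tests, p_symptomatic,
                              clearance, treatment)
    return a + s


# Made-up inputs, chosen only to exercise the functions.
prevalence = prevalence_from_rates(tests=0.25, diagnoses=0.02, p_symptomatic=0.3,
                                   clearance=0.5, treatment=15.0)
positivity = 0.02 / 0.25
```

In this simplified setting positivity exceeds prevalence whenever some tests are driven by symptoms, which is the enrichment effect described above.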
The confidence intervals on the positivity and prevalence estimates are wide, but much of this uncertainty stems from weak information on the model's natural history parameters. To understand the correlation better, we estimate Spearman's rho separately for each multivariate sample of model parameters, testing and diagnosis rates. For the samples drawn, the correlation between prevalence and positivity (measured by Spearman's ρ) was always positive and statistically significant (p < 0.05). However, the correlations, especially for women, were sometimes weak (see histograms).
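Computing the rank correlation once per sample, rather than once on the medians, can be sketched as below. The data layout (one row per sample, one column per LA) and the function names are assumptions, and the toy numbers are invented. In practice scipy.stats.spearmanr returns both the coefficient and a p-value; the pure-Python version here shows only the coefficient.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def rho_per_sample(prevalence_samples, positivity_samples):
    """One coefficient per row, where each row holds one value per LA."""
    return [spearman_rho(p, q)
            for p, q in zip(prevalence_samples, positivity_samples)]


# Toy input: 2 samples x 4 hypothetical LAs.
rhos = rho_per_sample([[0.02, 0.03, 0.04, 0.05], [0.02, 0.04, 0.03, 0.05]],
                      [[0.06, 0.07, 0.09, 0.10], [0.06, 0.07, 0.09, 0.10]])
```

A histogram of the resulting list then shows how the strength of the correlation varies across parameter samples.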

Local differences in prevalence
We now use our samples to compare prevalence by local authority.
In [ ]:
order_m = argsort(percentile(prev_m_la, 50, axis=0))  # order by prevalence in men
# Comment-out the next line to plot all LAs. You will also need to adjust axis sizes.
order_m = order_m[append(range(0, 5), range(146, 151))]
ax1.errorbar(y=range(len(order_m)), x=(percentile(prev_m_la, 50, axis=0))[order_m], xerr=array([percentile(prev_m_la[:, order_m]
In general, the 95% credible intervals for the highest and lowest LAs do not overlap at all, or only slightly. However, there is a large group of over 100 LAs with intermediate prevalence, each with a credible interval overlapping with all the others in the group. (A plot showing all LAs can be obtained by commenting-out the lines indicated above.) Although there are local differences in prevalence, they are generally small compared with the uncertainty in our estimates. Only in the most extreme cases can differences be clearly resolved. However, the rank order of LAs is robust: we examine consistency in rank order below.
The samples for each LA form a band, indicating that rank of prevalence is largely preserved across samples. The fact that the bands overlap shows that there is some swapping of rank order; this is due to uncertainty in the rate of testing and diagnosis. The y-range over which the band moves as it goes from left to right is at least as great as the thickness of the band itself, showing that uncertainty in the model parameters in Table 2 contributes at least as much variation in the final sample as does uncertainty in the testing and diagnosis rates. Improving estimates of natural history and behaviour parameters would improve prevalence estimates.
Another approach to examining the same question is shown below. This time one column represents one LA, ordered by median sampled prevalence (lowest to highest). Each column is filled according to how many times out of 10000 samples the LA fell into the lowest, second, third, fourth or highest quintile for prevalence. (Adjust the first line of this code block to choose the number of quantiles used.) Samples for the lowest- and highest-prevalence LAs are almost always in the lowest and highest quintiles, respectively, whilst LAs with prevalence estimates in the middle of the range are more likely to be found in two or sometimes three quintiles. There is again a clear order of prevalence which is generally preserved regardless of the particular sampled model parameters.
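The quantile-membership counts behind such a plot can be sketched as follows. The input layout (one row per sample, one column per LA) and the function name are assumptions, the original code block is not reproduced here, and the toy values are invented.

```python
def quantile_counts(samples, n_quantiles=5):
    """Count how often each LA falls into each prevalence quantile.

    samples[i][j] is the prevalence of LA j in sample i; the result
    counts[j][q] is the number of samples in which LA j was in quantile q
    (0 = lowest).
    """
    n_la = len(samples[0])
    counts = [[0] * n_quantiles for _ in range(n_la)]
    for row in samples:
        # Rank the LAs within this sample, lowest prevalence first.
        order = sorted(range(n_la), key=lambda j: row[j])
        for position, j in enumerate(order):
            counts[j][position * n_quantiles // n_la] += 1
    return counts


# Toy input: 2 samples x 5 hypothetical LAs.
toy = [[0.01, 0.02, 0.03, 0.04, 0.05],
       [0.01, 0.03, 0.02, 0.04, 0.05]]
counts = quantile_counts(toy)
```

Dividing each row of counts by the number of samples gives the fill proportions for one column of the plot.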

Prevalence and incidence
Finally, we plot incidence in each sex against prevalence in the other to examine the effect of infection levels in men on the rate of new infections in women, and vice versa. Orange indicates the relationship between prevalence in men and incidence in women, and green shows the relationship between prevalence in women and incidence in men. Hollow circles highlight inner London boroughs, with circle area proportional to the percentage of the male population aged 16-44 years estimated to be MSM (Ruf et al., Int J. STD AIDS 22:25-29;2011).
A natural question is why some LAs have higher incidence and prevalence than others. One possibility is that higher screening rates in some areas lower prevalence and incidence. To investigate this, we plot sampled incidence against screening rate.
Figure 13: Sampled incidence vs. screening rate. Markers and error bars indicate median samples and central 95% credible intervals. Top-left: incidence in men vs. screening in men; top-right: incidence in men vs. screening in women; bottom-left: incidence in women vs. screening in men; bottom-right: incidence in women vs. screening in women.
Figure 14: Spearman correlations between screening and incidence, at each of 10000 samples. Blue: incidence in men vs. screening in men; green: incidence in men vs. screening in women; orange: incidence in women vs. screening in men; red: incidence in women vs. screening in women.
Prevalence in men and women is positively correlated, because of the incidence-prevalence relationship illustrated above. LAs with more non-symptomatic screening of men also tend to have more screening of women, but all LAs have more screening in women than men.
Figure 17: Correlation in local prevalence (left) and screening (right) in men vs. women. Markers and error bars indicate median samples and central 95% credible intervals.