# Characterizing an outbreak of vancomycin-resistant enterococci using hidden Markov models

^{1}School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, Queensland 4001, Australia

^{2}Modelling and Bioinformatics Department, Centre for Infections Health Protection Agency, 61 Colindale Avenue, London NW9 5EQ, UK

## Abstract

*Background*. Antibiotic-resistant nosocomial pathogens can arise in epidemic clusters or sporadically. Genotyping is commonly used to distinguish epidemic from sporadic vancomycin-resistant enterococci (VRE). We compare this to a statistical method to determine the transmission characteristics of VRE.

*Methods and findings*. A structured continuous-time hidden Markov model (HMM) was developed. The hidden states were the number of VRE-colonized patients (both detected and undetected). The input for this study was weekly point-prevalence data; 157 weeks of VRE prevalence. We estimated two parameters: one to quantify the cross-transmission of VRE and the other to quantify the level of VRE colonization from sporadic sources. We compared the results to those obtained by concomitant genotyping and phenotyping.

We estimated that 89% of transmissions were due to ward cross-transmission while 11% were sporadic. Genotyping found that 90% had identical glycopeptide resistance genes and 84% were identical or nearly identical on pulsed-field gel electrophoresis (PFGE).

There was some evidence, based on model selection criteria, that the cross-transmission parameter changed throughout the study period. The model that allowed for a change in transmission just prior to the outbreak and again at the peak of the outbreak was superior to other models. This model estimated that cross-transmission increased at week 120 and declined after week 135, coinciding with environmental decontamination.

*Significance*. We found that HMMs can be applied to serial prevalence data to estimate the characteristics of acquisition of nosocomial pathogens and distinguish between epidemic and sporadic acquisition. This model was able to estimate transmission parameters despite imperfect detection of the organism. The results of this model were validated against PFGE and glycopeptide resistance genotype data and produced very similar results. Additionally, HMMs can provide information about unobserved events such as undetected colonization.

**Keywords:**HMM, nosocomial pathogens, genotyping, statistical modelling, VRE

## 1. Introduction

There has been an alarming worldwide increase in the rate of infection from vancomycin-resistant enterococci (VRE) in the last 15 years (Murray 2006). Enterococci are part of the normal gastrointestinal flora and VRE colonization often is asymptomatic and undetected. However, in patients with compromised immune systems and breached integument, enterococci can become pathogenic, causing, for example, urinary tract infection, bacteraemia and endocarditis. Large teaching hospitals and intensive care units (ICUs) have the highest rate of infection with VRE (Weinstein 2005). Infection with enterococci harbouring a vancomycin resistance gene is associated with higher mortality (Lodise *et al*. 2002) and many strains of VRE are resistant to all known antibiotics.

Acquisitions of VRE colonization can be grouped broadly into those that come from cross-transmission within the ward which we call *transmitted*, and VRE that comes from other sources which we call *sporadic*. Ward transmission of multi-resistant organisms is believed to be predominantly from patient to patient via the transiently contaminated hands of health care workers (Boyce 2001). The sources of sporadic VRE include patients' gastrointestinal tract, prior colonization with VRE and transmission from outside the ward. The presence of VRE on admission is often initially not detected owing to infrequent swabbing, poor sensitivity of swabs or undetectable quantities of organism. VRE may exist in subdetectable numbers in human gut so that exposure of patients to antibiotics which facilitate VRE growth (Donskey *et al*. 2002) may lead to an apparently new case of VRE. VRE is also known to spread from other hospital wards via patient and staff movements (Trick *et al*. 1999).

To select the most appropriate infection control interventions, one needs to be able to estimate how much of the new acquisition is transmitted and how much is sporadic. Restricting antibiotic exposure is thought to control sporadic VRE, by reducing selection pressure in patients' endogenous flora, while hand hygiene, cohorting, patient isolation and limiting admission of colonized patients are thought to impact on transmitted VRE.

Outbreak investigation often involves time intensive methods to characterize the mode of VRE acquisition. Genotyping techniques such as pulsed-field gel electrophoresis (PFGE), distinguish clonal outbreaks, which are presumed to be due to transmitted VRE, from multiple new strain introductions, which are presumed to be due to sporadic VRE. There are occasions when this technique breaks down, when horizontal transfer of the resistance gene, vanA or vanB, can lead to several different genotypes being detected when in fact a single transposon is being transmitted (Suppola *et al*. 1999; Bradley *et al*. 2002; Weinstein 2005).

Attempts have been made to distinguish between these two processes of colonization based on statistical analysis of surveillance data. Pelupessy *et al*. (2002) used a Markov model, without hidden states, to estimate transmission parameters; finding estimates were similar to those using full event data and genotyping (PFGE). Cooper & Lipsitch (2004) used structured and unstructured hidden Markov models (HMMs) to describe infection incidence time-series data and to estimate transmission parameters. Collinearity between parameter estimates, failure of convergence and computational difficulties were identified as potential problems using HMMs for sparse data such as is typically found in time-series infection control data. Forrester & Pettitt (2005) compared background rates with cross-transmission rates of methicillin-resistance *Staphylococcus aureus*, finding background rates were larger than cross-transmission rates.

Estimating transmission coefficients using hospital infection control data has a number of challenges. There are unobserved processes occurring; the time of new acquisition of colonization is not observed. Additionally, when relying on routine swabs to determine the number of colonized patients, the sensitivity of swabs is less than 100%.

This study uses an epidemic model structure to characterize transmission of VRE during an outbreak at an 800 bed Australian teaching hospital. The current paper extends the work by Pelupessy *et al*. (2002) by estimating epidemiological parameters in the presence of suboptimal swab sensitivities. It also allows for the fact that new colonizations are not immediately detectable. We use an HMM structure to estimate transmission in the face of incomplete datasets and unobserved events. This framework distinguishes between rates of transmitted and sporadic VRE acquisitions. This study also considers that the transmission rates may change over time. Section 2.1 describes the data used to estimate VRE epidemic determinants. Section 2.4 describes the model of VRE transmission, while §2.5 describes the HMM and the methodology behind it. Section 3 gives the results of the parameter estimates, comparison of model estimates and genotyping data and model selection.

## 2. Methods

### 2.1 Description of outbreak and infection control interventions

VRE was first isolated at the Princess Alexandra Hospital, Brisbane, Australia in October 1996 and a VRE screening programme commenced in January 1997, the beginning of the data collection period for this study. Data used in this study are VRE colonization data from the ICU, renal and infectious diseases units. VRE colonized patients and were identified by clinical isolates, weekly routine screening and contact tracing swabs. Infection control interventions introduced from the start of the study period were restriction of vancomycin and third-generation cephalosporin use and isolation of colonized patients. From week 125 of this study, infection control teams were aware of an increased prevalence of VRE and further measures were taken. Dedicated equipment was used in patient rooms and patients were cohorted. VRE patients requiring haemodialysis used a dialysis facility within the infection control unit. Medical and nursing staff wore disposable aprons and latex gloves for patient contacts. An environmental audit was performed in August 1999, approximately week 135 of the study period and an aggressive cleaning programme was instituted (Bartley *et al*. 2001).

### 2.2 Serial surveillance data used for statistical analysis

Input data for the statistical model in this study were:

- Weekly prevalence data for VRE colonization.
- Mean length of stay of colonized patients: 15 days. This was calculated as the time from first identification of colonization to discharge.
- The total number of beds in the wards,
*N*=68.

The data were collected from 1 January 1997 to 31 December 1999. The weekly prevalence data are shown in figure 1.

### 2.3 Data used for cluster analysis

Microbiological and clinical data were collected, including admission dates and discharge dates of VRE colonized patients, as well as the date of first positive isolate. Additionally, we had information on the colonization status on admission of three of the patients transferred from other hospitals. Genotype data, both PFGE and glycopeptide resistance genotyping, were compared with the results of the statistical analysis as part of the study validation. Presumptive VRE colonies were identified using standard techniques. Speciation (distinguishing *Enterococcus faecium* and *Enterococcus faecalis*) was initially achieved by carbohydrate fermentation reactions of arabinose, mannose and raffinose then confirmed by a multiplex PCR assay based on specific detection of genes encoding d-alanine: d-alanine ligases (Bartley *et al*. 2001). VRE phenotype was identified based on vancomycin and teichoplanin mean inhibitory concentrations using the *E*-test method. This presumptively distinguishes vanA VRE, resistant to both vancomycin and teichoplanin, from vanB VRE, resistant to vancomycin but sensitive to teichoplanin. This phenotype result was confirmed by glycopeptide resistance genotyping, achieved through a modified multiplex PCR assay, described in detail by Bartley *et al*. (2001).

In the study on this outbreak by Bartley *et al*. (2001), isolates were also characterized using PFGE. Electrophoretic band patterns were analysed according to the criteria established by Tenover *et al*. (1995). Computer comparison using Gel Compar v. 4.1 (Applied Maths Kortrijk, Belgium) was based on the algorithm of the unweighted pair group method for arithmetic averages and using the Dice coefficient with 1.5% band tolerance (Bartley *et al*. 2001). This information was used to estimate the proportion of isolates that were from the same strain.

### 2.4 Model of transmission

We based our ward transmission model on the Susceptible-infected model with migration, described by Bailey (1975). Modified versions of this model have been used previously to analyse nosocomial transmission data (Pelupessy *et al*. 2002; Cooper & Lipsitch 2004; Forrester & Pettitt 2005).

A schematic of the model is shown in figure 2. The rate of cross-transmission of VRE colonization (per colonized per susceptible patient per day) is denoted by *β*. It is assumed that the ward is of fixed size, *N*, hence the number of uncolonized patients is *N*−*C*. Colonized patients are assumed to remain colonized for their entire hospital stay, therefore, transition from colonized to uncolonized occurs via discharge of a colonized patient and replacement with an uncolonized patient, which occurs at a rate *μ*. Duration of stay of colonized patients was available from the dataset. Acquisition of VRE, that is transmitted, is described by the mass-action term, *βC*(*N*−*C*). VRE acquisition, that is sporadic, can arise through ward admission of a colonized patient or any other process that is not related to the number of colonized patients, and occurs at a rate, *ν*(*N*−*C*). Each of the processes that lead to sporadic acquisition (for example, prior colonization or colonization from out-of-ward sources, endogenous gastrointestinal colonization) can reasonably be assumed to be independent of the number of colonized patients in the ward.

The probability of a change in the number of colonized patients, *C*, in a short time period, *h*, is given by

The number of colonized patients in the ward, *C*(*t*), forms a Markov process on state space 0,…,*N*, where *N* is the number of patients on the ward. Reflecting boundaries occur at states *i*=0 and *i*=*N*, provided *ν*>0, otherwise 0 is an absorbing state, and provided *μ*>0, otherwise *N* is an absorbing state.

### 2.5 Hidden Markov model

We aim to estimate parameters associated with sporadic colonization, *ν*, and the colonization caused by ward *transmission*, *β*, using the structured HMM illustrated in figure 3.

*C*represents the number of colonized patients in the ward (detected or undetected),

*Y*represents the number of patients detected at each time point. The horizontal arrows represent the transition from one state to the next,

**...**

Our HMM consists of: observations, ** Y**, the number of patients detected at each time point; underlying hidden states,

**, the number of colonized patients in the ward; a transition model linking each hidden state with its adjacent states, represented by horizontal lines in figure 3; an observation model linking the data with the hidden state, represented by the vertical lines in figure 3. There is one hidden state for each observation, denoted**

*C**C*

_{1},

*C*

_{2},…,

*C*

_{n}.

The full conditional probability of any node depends only on neighbouring nodes to which it is connected directly. The observation component of the HMM, denoted by ** Y**, consists of 157 data inputs of weekly VRE prevalence taken over 3 years and the vector of time points,

**=**

*t**t*

_{1},…,

*t*

_{n}, corresponding to each observation time. The vector

**consists of the**

*C**n*=157 hidden states. The transition probability matrix, giving the relationship between the hidden states, is described in appendix A. The observation model, giving the relationship between the observed and hidden states, is described in §2.6.

The parameters used in the model are given in table 1.

#### 2.5.1 Model assumptions

The model makes the following assumptions

- The ward is of fixed size,
*N*. - The model parameters are time invariant (this assumption is relaxed later in the study).
- Each colonized patient remains colonized for the remainder of their stay.
- Each observation of patients not known to be colonized is conditionally independent given the corresponding hidden state.
- The hidden states follow a first order time homogenous Markov process, that is$$\text{Pr}(C({t}_{k})|C({t}_{1}),\dots ,C({t}_{k-1}))=\mathrm{Pr}(C({t}_{k})|C({t}_{k-1}))=\mathrm{Pr}(C({t}_{k}-{t}_{k-1})|C(0)).$$
- Homogenous mixing of patients takes place.
- Uncolonized patients are identical with respect to susceptibility to colonization.
- Colonized patients are identical with respect to transmission of VRE.
- Time from colonization to discharge is exponentially distributed. Review of patient histories confirms that this is approximately the case.

These assumptions are discussed in §4.

### 2.6 Observation model

The probability of being known to be colonized (and therefore being included in the prevalence data) given that a patient is colonized, *d*, was unknown. Literature sources regarding the sensitivity of rectal swabs in detecting VRE were used to develop an expression for the uncertainty in this parameter. Estimates of the sensitivity of a rectal swab for VRE range from 0.58 (D'Agata *et al*. 2002) to 0.97 (Reisner *et al*. 2000) with values in between (Lemmen *et al*. 2001; Trick *et al*. 2004). We allowed for the uncertainty regarding the detection by assigning a uniform [0.58,0.97] prior distribution to *d*. The probability of detection at a given prevalence check, *d*, used in this study was patient related rather than simply swab related. If a patient was known from previous swabs to be colonized, the patient was automatically detected, thus *d* would be expected to be at least as high as the sensitivity of a single swab. The observation model assumed that each week's observed prevalence is independent of the previous week's observed prevalence, given the underlying true prevalence. This is an approximation as the true detection is the known colonized patients from the previous week and the new colonizations from the current week.

The probability relationship between the states and the data is described by the binomial distribution *Y*_{k}~Bin(*C*_{k}, *d*), where *Y*_{k} is the *k*th observed colonization prevalence and *C*_{k} is the actual number of colonized patients, the hidden state, at time *t*_{k}. This assumes that the probability, *d*, remains constant over the study period (for each iteration) and the probability of detection of each colonized patient is independent of the number of other colonized patients.

Alternative observation models with greater dispersion could have been used. For example, the Poisson or negative binomial distribution could have been chosen, had we been dealing with incidence rather than prevalence data. We chose the binomial distribution because it has a sound probabilistic basis (assuming fixed detection) and, unlike the Poisson, ensures that the hidden state (number colonized) is always larger than the observation (number detected), a necessary result when using prevalence data.

### 2.7 Bayesian framework

The parameters for transmitted VRE, *β* and sporadic VRE, *ν* were estimated using a Bayesian framework. Let *θ*_{p}={*β*,*ν*,*d*} be the vector of model parameters. Baum *et al*.'s (1970) recursion formula, summarized in appendix B, was used to determine the likelihood of the data, *l*(** Y**|

*θ*_{p}). Uniform

*U*[0,0.1] priors were assigned to

*β*and

*ν*, because little was known about these parameters other than that negative values or values higher than 0.1 were completely implausible. The posterior probability distributions

were estimated using a Monte-Carlo Markov chain algorithm, described in appendix C.

The Bayesian framework can provide estimates (and full posterior probability density) of any function of model parameters including functions which depend upon knowledge of hidden states. Let *θ*_{h} be the vector of *n* inferred hidden states *C*_{1},…,*C*_{n} and let ** θ**={

*θ*_{p},

*θ*_{h}}. The expected number of within-ward transmissions for the week, following week

*k*is

*βC*

_{k}(

*N*−

*C*

_{k}), while the total number of transmissions is

*βC*

_{k}(

*N*−

*C*

_{k})+

*ν*(

*N*−

*C*

_{k}). The expected proportion of VRE acquisitions due to ward transmission over the time of the study,

*f*(

**), is approximated by**

*θ*We evaluate the expectation, *E*[*f*(** θ**)|

**], by drawing samples**

*Y*

*θ*_{k},

*k*=1,…,

*m*from

*p*(

**|**

*θ***) and using the approximation of Gilks**

*Y**et al*. (1996, ch. 1)

The algorithm for this Monte-Carlo integration is given in appendix C.

### 2.8 Comparison of cluster analysis results using genotyping with statistical analysis

A genotyping study was performed on the VRE isolates by Bartley *et al*. (2001). Of the 49 isolates available for analysis, 44 were found to be *E. faecium* vanA using glycopeptide resistance genotyping. The estimated number of isolates having identical or closely related patterns on PFGE using the criteria of Tenover *et al*. (1995) was 41 of 49.

#### 2.8.1 Cluster analysis based on genotypic relatedness

We compared the proportion of ‘identical isolates’ (presumed to be part of a cluster) with the estimated proportion of transmitted VRE derived from the HMM and prevalence data. The posterior probability distribution of the proportion of VRE cases that are identical by genotype can readily be derived using a Bayesian framework and conjugate prior distribution (Gelman *et al*. 2004). Denote the parameter of interest, the proportion of VRE acquisitions that are identical, by *p*. Assume the form Beta(1,1) for the prior distribution for the proportion; this is the same as the uniform[0,1] prior. The probability of the data is given by the binomial Bin(*a*; (*a*+*b*), *p*), where *a* is the number of identical isolates and *b* is the number of non-identical isolates, as detected by the laboratory methods. The posterior probability density of *p* is Beta(1+*a*, 1+*b*).

## 3. Results

### 3.1 Transmission parameter estimation

The estimated value for the transmission coefficient, *β* was 10×10^{−4} (CI_{95} 7.9×10^{−4}, 13×10^{−4}) and the sporadic acquisition rate *ν* was 2.0×10^{−4} (CI_{95} 0.85×10^{−4}, 3.8×10^{−4}). The coefficient of correlation between *β* and *ν* was estimated to be −0.24. These results were obtained using a Markov chain Monte-Carlo algorithm with a burn-in period of 50000 as described in appendix C.

The basic reproduction ratio, *R*_{0}, is ‘the average number of persons directly infected by an infectious case during its entire infectious period, after entering a totally susceptible population’ (Giesecke 1994). In this model it can be shown to be *R*_{0}=*βN*/*μ*. This formula for *R*_{0} is an approximation as there is a finite population in this setting. The basic reproduction ratio is estimated to be 1.07 (CI_{95} 0.78–1.34).

The mean value for the estimated detection rate, *d*, was 0.75 with a 95% credible interval of 0.59–0.93.

### 3.2 Comparison of statistical model and genotyping data

The proportion of VRE acquisitions due to transmission, was estimated to be 89% (CI_{95}=78–95%), using Bayesian inference applied to the HMM structure. This compares with 84% (41/49) of isolates observed to be identical or nearly identical using PFGE genotyping and 90% (44/49) using glycopeptide resistance genotyping. The posterior distribution of the estimated proportion of colonizations due to ward transmission compared with those found to be identical by glycopeptide resistance genotype and PFGE methods are displayed in figure 4.

### 3.3 Sensitivity analysis

The length of stay following colonization could be greater than the estimated 15 days because acquisition could have preceded detection. Conversely, the length of stay could have been less than 15 days because undetected colonized patients are likely to have shorter stays than detected colonized patients. We therefore performed a sensitivity analysis on the discharge rate parameter, *μ*. We took upper and lower values for *μ* which we believed at the extreme ends of plausibility. We then repeated the estimation of the proportion of VRE acquisitions due to within-ward transmission. Results are given in table 2.

Table 2 shows there is a small change in the estimate for large changes in the discharge parameter, *μ*.

### 3.4 Model selection

The values of the deviance information criterion (DIC) were used to assess the optimum model to fit the data (Gelman *et al*. 2004). Results are given in table 3.

*P*

_{d}: effective number of parameters.

Several models were explored. Setting either *β* or *ν* to zero led to much higher values for the DIC, giving substantial statistical support to a mixed model, in which VRE colonization arose both from cross-transmission in the ward and sporadically. The model in which *β* changed after week 120 was a superior fit to the model with time-invariant parameters. Allowing for a further change in *β* after week 135 provided the best fit of those models investigated. The effective number of parameters in a latent variable model depends on the collinearity of the parameters and the influence of the latent variables.

## 4. Discussion

The aim of this study was to characterize transmission of VRE using statistical methods and simple serial surveillance data. We included a term for sporadic colonization because we believe that new acquisitions of VRE could occur through means other than within-ward patient to patient cross-transmission. Sources of sporadic colonization have been labelled in the past as endogenous, spontaneous (Pelupessy *et al*. 2002) or background (Forrester & Pettitt 2005). Our statistical methods were designed to quantify the rates of sporadic and cross-transmitted VRE. Previous attempts have encountered difficulties especially with identifiability of variables (Cooper & Lipsitch 2004).

Full patient histories, PFGE and glycopeptide resistance genotype data were used for validation but were not included in the statistical analysis in this study. Estimates of the proportion of VRE resulting from cross-transmission based on statistical methods (HMMs) in this study were very similar to those based on vancomycin resistance genotype data.

The proportion of clustered isolates based on PFGE analysis was lower than both the vancomycin resistance genotype data and the statistical analysis. This could be due to horizontal transfer of resistance gene to new strains of enterococci, which has been reported previously (Suppola *et al*. 1999; Weinstein 2005). If horizontal transfer of resistance genes occurs during an outbreak, cross-transmitted strains have identical glycopeptide resistance genotypes but different PFGE patterns, hence PFGE underestimates clustering.

Using a structured HMM, one can estimate the hidden states behind the data, the number of patients colonized on the ward (both detected and undetected). We estimated the basic reproduction ratio to be close to unity, the threshold value that could lead to endemic VRE. We were able to make estimates of transmission in the face of imperfect datasets in which transmission times and patient histories were unknown and swab sensitivity was considerably less than 100%. This approach is similar to that of Cooper & Lipsitch (2004), who observed monthly infection *incidence* and assumed a Poisson relationship with the number colonized, the hidden state. The current study avoids the ambiguity of the relationship between the observations and the hidden state using *prevalence* (observed number detected at time points) which relates directly to the hidden state, the number colonized, through a binomial relationship.

For simplicity, this study assumed homogenous mixing of staff and patients. Future studies could extend this model to include ward coupling, however, dividing the data to incorporate ward structure would lead to reduced precision in parameter estimates and increased model complexity. We incorporated this uncertainty into our parameter estimates and model conclusions were robust to changes in its value.

The time to discharge was estimated by taking the mean time from first identification of colonized patients to discharge (15 days). The discharge rate was taken as the reciprocal of the mean time to colonization. This assumes that the time to discharge was exponentially distributed which was indeed approximately the case for those known to be colonized. The true time to discharge of colonized patients could have been longer than estimated in this study if patients were colonized for significant time-intervals prior to detection or they could have been shorter if a substantial number of the undetected colonized patients had shorter durations of stay. We performed a sensitivity analysis on the discharge rate parameter, *μ*, and found large changes in *μ* (±33%) resulted in small changes in the estimates of proportion of patients colonized within the ward and is therefore unlikely to have influenced the conclusions of this study.

The model presented in this study postulated that VRE acquisition arose from both cross-transmission and sporadic sources. Model comparison techniques found this model to be a far superior fit to the data compared with models which relied on either cross-transmission or sporadic sources of VRE acquisition alone, strongly supporting that both modes of acquisition were taking place.

We investigated changes in transmission over time using a structured epidemic model. Model comparison showed that there was evidence supporting the conclusion that there was an increase in cross-transmission just prior to the outbreak. There was limited evidence that the cross-transmission rate reduced after the epidemic peak at week 135, coinciding with the environmental cleaning intervention. Future studies using larger surveillance datasets could extend the methodology presented to consider more models in which parameters are time-dependent. One approach to this would be to use the reversible jump Monte-Carlo Markov chain method (Green 1995) or the birth–death Markov process model (Stephens 2000).

Inaccuracies in PFGE cluster analysis can arise from the horizontal transfer of resistance genes. Glycopeptide resistance genotype analyses are not subject to inaccuracies due to gene transfer but cannot distinguish different strains that might all be of the same resistance genotype. Statistical methods are not subject to these problems and have the additional advantage that they are not resource intensive. It is interesting to speculate whether they also have the potential to be used in real time, within a control-chart outbreak alert system.

The model presented here can be used to model the transmission of other bacterial pathogens in small scale settings of healthcare institutions, such as methicillin-resistant *Staphylococcus aureus*, extended spectrum beta-lactamase producing and other multi-resistant Gram-negative pathogens.

## Acknowledgments

This work was partially supported by a grant under the Australian Research Council Linkage Scheme (LP0347112) and NHMRC scholarship number 290541. The authors would like to thank Dr Mike Whitby for providing data and Dr Paul Bartley for helpful comments. The authors would like to thank the anonymous reviewers for their constructive comments.

## Appendix A. Constructing a transition probability matrix

Following the theory of Cox & Miller (1965), we developed a transition probability matrix, ${\mathit{\Gamma}}_{({t}_{k}-{t}_{k-1})}$. The *ij*th element of ${\mathit{\Gamma}}_{({t}_{k}-{t}_{k-1})}$ gives the probability of having *j* colonized patients on the ward at time *t*_{k}, given that there were *i* colonized patients on the ward at time *t*_{k−1}.

To construct the transition probability matrix for an arbitrary time-interval, first we developed a discrete time transition probability matrix, ** A**, for a small time-interval,

*h*. Let

**be the matrix in which the**

*A**ij*th element is given by Pr(

*C*(

*t*+

*h*)=

*j*|

*C*(

*t*)=

*i*). A is given using the system of equation (2.1). Here,

*i*and

*j*are the number of patients colonized in the ward and can take on values 0,…,

*N*.

Let ** p**(

*t*) be the (

*N*+1) vector of probabilities of the number colonized at time

*t*. The generator matrix,

**is a square, (**

*G**N*+1)×(

*N*+1), matrix that has the property that

The *ij*th element of the generator matrix, ** G**, is the instantaneous rate of change of probability of being in state

*j*, given a beginning in state

*i*. Then

**is given by**

*G*where ** I** is the identity matrix.

Following from expression (A2), we have

in general. Specifically, after a time-interval *t*_{k}–*t*_{k−1}, the probability of being in state *j* having begun in state *i* is the *ij*th element of the transition probability matrix, given by

Cox & Miller (1965, ch. 4.5) and MacDonald & Zucchini (1997) give an expanded explanation. The matrix exponential ${\text{e}}^{({t}_{k}-{t}_{k-1})\mathit{G}}$ was calculated using the Matlab ‘expm’ function.

## Appendix B. Likelihood computation

The probability of the full dataset and a particular sequence of hidden states, *C*_{1}, *C*_{2},…,*C*_{n} is given by

with ${\mathit{\Gamma}}_{{C}_{k-1}{C}_{k}}$ as defined in appendix A.

The likelihood calculation of this single permutation of hidden states requires 2*n* computations even after the matrix exponential has been evaluated. The full likelihood of the data over all the states is

which requires 2*n*(*N*+1)^{n} computations for one likelihood evaluation (Le Strat & Carrat 1999). This intractable calculation (with *n*=157 and *N*=68) can be simplified using Baum's recursion technique (Baum *et al*. 1970) as shown below.

The forward recursion involves simplifying the likelihood computations by considering a partial observation sequence and a single state sequence. Let *ϕ*_{k}(*i*) be the probability of the partial observation sequence (*Y*_{1}, *Y*_{2},…,*Y*_{k}) produced by all possible state sequences that end in state *i*. The probability is given by

Let ** δ** be the (size

*N*+1) vector of probabilities of the first state, (

*δ*_{i}=Pr(

*C*

_{1}=

*i*)). In the forward recursion method of likelihood computation, the value of

**needs to be determined in the absence of data. The stationary distribution of the transition matrix can be used for this (MacDonald & Zucchini 1997). The probability of the first state and first observation,**

*δ**Y*

_{1}, is given by

The forward recursion formula is then applied. We multiply every state probability, *ϕ*_{k−1}(*i*), by the transition probability *Γ*_{ij} and by the probability of the *k*th data point, given the hidden state *j*. This results in a vector of probabilities which is then summed to determine *ϕ*_{k}(*j*). Thus, the probability of subsequent states is given by

At each step in the forward recursion, the procedure can be terminated and the probability of the partial observation sequence is determined by

The likelihood of the data can then be determined by

See Petrushin (2000) for a detailed discussion of the forward and backward recursion formulae.

## Appendix C. Monte-Carlo Markov chain algorithm

The algorithm for this Monte-Carlo integration used to estimate the proportion of VRE acquisitions due to ward cross-transmission, *f*(** θ**), is given below.

The MCMC algorithm has the following steps:

- Assume the prior probability for
*β*and*ν*, to be (*U*[0,0.1]). These priors were used as little prior information was known except that negative values and values greater than 0.1 are completely implausible. - Initialize
*β*,*ν*and*d*. Different initial values were chosen from (*β*=10^{−5}to*β*=10^{−2}, from*ν*=10^{−5}to*ν*=10^{−2}) and from*d*=0.58 to*d*=0.97. - Assign the prior probability of the hidden states. A discrete uniform distribution on (0,…,
*N*) was used. - Initialize each hidden state using its corresponding observation and the (binomial) observation model
*Y*_{k}~Bin(*C*_{k},*d*). - Determine the probability of the data and sequence of hidden states using equation (B1).
- Propose a new
*β*′ using a simple random walk, the step size ~*N*(0,10^{−4}). - Accept
*β*′ using a Metropolis–Hastings step with the acceptance probabilitywhere$$a=\text{min}\left\{1,\frac{\pi ({\beta}^{\prime})\text{Pr}(\mathit{Y},\mathit{C}|{\beta}^{\prime})q({\beta}^{\prime}\to \beta )}{\pi (\beta )\text{Pr}(\mathit{Y},\mathit{C}|\beta )q(\beta \to {\beta}^{\prime})}\right\},$$(C1)*q*(*β*→*β*′) is the proposal probability for*β*′ from*β*which is the normal density for*β*′ with mean*β*and variance 10^{−4}. - Repeat for
*ν*′ and*d*′. - Update each hidden state using a Gibbs update, drawing from the distributions given by the conditional probability of the states, determined by neighbouring states and observations, as described below.
- Determine
*f*() for the particular sequence of hidden states and parameters*θ**β*and*ν*using expression (A3). - Iterate by returning to step (iv).
- Burn-in using 50000 iterations. Use the following 50000 updates to estimate the posterior probability distribution (using the ergodic average) of the hidden states (
*C*_{1},…,*C*_{n}) and*f*().*θ* - Repeat steps 2–12 to construct 10 such Markov chains each with different initial values. Convergence tests showed that 50000 updates were sufficient to get precise estimates of the parameters ($\stackrel{\u02c6}{R}=1.02$ for estimates of logit(proportion)) (Gelman
*et al*. 2004, ch. 11.6). - Use 10×50000 updates to determine the posterior probability densities of the model parameters.

The Gibbs update involves determining the conditional probability of the hidden states (given everything else). The assumption that the hidden states are a first order Markov process means that the conditional probability of the hidden states is based only on neighbouring states and the corresponding datum. The full conditional probability of the hidden state, *C*_{n}(*k*=2,…,*n*−1), is given by

where *C*_{\k} is the set of all states other than *C*_{k}; and *i* is the proposed value of the *k*th hidden state; and *h* and *j* are the current values of the hidden states *k*−1 and *k*+1, respectively.

The first and last states depend only on a single neighbour and the data associated with that state. That is

and

The conditional probability of the states can be determined and this becomes the sampling distribution for the hidden state. Each of the *n* states can be updated in a forward, backward or random manner. To estimate values of *ν* and *β*, we do not need to infer hidden states. The simplified MCMC algorithm has the following steps:

- Assign the prior probability for
*β*and*ν*using (*U*[0,0.1]). - Initialize
*β*,*ν*and*d*. - Determine the likelihood of the data using Baum's recursion formula.
- Propose a new
*β*′ using a simple random walk, the step size ~*N*(0,0.0001). - Accept
*β*′ using a Metropolis–Hastings step with the acceptance probability

- Repeat for
*ν*and*d*. - Iterate as above.

## References

- Bailey N. Charles Griffin; London, UK: 1975. The biomathematics of malaria.
- Bartley P.B, Schooneveldt J.M, Looke D.F, Morton A, Johnson D.W, Nimmo G.R. The relationship of a clonal outbreak of
*Enterococcus faecium*VanA to methicillin-resistant*Staphylococcus aureus*incidence in an Australian hospital. J. Hosp. Infect. 2001;48:43–54. doi:10.1053/jhin.2000.0915 [PubMed] - Baum L, Petrie T, Soules G, Weiss N. A maximisation technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 1970;41:164–171.
- Boyce J.M. MRSA patients: proven methods to treat colonization and infection. J. Hosp. Infect. 2001;48(Suppl. A):S9–S14. doi:10.1016/S0195-6701(01)90005-2 [PubMed]
- Bradley S.J, Kaufmann M.E, Happy C, Ghori S, Wilson A.L, Scott G.M. The epidemiology of glycopeptide-resistant enterococci on a haematology unit: analysis by pulsed-field gel electrophoresis. Epidemiol. Infect. 2002;129:57–64. doi:10.1017/S0950268802007033 [PMC free article] [PubMed]
- Cooper B, Lipsitch M. The analysis of hospital infection data using Hidden Markov models. Biostatistics. 2004;5:223–237. doi:10.1093/biostatistics/5.2.223 [PubMed]
- Cox D.R, Miller H.D. Methuen; London, UK: 1965. The theory of stochastic processes.
- D'Agata E.M, Gautam S, Green W.K, Tang Y.W. High rate of false-negative results of the rectal swab culture method in detection of gastrointestinal colonization with vancomycin-resistant enterococci. Clin. Infect. Dis. 2002;34:167–172. doi:10.1086/338234 [PubMed]
- Donskey C.J, Hoyen C.K, Das S.M, Helfand M.S, Hecker M.T. Recurrence of vancomycin-resistant enterococcus stool colonization during antibiotic therapy. Infect. Control Hosp. Epidemiol. 2002;23:436–440. doi:10.1086/502081 [PubMed]
- Forrester M, Pettitt A.N. Use of stochastic epidemic modeling to quantify transmission rates of colonization with methicillin-resistant
*Staphylococcus aureus*in an intensive care unit. Infect. Control Hosp. Epidemiol. 2005;26:598–606. doi:10.1086/502588 [PubMed] - Gelman A, Carlin J, Stern H, Rubin D.B. Bayesian data analysis. 2nd edn. Chapman and Hall/CRC; Boca Raton, FL: 2004. p. Fla.
- Giesecke J. Edward Arnold; London, UK: 1994. Modern infectious disease epidemiology.
- Gilks W, Richardson S, Spiegelhalter D. Chapman and Hall; London, UK: 1996. Markov Chain Monte Carlo in practice.
- Green P. Reversible jump Markov Chain Monte Carlo computation and Bayesian Model determination. Biometrika. 1995;82:711–732. doi:10.1093/biomet/82.4.711
- Le Strat Y, Carrat F. Monitoring epidemiologic surveillance data using hidden Markov models. Stat. Med. 1999;18:3463–3478. doi:10.1002/(SICI)1097-0258(19991230)18:24<3463::AID-SIM409>3.0.CO;2-I [PubMed]
- Lemmen S.W, Hafner H, Zolldann D, Amedick G, Lutticken R. Comparison of two sampling methods for the detection of Gram-positive and Gram-negative bacteria in the environment: moistened swabs versus rodac plates. Int. J. Hyg. Environ. Health. 2001;203:245–248. doi:10.1078/S1438-4639(04)70035-8 [PubMed]
- Lodise T.P, McKinnon P.S, Tam V.H, Rybak M.J. Clinical outcomes for patients with bacteremia caused by vancomycin-resistant enterococcus in a level 1 trauma center. Clin. Infect. Dis. 2002;34:922–929. doi:10.1086/339211 [PubMed]
- MacDonald I, Zucchini W. Chapman and Hall; London, UK: 1997. Hidden Markov models for discrete valued time series.
- Murray B. Overview of enterococci. In: Rose B.D, editor. UpToDate. UpToDate; Massachusetts, MA: 2006. (CD-rom.)
- Pelupessy I, Bonten M.J, Diekmann O. How to assess the relative importance of different colonization routes of pathogens within hospital settings. Proc. Natl Acad. Sci. USA. 2002;99:5601–5605. doi:10.1073/pnas.082412899 [PMC free article] [PubMed]
- Petrushin, V. 2000 Hidden Markov models: fundamentals and applications. Part2 Discrete and continuous hidden Markov models. In
*Online Symp. for Electronics Engineers*http://www.techonline.com/osee/ - Reisner B.S, Shaw S, Huber M.E, Woodmansee C.E, Costa S, Falk P.S, Mayhall C.G. Comparison of three methods to recover vancomycin-resistant enterococci (VRE) from perianal and environmental samples collected during a hospital outbreak of VRE. Infect. Control Hosp. Epidemiol. 2000;21:775–779. doi:10.1086/501734 [PubMed]
- Stephens M. Bayesian analysis of mixture models with an unknown number of components-an alternative to reversible jump methods. Ann. Stat. 2000;28:40–74. doi:10.1214/aos/1016120364
- Suppola J.P, Kolho E, Salmenlinna S, Tarkka E, Vuopio-Varkila J, Vaara M. vanA and vanB incorporate into an endemic ampicillin-resistant vancomycin-sensitive
*Enterococcus faecium*strain: effect on interpretation of clonality. J. Clin. Microbiol. 1999;37:3934–3939. [PMC free article] [PubMed] - Tenover F, Arbeit R, Goering R, Mickelsen P, Murray B, Persing D, Swaminathan B. Interpreting chromosomal DNA restriction patterns produced by pulsed field gel electrophoresis: criteria for bacterial strain typing. J. Clin. Microbiol. 1995;33:2233–2239. [PMC free article] [PubMed]
- Trick W.E, Kuehnert M.J, Quirk S.B, Arduino M.J, Aguero S.M, Carson L.A, Hill B.C, Banerjee S.N, Jarvis W.R. Regional dissemination of vancomycin-resistant enterococci resulting from interfacility transfer of colonized patients. J. Infect. Dis. 1999;180:391–396. doi:10.1086/314898 [PubMed]
- Trick W.E, Paule S.M, Cunningham S, Cordell R.L, Lankford M, Stosor V, Solomon S.L, Peterson L.R. Detection of vancomycin-resistant enterococci before and after antimicrobial therapy: use of conventional culture and polymerase chain reaction. Clin. Infect. Dis. 2004;38:780–786. doi:10.1086/381552 [PubMed]
- Weinstein J. Hospital-acquired (nosocomial) infections with vancomycin-resistant enterococci. In: Rose B.D, editor. UpToDate. UpToDate; Massachusetts, MA: 2005. p. 2006. (CD-rom.)

**The Royal Society**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (186K) |
- Citation

- 2004 Lowbury Lecture: the Western Australian experience with vancomycin-resistant enterococci - from disaster to ongoing control.[J Hosp Infect. 2006]
*Pearman JW.**J Hosp Infect. 2006 May; 63(1):14-26. Epub 2006 Mar 24.* - A hospital epidemic of vancomycin-resistant Enterococcus: risk factors and control.[Infect Control Hosp Epidemiol. 2001]
*Byers KE, Anglim AM, Anneski CJ, Germanson TP, Gold HS, Durbin LJ, Simonton BM, Farr BM.**Infect Control Hosp Epidemiol. 2001 Mar; 22(3):140-7.* - Prevalence and determinants of fecal colonization with vancomycin-resistant Enterococcus in hospitalized patients in The Netherlands.[Infect Control Hosp Epidemiol. 2000]
*van den Braak N, Ott A, van Belkum A, Kluytmans JA, Koeleman JG, Spanjaard L, Voss A, Weersink AJ, Vandenbroucke-Grauls CM, Buiting AG, et al.**Infect Control Hosp Epidemiol. 2000 Aug; 21(8):520-4.* - Vancomycin-resistant enterococci: 15 years and counting.[J Hosp Infect. 2003]
*Chavers LS, Moser SA, Benjamin WH, Banks SE, Steinhauer JR, Smith AM, Johnson CN, Funkhouser E, Chavers LP, Stamm AM, et al.**J Hosp Infect. 2003 Mar; 53(3):159-71.* - Insights into the epidemiology and control of infection with vancomycin-resistant enterococci.[Clin Infect Dis. 2000]
*Hayden MK.**Clin Infect Dis. 2000 Oct; 31(4):1058-65. Epub 2000 Oct 25.*

- Nosocomial Transmission of C. difficile in English Hospitals from Patients with Symptomatic Infection[PLoS ONE. ]
*van Kleef E, Gasparrini A, Guy R, Cookson B, Hope R, Jit M, Robotham JV, Deeny SR, Edmunds WJ.**PLoS ONE. 9(6)e99860* - A Modeling Framework for the Evolution and Spread of Antibiotic Resistance: Literature Review and Model Categorization[American Journal of Epidemiology. 2013]
*Spicknall IH, Foxman B, Marrs CF, Eisenberg JN.**American Journal of Epidemiology. 2013 Aug 15; 178(4)508-520* - Modelling the transmission of healthcare associated infections: a systematic review[BMC Infectious Diseases. ]
*van Kleef E, Robotham JV, Jit M, Deeny SR, Edmunds WJ.**BMC Infectious Diseases. 13294* - Limits of patient isolation measures to control extended-spectrum beta-lactamase–producing Enterobacteriaceae: model-based analysis of clinical data in a pediatric ward[BMC Infectious Diseases. ]
*Domenech de Cellès M, Zahar JR, Abadie V, Guillemot D.**BMC Infectious Diseases. 13187* - Empirical Mode Decomposition and k-Nearest Embedding Vectors for Timely Analyses of Antibiotic Resistance Trends[PLoS ONE. ]
*Teodoro D, Lovis C.**PLoS ONE. 8(4)e61180*

- MedGenMedGenRelated information in MedGen
- PubMedPubMedPubMed citations for these articles
- TaxonomyTaxonomyRelated taxonomy entry
- Taxonomy TreeTaxonomy Tree

- Characterizing an outbreak of vancomycin-resistant enterococci using hidden Mark...Characterizing an outbreak of vancomycin-resistant enterococci using hidden Markov modelsJournal of the Royal Society Interface. Aug 22, 2007; 4(15)745

Your browsing activity is empty.

Activity recording is turned off.

See more...