# Generation interval contraction and epidemic data analysis

^{1}Department of Epidemiology, Harvard School of Public Health, 677 Huntington Ave., Boston, Massachusetts, USA

^{2}Department of Biostatistics, Harvard School of Public Health, 677 Huntington Ave., Boston, Massachusetts, USA

^{3}Department of Immunology and Infectious Disease, Harvard School of Public Health, 677 Huntington Ave., Boston, Massachusetts, USA

## Abstract

The *generation interval* is the time between the infection time of an infected person and the infection time of his or her infector. Probability density functions for generation intervals have been an important input for epidemic models and epidemic data analysis. In this paper, we specify a general stochastic SIR epidemic model and prove that the mean generation interval decreases when susceptible persons are at risk of infectious contact from multiple sources. The intuition behind this is that when a susceptible person has multiple potential infectors, there is a “race” to infect him or her in which only the first infectious contact leads to infection. In an epidemic, the mean generation interval contracts as the prevalence of infection increases. We call this *global competition* among potential infectors. When there is rapid transmission within clusters of contacts, generation interval contraction can be caused by a high local prevalence of infection even when the global prevalence is low. We call this *local competition* among potential infectors. Using simulations, we illustrate both types of competition. Finally, we show that hazards of infectious contact can be used instead of generation intervals to estimate the time course of the effective reproductive number in an epidemic. This approach leads naturally to partial likelihoods for epidemic data that are very similar to those that arise in survival analysis, opening a promising avenue of methodological research in infectious disease epidemiology.

## 1 Introduction

In infectious disease epidemiology, the *serial interval* is the difference between the symptom onset time of an infected person and the symptom onset time of his or her infector [1]. This is sometimes called the “generation interval.” However, we find it more useful to adopt the terminology of Svensson [2] and define the *generation interval* as the difference between the infection time of an infected person and the infection time of his or her infector. By these definitions, the serial interval is observable while the generation interval usually is not. We define *infectious contact* from *i* to *j* to be a contact that is sufficient to infect *j* if *i* is infectious and *j* is susceptible, and we define a *potential infector* of person *i* to be an infectious person who has positive probability of making infectious contact with *i*. Finally, we use the term *hazard* rather than *force of infection* to highlight the similarities between epidemic data analysis and survival analysis.

The generation interval has been an important input for epidemic models used to investigate the transmission and control of SARS [3, 4] and pandemic influenza [5,6]. More recently, generation interval distributions have been used to calculate the incubation period distribution of SARS [7] and to estimate *R*_{0} from the exponential growth rate at the beginning of an epidemic [8]. It is generally assumed that the generation interval distribution is characteristic of an infectious disease. In this paper, we show that this is not true. Instead, the expected generation interval decreases as the number of potential infectors of susceptibles increases. During an epidemic, generation intervals tend to contract as the prevalence of infection increases. This effect was described by Svensson [2] for an SIR model with homogeneous mixing. In this paper, we extend this result to all time-homogeneous stochastic SIR models.

A simple thought experiment illustrates the intuition behind our main result. Imagine a susceptible person *j* in a room. Place *m* other persons in the room and infect them all at time *t* = 0. For simplicity, assume that infectious contact from *i* to *j* occurs with probability one, *i* = 1, ..., *m*. Let *t*_{ij} be a continuous nonnegative random variable denoting the first time at which *i* makes infectious contact with *j*. Person *j* is infected at time *t*_{j} = min(*t*_{1j}, ..., *t*_{mj}). Since all infectious persons were infected at time zero, *t*_{j} is the generation interval. If we repeat the experiment with larger and larger *m*, the expected value of min(*t*_{1j}, ..., *t*_{mj}) will decrease.

When a susceptible person is at risk of infectious contact from multiple sources, there is a “race” to infect him or her in which only the first infectious contact leads to infection. Generation interval contraction is an example of a well-known phenomenon in epidemiology: The expected time to an outcome, given that the outcome occurs, decreases in the presence of competing risks. In our thought experiment, the outcome is the infection of *j* by a given *i* and the competing risks are infectious contacts from all sources other than *i*.

Adapting our thought experiment slightly, we see that the contraction of the generation interval is a consequence of the fact that the hazard of infection for *j* increases as the number of potential infectors increases. Let λ(*t*) be the hazard of infectious contact from any potential infector to *j* at time *t* and let *E*[*t*_{j}|*m*] be the expected infection time of *j* given *m* potential infectors. Then

$$E[t_j \mid m] = \int_0^\infty \exp\left(-m \int_0^t \lambda(u)\, du\right) dt,$$

so the expected generation interval decreases as the number of potential infectors increases. A hazard of infection that increases with the number of potential infectors is a defining feature of most epidemic models, so generation interval contraction is a very general phenomenon. We note that a very similar phenomenon occurs in endemic diseases, where increased force of infection results in a decreased average age at first infection [9].
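The claim that the expected infection time falls as *m* grows can be checked numerically in the simplest case. The following is a minimal sketch, assuming that every potential infector has the same constant hazard (an illustrative assumption, so that the expected infection time is 1/(*m*λ)); the function names are hypothetical.

```python
import random


def expected_infection_time(m, hazard):
    """Closed-form E[t_j | m] for m independent infectors with a constant hazard."""
    return 1.0 / (m * hazard)


def simulated_infection_time(m, hazard, n_rep=20000, seed=1):
    """Monte Carlo mean of min(t_1j, ..., t_mj) with exponential contact times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        # Only the first infectious contact infects j.
        total += min(rng.expovariate(hazard) for _ in range(m))
    return total / n_rep


if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        print(m, expected_infection_time(m, 1.0), simulated_infection_time(m, 1.0))
```

With a time-varying hazard the closed form no longer applies, but the simulated mean of the minimum should still fall as *m* increases.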

The rest of the paper is organized as follows: In Section 2, we describe a general stochastic SIR epidemic model. In Section 3, we use this model to show that the mean generation interval decreases as the number of potential infectors increases. As a corollary, we find that the mean serial interval also decreases. In Section 4, we consider the role of the population contact structure in generation interval contraction and illustrate the effects of global and local competition among potential infectors with simulations. In Section 5, we argue that hazards of infectious contact should be used instead of generation or serial interval distributions in the analysis of epidemic data. Section 6 summarizes our main results and conclusions.

## 2 General stochastic SIR model

We start with a very general stochastic “Susceptible-Infectious-Removed” (SIR) epidemic model. This model includes fully-mixed and network-based models as special cases, and it has been used previously to define a mapping from the final outcomes of stochastic SIR models to the components of semi-directed random networks [10, 11].

Each person *i* is infected at his or her *infection time* *t*_{i}, with *t*_{i} = ∞ if *i* is never infected. Person *i* recovers from infectiousness or dies at time *t*_{i} + *r*_{i}, where the *recovery period* *r*_{i} is a positive random variable with the cumulative distribution function (cdf) *F*_{i}(*r*). The recovery period *r*_{i} may be the sum of a *latent period*, during which *i* is infected but not infectious, and an *infectious period*, during which *i* can transmit infection. We assume that all infected persons have a finite recovery period. If person *i* is never infected, let *r*_{i} = ∞. Let Sus(*t*) = {*i* : *t*_{i} > *t*} be the set of susceptibles at time *t*.

When person *i* is infected, he or she makes infectious contact with person *j* after an *infectious contact interval* *τ*_{ij}. Each *τ*_{ij} is a positive random variable with cdf *F*_{ij}(*τ*|*r*_{i}) and survival function *S*_{ij}(*τ*|*r*_{i}) = 1 - *F*_{ij}(*τ*|*r*_{i}). Let *τ*_{ij} = ∞ if person *i* never makes infectious contact with person *j*, so the infectious contact interval distribution may have probability mass at ∞. Define

$$S_{ij}(\infty \mid r_i) = \lim_{\tau \to \infty} S_{ij}(\tau \mid r_i),$$

which is the conditional probability that *i* never makes infectious contact with *j* given *r*_{i}. Since a person cannot transmit disease before being infected or after recovering from infectiousness, *S*_{ij}(*τ*|*r*_{i}) = 1 for all *τ* ≤ 0 and *S*_{ij}(*τ*|*r*_{i}) = *S*_{ij}(∞|*r*_{i}) for all *τ* ≥ *r*_{i}. Since a person cannot infect himself (or herself), *τ*_{ii} = ∞ with probability one and *S*_{ii}(*τ*|*r*_{i}) = 1 for all *τ*.

The *infectious contact time* *t*_{ij} = *t*_{i} + *τ*_{ij} is the time at which person *i* makes infectious contact with person *j*. If person *j* is susceptible at time *t*_{ij}, then *i* infects *j* and *t*_{j} = *t*_{ij}. If *t*_{ij} < ∞, then *t*_{j} ≤ *t*_{ij} because person *j* avoids infection at time *t*_{ij} only if he or she has already been infected. If person *i* never makes infectious contact with person *j*, then *t*_{ij} = ∞ because *τ*_{ij} = ∞. Figure 1 shows a schematic diagram of the relationships among *r*_{i}, *τ*_{ij}, and *t*_{ij}.

Figure 1: Schematic diagram of the relationships among *r*_{i}, *τ*_{ij}, and *t*_{ij} for the ordered pair *ij*. Recall that *t*_{j} ≤ *t*_{ij}. As discussed in Section 3.2, person *i* develops symptoms at time ${t}_{i}^{\mathrm{sym}}={t}_{i}+{q}_{i}$, where *q*_{i} is the incubation period.

The *importation time* *t*_{0i} of person *i* is the earliest time at which he or she receives infectious contact from outside the population. The importation time vector is **t**_{0} = (*t*_{01}, ..., *t*_{0n}).

We assume that each infected person has a unique infector. Following [4], we let *v*_{i} represent the index of the person who infected person *i*, with *v*_{i} = 0 for imported infections and *v*_{i} = ∞ if *i* is never infected. If tied infectious contact times have nonzero probability, then *v*_{i} can be chosen from all *j* such that *t*_{ji} = *t*_{i} < ∞.

### 2.1 Epidemics

Let *t*_{(1)} ≤ *t*_{(2)} ≤ ... ≤ *t*_{(m)} be the order statistics of all *t*_{1}, ..., *t*_{n} less than infinity, and let (*k*) be the index of the *k*^{th} person infected. Before the epidemic begins, an importation time vector **t**_{0} is chosen. The epidemic begins at time *t*_{(1)} = min_{i}(*t*_{0i}). Person (1) is assigned a recovery period *r*_{(1)}. Every person *j* ∈ Sus(*t*_{(1)}) is assigned an infectious contact time *t*_{(1)j} = *t*_{(1)} + *τ*_{(1)j}. The second infection occurs at ${t}_{(2)} = \min_{j \in \mathrm{Sus}(t_{(1)})} \min({t}_{0j}, {t}_{(1)j})$, which is the first infectious contact time after *t*_{(1)}. Person (2) is assigned a recovery period *r*_{(2)}. After *k* infections, the next infection occurs at ${t}_{(k+1)} = \min_{j \in \mathrm{Sus}(t_{(k)})} \min({t}_{0j}, {t}_{(1)j}, \ldots, {t}_{(k)j})$. The epidemic stops after *m* infections if and only if *t*_{(m+1)} = ∞.
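The construction above can be sketched as an event-driven simulation. The following is a minimal illustration only, not the implementation used for the simulations in Section 4: it assumes a single imported infection at time 0, a fixed recovery period, and exponential infectious contact intervals truncated at the recovery period. The function names are hypothetical, persons are indexed from 0, and imported infections are coded as -1 rather than 0.

```python
import heapq
import math
import random


def simulate_epidemic(n, hazard, recovery=1.0, seed=0):
    """Run one epidemic and return (t, v): infection times and infectors.

    Illustrative assumptions: person 0 is the only imported infection (at
    time 0), every recovery period equals `recovery`, and each infectious
    contact interval is exponential with rate `hazard`, with no contact at
    all if the draw exceeds the recovery period.
    """
    rng = random.Random(seed)
    t = [math.inf] * n   # t[i] = infection time, inf if never infected
    v = [None] * n       # v[i] = infector, -1 if imported, None if never infected
    # Heap of pending infectious contacts: (contact time, target j, source i).
    pending = [(0.0, 0, -1)]
    while pending:
        time, j, i = heapq.heappop(pending)
        if t[j] < math.inf:
            continue     # j was already infected, so this contact has no effect
        t[j], v[j] = time, i
        # Draw contact intervals from the newly infected j to each susceptible k.
        for k in range(n):
            if t[k] == math.inf:
                tau = rng.expovariate(hazard)
                if tau < recovery:
                    heapq.heappush(pending, (time + tau, k, j))
    return t, v


def generation_intervals(t, v):
    """Return (t_i, t_j - t_i) for every pair in which i infected j."""
    return [(t[i], t[j] - t[i]) for j, i in enumerate(v) if i is not None and i >= 0]
```

Drawing contact intervals only toward persons who are still susceptible does not change the outcome, because an infectious contact with an already infected person has no effect. For example, `simulate_epidemic(1000, 3.0 / 999)` would correspond to a fully-mixed population of 1,000 with an expected three infectious contacts per case before truncation.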

## 3 Generation interval contraction

In this section, we show that the mean infectious contact interval *τ*_{ij} given that *i* infects *j* is shorter than the mean infectious contact interval given that *i* makes infectious contact with *j*. In the notation from the previous section,

$$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty]$$

(note that *v*_{j} = *i* implies *τ*_{ij} < ∞ but not vice versa). In general, this inequality is strict when *j* is at risk of infectious contact from any source other than *i*. This inequality implies the contraction of generation and serial intervals during an epidemic. For background on the probability theory used in this section, please see Ref. [12] or any other probability text.

**Lemma 1** *E*[*τ*_{ij}|*v*_{j} = *i*] ≤ *E*[*τ*_{ij}|*τ*_{ij} < ∞]

#### Proof

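We first show that *E*[*τ*_{ij}|*r*_{i}, *v*_{j} = *i*] ≤ *E*[*τ*_{ij}|*r*_{i}, *τ*_{ij} < ∞] and then use the law of iterated expectation. If person *i* was infected at time *t*_{i} and has recovery period *r*_{i}, then the probability that *τ*_{ij} < ∞ is *F*_{ij}(∞|*r*_{i}) = 1 - *S*_{ij}(∞|*r*_{i}). Let

$$F^{*}_{ij}(\tau \mid r_i) = \frac{F_{ij}(\tau \mid r_i)}{F_{ij}(\infty \mid r_i)}$$

be the conditional cdf of *τ*_{ij} given *r*_{i} and *τ*_{ij} < ∞. Then

$$E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] = \int_0^\infty \tau \, dF^{*}_{ij}(\tau \mid r_i). \tag{1}$$

If person *j* is susceptible at time *t*_{i} and *τ*_{ij} < ∞, then *v*_{j} = *i* if and only if *j* escapes infectious contact from all other infectious people during the time interval (*t*_{i}, *t*_{i} + *τ*_{ij}). Let *S*_{*j}(*t*_{i} + *τ*) be the probability that *j* escapes infectious contact from all sources other than *i* in the interval (*t*_{i}, *t*_{i} + *τ*). Given *r*_{i} and *τ*_{ij} < ∞, the conditional probability density for an infectious contact from *i* to *j* at time *t*_{i} + *τ* that leads to the infection of *j* is proportional to

$$S_{*j}(t_i + \tau) \, dF^{*}_{ij}(\tau \mid r_i).$$

If we let

$$\psi = \int_0^\infty S_{*j}(t_i + \tau) \, dF^{*}_{ij}(\tau \mid r_i),$$

then

$$E[\tau_{ij} \mid r_i, v_j = i] = \frac{1}{\psi} \int_0^\infty \tau \, S_{*j}(t_i + \tau) \, dF^{*}_{ij}(\tau \mid r_i).$$

Since *S*_{*j}(*t*_{i} + *τ*_{ij}) is a monotonically decreasing function of *τ*_{ij}, the two have nonpositive covariance given *r*_{i} and *τ*_{ij} < ∞, so

$$\int_0^\infty \tau \, S_{*j}(t_i + \tau) \, dF^{*}_{ij}(\tau \mid r_i) \le \psi \int_0^\infty \tau \, dF^{*}_{ij}(\tau \mid r_i).$$

Therefore,

$$E[\tau_{ij} \mid r_i, v_j = i] \le E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]. \tag{2}$$

Since the same inequality holds for all *r*_{i},

$$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty] \tag{3}$$

by the law of iterated expectation.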
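The inequality can be illustrated numerically for one concrete choice of distributions. The sketch below assumes (for illustration only) that the infectious contact interval is exponential with hazard `a` truncated at `r`, and that the probability of escaping all other sources over an interval of length *τ* is exp(-`b` *τ*), as it would be under a constant competing hazard `b`. The function name and the midpoint-rule integration are choices made for this example.

```python
import math


def contact_mean(a, b=0.0, r=1.0, steps=20000):
    """Mean contact interval weighted by the escape probability exp(-b * tau).

    Midpoint-rule approximation of
        int tau * S(tau) dF(tau) / int S(tau) dF(tau)   over (0, r),
    where dF is proportional to a * exp(-a * tau) and S(tau) = exp(-b * tau).
    b = 0 gives the mean given that contact occurs; b > 0 gives the mean
    given that the contact actually leads to infection.
    """
    h = r / steps
    num = den = 0.0
    for s in range(steps):
        tau = (s + 0.5) * h
        weight = a * math.exp(-a * tau) * math.exp(-b * tau)
        num += tau * weight
        den += weight
    return num / den


if __name__ == "__main__":
    for b in (0.0, 0.5, 2.0, 10.0):
        print(b, contact_mean(0.4, b))
```

Under these assumptions the weighted mean falls steadily as the competing hazard `b` grows, and it equals the unweighted mean only when `b` is zero.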

Equality holds in equation (2) if and only if *τ*_{ij} and *S*_{*j}(*t*_{i} + *τ*_{ij}) have covariance zero given *r*_{i} and *τ*_{ij} < ∞. Since *S*_{*j}(*t*_{i} + *τ*_{ij}) is a monotonically decreasing function of *τ*_{ij}, this will occur if and only if *τ*_{ij} or *S*_{*j}(*t*_{i} + *τ*_{ij}) is constant given *r*_{i} and *τ*_{ij} < ∞. Equality holds in equation (3) if and only if equality holds in (2) with probability one in *r*_{i}. If *τ*_{ij} is constant, then clearly *S*_{*j}(*t*_{i} + *τ*_{ij}) is constant and their covariance is zero. If *j* is not at risk of infectious contact from any source other than *i*, then *S*_{*j}(*t*_{i} + *τ*_{ij}) will be constant even when *τ*_{ij} is not. In the thought experiment from the Introduction, the expected infection time of the susceptible *j* would remain constant in the following two scenarios: (i) all infectious persons make infectious contact with *j* at a fixed time *t*_{0}, or (ii) *j* is only at risk of infectious contact from a single person. Scenario (i) corresponds to a constant *τ*_{ij} and scenario (ii) corresponds to a constant *S*_{*j}(*t*_{i} + *τ*_{ij}).

The expected generation interval from *i* to *j* given *v*_{j} = *i* will be shortest when the risk of infectious contact to *j* from sources other than *i* is greatest. More specifically, *E*[*τ*_{ij}|*r*_{i}, *v*_{j} = *i*] will be minimized when *S*_{*j}(*t*_{i} + *τ*) decreases fastest in *τ*. In general, the risk of infectious contact from other sources will be greatest when the prevalence of infection is highest, so we expect the greatest contraction of the serial interval during an epidemic to coincide with the peak prevalence of infection.

In general, we expect to see the following pattern over the course of an epidemic: The mean generation interval decreases as the prevalence of infection increases, reaches a minimum as the prevalence of infection peaks, and increases again as the prevalence of infection decreases.

### 3.1 Types of generation intervals

In [2], Svensson discussed two types of generation intervals that are consistent with the verbal definition given in the Introduction. *T*_{p} (*p* for “primary”) denotes *τ*_{ij} where *i* is chosen at random from all persons who infect at least one other person and *j* is chosen randomly from the set of persons *i* infects. *T*_{s} (*s* for “secondary”) denotes *τ*_{ij} where *j* is chosen at random from all persons infected from within the population and *i* = *v*_{j}. *T*_{p} and *T*_{s} differ only in the sampling procedure used to obtain the ordered pair *ij*; *T*_{p} samples primary cases (infectors) at random while *T*_{s} samples secondary cases at random. Equation (3) implies that both *E*[*T*_{p}] and *E*[*T*_{s}] decrease when susceptible persons are at risk of infectious contact from multiple sources. This contraction occurs because the definitions of *T*_{p} and *T*_{s} include only *τ*_{ij} such that *i* actually infected *j*.

### 3.2 Serial interval contraction

In an epidemic, infection times are generally unobserved. Instead, symptom onset times are observed. Recall that the time between the onset of symptoms in an infected person and the onset of symptoms in his or her infector is called the *serial interval*. Contraction of the mean generation interval implies contraction of the mean serial interval as well. The *incubation period* is the time from infection to the onset of symptoms [1]. Let *q*_{i} be the incubation period in person *i*, and let ${t}_{i}^{\mathrm{sym}}={t}_{i}+{q}_{i}$ be the time of his or her onset of symptoms. If *v*_{j} = *i*, then the serial interval associated with person *j* is

$${t}_{j}^{\mathrm{sym}} - {t}_{i}^{\mathrm{sym}} = \tau_{ij} + q_j - q_i.$$

Therefore,

$$E[{t}_{j}^{\mathrm{sym}} - {t}_{i}^{\mathrm{sym}} \mid v_j = i] = E[\tau_{ij} \mid v_j = i] + E[q_j - q_i \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty] + E[q_j - q_i \mid v_j = i],$$

with strict inequality whenever strict inequality holds for the corresponding generation interval. Over the course of an epidemic, we expect the mean serial interval to follow a pattern very similar to that of the mean generation interval.

## 4 Simulations

We refer to the “race” to infect a susceptible person as *competition among potential infectors*. In this section, we illustrate two types of competition among potential infectors: *Global competition* among potential infectors results from a high global prevalence of infection. *Local competition* among potential infectors results from rapid transmission within clusters of contacts, which causes susceptibles to be at risk of infectious contact from multiple sources within their clusters even if the global prevalence of infection is low. In real epidemics, the prevalence of infection is usually low but there is clustering of contacts within households, hospital wards, schools, and other settings.

In this section, we use simulations to illustrate generation interval contraction under global and local competition among potential infectors. Each simulation is a single realization of a stochastic SIR model in a population of 10,000. We keep track of the infection times of the primary and secondary case in each infector/infectee pair and the prevalence of infection at the infection time of the secondary case, which is a proxy for the amount of competition to infect the secondary case. We then calculate a smoothed mean of the generation interval as a function of the infection time of the primary case in each pair. Another valid approach would be to calculate the smoothed means from the results of many simulations. We did not take this approach for the following reasons: (i) Because of variation in the time course of different realizations of the same stochastic SIR model, many simulations would be required to obtain a curve that reliably approximates the asymptotic limit. (ii) The smoothed mean over many simulations would show a pattern similar to that obtained in any single simulation. (iii) Generation interval contraction was proven in Section 3, so the simulations are intended primarily as illustrations.

All simulations were implemented in Mathematica 5.0.0.0 [© 1988-2003 Wolfram Research, Inc.]. All data analysis was done using Intercooled Stata 9.2 [© 1985-2007 StataCorp LP]. All smoothed means are running means with a bandwidth of 0.8 (the default for the Stata command lowess with the option mean). Similar results were obtained for larger and smaller bandwidths.

### 4.1 Global competition

To illustrate global competition among potential infectors, we use a fully-mixed model with population size *n* = 10,000 and basic reproductive number *R*_{0}. The infectious period is fixed, with *r*_{i} = 1 with probability one for all *i*. The infectious contact intervals *τ*_{ij} have an exponential distribution with hazard *R*_{0}(*n* - 1)^{-1} truncated at *r*_{i}, so $S_{ij}(\tau \mid r_i) = e^{-R_0 (n-1)^{-1} \tau}$ when 0 < *τ* < 1 and *τ*_{ij} = ∞ with probability $e^{-R_0 (n-1)^{-1}}$. The epidemic starts with a single imported infection and no other imported infections occur.

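From equation (1), the mean infectious contact interval given that contact occurs is

$$E[\tau_{ij} \mid \tau_{ij} < \infty] = \frac{\int_0^1 \tau \, \lambda e^{-\lambda \tau} \, d\tau}{1 - e^{-\lambda}} = \frac{1}{\lambda} - \frac{1}{e^{\lambda} - 1}, \qquad \lambda = \frac{R_0}{n - 1}.$$

For *n* = 10,000, Table 1 shows this expected value at each *R*_{0}. For all *R*_{0}, *E*[*τ*_{ij}|*τ*_{ij} < ∞] ≈ .5.

**...**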
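The values in Table 1 are not reproduced in this text. As a rough check on the statement that the mean is approximately .5 at every *R*_{0}, the following sketch evaluates the mean of an exponential distribution with hazard *R*_{0}/(*n* - 1) truncated at 1; the closed form used and the function name are assumptions of this example, not taken from the table.

```python
import math


def mean_contact_interval(r0, n=10000):
    """Mean of an exponential with hazard r0 / (n - 1), given that it is below 1.

    Uses the closed form 1/lam - 1/(exp(lam) - 1) for a truncation point of 1.
    """
    lam = r0 / (n - 1)
    return 1.0 / lam - 1.0 / math.expm1(lam)


if __name__ == "__main__":
    for r0 in (1.25, 1.5, 2, 3, 4, 5, 10):
        print(r0, mean_contact_interval(r0))
```

Because the hazard is of order 1/*n*, the truncated distribution is nearly uniform on (0, 1), which is why the mean stays just below .5 across this range of *R*_{0}.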

This model was run once at *R*_{0} = 1.25, 1.5, 2, 3, 4, 5, and 10. For each simulation, we recorded *t*_{i}, *v*_{i}, *t*_{v_{i}}, and the prevalence of infection at time *t*_{i} in each infector/infectee pair. Figure 2 shows smoothed mean curves for the generation interval versus the source infection time for *R*_{0} = 2, 3, 4, 5. There is a clear tendency for the mean generation interval to contract, with greater contraction at higher *R*_{0}. Figure 3 shows smoothed mean curves for the generation interval and the prevalence of infection versus the source infection time at each *R*_{0}; in each case, the greatest contraction of the serial interval coincides with the peak prevalence of infection (i.e., the greatest competition among potential infectors). Figure 4 shows the same curves for *R*_{0} = 1.25 and 1.50; in these cases, the generation interval stays relatively constant. These results are exactly in line with the argument of Section 3.

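Figure 2: Smoothed mean generation interval versus source infection time for *R*_{0} = 2, 3, 4, 5. There is a clear tendency to contract, with greater contraction for higher *R*_{0}.

Figure 3: Smoothed mean generation interval and prevalence of infection versus source infection time for *R*_{0} = 2, 3, 4, 5. In all cases, the greatest contraction of the serial interval coincides with the peak prevalence of infection.

**...**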

### 4.2 Local competition

To illustrate local competition among potential infectors, we grouped a population of *n* = 9,000 individuals into clusters of size *k*. As before, the infectious period is fixed at *r*_{i} = 1 for all *i*. When *i* and *j* are in the same cluster, the infectious contact interval *τ*_{ij} has an exponential distribution with hazard λ_{within} truncated at *r*_{i}, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{\mathrm{within}} \tau}$ when 0 < *τ* < 1 and *τ*_{ij} = ∞ with probability $e^{-\lambda_{\mathrm{within}}}$. When *i* and *j* are in different clusters, *τ*_{ij} has an exponential distribution with hazard λ_{between} truncated at *r*_{i}, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{\mathrm{between}} \tau}$ when 0 < *τ* < 1 and *τ*_{ij} = ∞ with probability $e^{-\lambda_{\mathrm{between}}}$.

We fixed the hazard of infectious contact between individuals in the same cluster at λ_{within} = .4. We tuned the hazard of infectious contact between individuals in different clusters to obtain *R* mean infectious contacts by infectious individuals; specifically, λ_{between} satisfies

$$(k - 1)(1 - e^{-.4}) + (n - k)(1 - e^{-\lambda_{\mathrm{between}}}) = R.$$

We chose λ_{within} = .4 to obtain rapid transmission within clusters while retaining sufficient transmission between clusters to sustain an epidemic. Note that when *k* > *R*(1 - *e*^{-.4})^{-1} + 1, we get the implausible result that λ_{between} < 0. Clearly, *R* and *k* must be chosen so that an infectious person makes an average of *R* or fewer infectious contacts within his or her cluster, which guarantees that λ_{between} ≥ 0.

At a given *R*, the mean infectious contact interval given that infectious contact occurs depends on the cluster size. If the entire population is infectious and the cluster size is *k*, then a given individual will receive an average of *R* infectious contacts, of which (*k*-1)(1-*e*^{-.4}) come from within his or her cluster. The mean infectious contact interval for within-cluster contacts is

$$\frac{\int_0^1 \tau \, (.4) e^{-.4 \tau} \, d\tau}{1 - e^{-.4}} = \frac{1}{.4} - \frac{1}{e^{.4} - 1} \approx .467,$$

and the mean infectious contact interval for between-cluster contacts is approximately .5 (as in the models for global competition). Therefore, the mean infectious contact interval given that contact occurs and the cluster size is *k* is

$$E[\tau_{ij} \mid \tau_{ij} < \infty, k] \approx \frac{(k - 1)(1 - e^{-.4})(.467) + \left[R - (k - 1)(1 - e^{-.4})\right](.5)}{R}.$$
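To make the dependence on cluster size concrete, the sketch below computes a between-cluster hazard and the resulting mean infectious contact interval for each *k*. It assumes that the between-cluster hazard is chosen so that the expected numbers of within-cluster and between-cluster infectious contacts sum to *R*; this reading of the tuning step, and all function names, are assumptions of the example.

```python
import math

LAMBDA_WITHIN = 0.4  # within-cluster hazard stated in the text


def truncated_exp_mean(lam):
    """Mean of an exponential with hazard lam, given that it is below 1."""
    return 1.0 / lam - 1.0 / math.expm1(lam)


def lambda_between(R, k, n=9000):
    """Between-cluster hazard that gives R expected infectious contacts in total."""
    within = (k - 1) * (1.0 - math.exp(-LAMBDA_WITHIN))
    p = (R - within) / (n - k)   # required contact probability per outside person
    if p < 0:
        raise ValueError("cluster too large: within-cluster contacts already exceed R")
    return -math.log1p(-p)


def mean_contact_interval(R, k, n=9000):
    """Mixture mean over within-cluster and between-cluster contacts."""
    within = (k - 1) * (1.0 - math.exp(-LAMBDA_WITHIN))
    lam_b = lambda_between(R, k, n)
    mean_b = truncated_exp_mean(lam_b) if lam_b > 0 else 0.5
    return (within * truncated_exp_mean(LAMBDA_WITHIN) + (R - within) * mean_b) / R


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, lambda_between(2, k), mean_contact_interval(2, k))
```

Under this assumption, `lambda_between(2, 8)` raises an error, which matches the constraint that *k* cannot exceed *R*(1 - *e*^{-.4})^{-1} + 1.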

To compare generation interval contraction for different cluster sizes, we calculated *scaled generation intervals* by dividing the observed generation intervals at each cluster size by *E*[*τ*_{ij}|*τ*_{ij} < ∞, *k*]. If the mean generation interval remained constant, we would expect the mean scaled generation interval to be approximately one throughout an epidemic.

For *R* = 2, we ran the model with cluster sizes of 1 through 6. For *R* = 3, we ran the model with cluster sizes of 2 through 8. For each simulation, we recorded *t*_{i}, *v*_{i}, *t*_{v_{i}}, and the prevalence of infection at time *t*_{i} in each infector/infectee pair. Figure 5 shows smoothed mean curves for the generation interval and prevalence versus the source infection time for several cluster sizes at each *R*. As before, there is a clear tendency of the mean generation interval to contract. The degree of contraction is roughly the same for all cluster sizes, but this contraction is maintained at a lower global prevalence of infection in models with larger cluster sizes. Similar results were obtained for cluster sizes not shown. Again, these results are exactly in line with the argument of Section 3.

## 5 Consequences for estimation

The effect of generation interval contraction on parameter estimates obtained from models that assume a constant generation or serial interval distribution is difficult to assess. The assumption of a constant serial or generation interval distribution may be reasonable in the early stages of an epidemic with little clustering of contacts, in an epidemic with *R*_{0} near one, or in an endemic situation. However, this ignores the more fundamental issue that estimates of these distributions are obtained from transmission events where the infector/infectee pairs are known (often because of transmission from a known patient within a household or hospital ward). Even in the early stages of an epidemic, the generation interval distribution in these settings may differ substantially from the generation interval distribution for transmission in the general population.

In this section, we argue that hazards of infectious contact can be used instead of generation or serial intervals in the analysis of epidemic data. As an example, we look at the estimator of *R*(*t*) (the effective reproductive number at time *t*) derived by Wallinga and Teunis [4] and applied to data on the SARS outbreaks in Hong Kong, Vietnam, Singapore, and Canada in 2003. In their paper, the available data were the “epidemic curve” **t** = (*t*_{(1)}, ..., *t*_{(m)}), where *t*_{(i)} is the infection time of the *i*^{th} person infected. They assume a probability density function (pdf) *w*(*τ*|*θ*) for the serial interval given a vector *θ* of parameters (note that this parameter vector applies to the population, not to individuals). The infector of person (*i*) is denoted by *v*_{(i)}, with *v*_{(i)} = 0 for imported infections. The “infection network” is a vector **v** = (*v*_{(1)}, ..., *v*_{(m)}) specifying the source of infection for each infected person. With these assumptions, the likelihood of **v** and *θ* given **t** is

$$L(\mathbf{v}, \theta \mid \mathbf{t}) = \prod_{i:\, v_{(i)} \neq 0} w\left(t_{(i)} - t_{v_{(i)}} \mid \theta\right).$$

The sum of this likelihood over the set *V* of all infection networks consistent with the epidemic curve **t** is

$$L(\theta \mid \mathbf{t}) = \sum_{\mathbf{v} \in V} \; \prod_{i:\, v_{(i)} \neq 0} w\left(t_{(i)} - t_{v_{(i)}} \mid \theta\right).$$

Taking a likelihood ratio, Wallinga and Teunis argue that the relative likelihood that person *k* was infected by person *j* is

$$p_{jk}^{(WT)} = \frac{w(t_k - t_j \mid \theta)}{\sum_{i \neq k} w(t_k - t_i \mid \theta)}. \tag{4}$$

The number *R*_{j} of secondary infections generated by person *j* is a sum of Bernoulli random variables with expectation

$$E[R_j] = \sum_{k \neq j} p_{jk}^{(WT)}.$$

An estimate of the effective reproductive number *R*(*t*) can be obtained by calculating a smoothed mean for a scatterplot of (*t*_{j}, *E*[*R*_{j}]). This analysis is ingenious, but it can be only approximately correct because the distribution of serial intervals varies systematically over the course of an epidemic.
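The calculation of the expected number of secondary infections per case can be sketched as follows. This is a simplified illustration of the relative-likelihood approach described above, not the code of Ref. [4]; the exponential serial interval density is a placeholder chosen only to make the example runnable, and the function names are hypothetical.

```python
import math


def exponential_pdf(x, rate=1.0):
    """Placeholder serial interval pdf: zero for non-positive intervals."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0


def wt_reproductive_numbers(times, w=exponential_pdf):
    """Return E[R_j] for each case from the relative likelihoods p_jk.

    times[k] is the observed time of case k and w is the serial interval pdf.
    p_jk is w(t_k - t_j) divided by the sum of w(t_k - t_i) over all i != k.
    """
    n = len(times)
    expected = [0.0] * n
    for k in range(n):
        weights = [w(times[k] - times[j]) if j != k else 0.0 for j in range(n)]
        total = sum(weights)
        if total == 0.0:
            continue  # no possible infector in the data: treat k as imported
        for j in range(n):
            expected[j] += weights[j] / total
    return expected


if __name__ == "__main__":
    print(wt_reproductive_numbers([0.0, 1.0, 2.0, 3.0]))
```

Each case with at least one possible infector contributes exactly one expected infection in total, so the values of *E*[*R*_{j}] sum to the number of non-imported cases.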

### 5.1 Hazard-based estimator

A very similar result can be derived by applying the theory of order statistics (see Ref. [12]) to the general stochastic SIR model from Section 2. Specifically, we use the following results: If *X*_{1}, ..., *X*_{n} are independent non-negative random variables with hazard functions λ_{1}(*x*), ..., λ_{n}(*x*), then their minimum *X*_{(1)} has the hazard function

$$\lambda_{(1)}(x) = \sum_{i=1}^{n} \lambda_i(x).$$

Given that the minimum is *x*_{(1)}, the probability that *X*_{j} = *x*_{(1)} (i.e. that the minimum was observed in the *j*^{th} random variable) is

$$\Pr\left(X_j = x_{(1)} \mid X_{(1)} = x_{(1)}\right) = \frac{\lambda_j(x_{(1)})}{\sum_{i=1}^{n} \lambda_i(x_{(1)})}.$$

For simplicity, we assume that the infectious contact intervals *τ*_{ij} are absolutely continuous random variables.

Let λ_{ij}(*τ*|*r*_{i}) be the conditional hazard function for *τ*_{ij} given *r*_{i} and let λ_{0i}(*t*) be the hazard function for infectious contact to *i* from outside the population at time *t*. Since *τ*_{ij} is nonnegative, λ_{ij}(*τ*|*r*_{i}) = 0 whenever *τ* < 0. Let *H*(*t*) denote the set of infection times and recovery periods for all *i* such that *t*_{i} ≤ *t*. If person *k* is susceptible at time *t*, his or her total hazard of infection at time *t* given *H*(*t*) is ${\sum}_{i=0}^{n}{\lambda}_{ik}(t-{t}_{i}\mid {r}_{i})$, where we let λ_{0k}(*t* - *t*_{0}|*r*_{0}) = λ_{0k}(*t*) for simplicity of notation. If an infection occurs in person *k* at time *t*_{k} < ∞, then the conditional probability that person *j* infected person *k* given *H*(*t*_{k}) is

$$p_{jk} = \frac{\lambda_{jk}(t_k - t_j \mid r_j)}{\sum_{i=0}^{n} \lambda_{ik}(t_k - t_i \mid r_i)}, \tag{5}$$

which is the probability that *t*_{jk} = min(*t*_{0k}, *t*_{1k}, ..., *t*_{nk}). This has the same form as equation (4) except that it uses hazards of infectious contact instead of a pdf for the serial interval. If the hazards of infectious contact in the underlying SIR model do not change over the course of an epidemic, then *p*_{jk} can be estimated accurately throughout an epidemic. Unlike the assumption of a stable generation or serial interval distribution, this assumption is unaffected by competition among potential infectors. The rest of the estimation of *R*(*t*) could proceed exactly as in Ref. [4], replacing ${p}_{jk}^{\left(WT\right)}$ with *p*_{jk}.
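As an illustration of the hazard-based probability that a given person was the infector, the sketch below divides the hazard of infectious contact from each candidate infector by the total hazard acting on person *k* at time *t*_{k}. The constant hazard on (0, *r*) and the function names are hypothetical choices for this example; any hazard function of the time since infection and the recovery period could be substituted.

```python
def constant_hazard(tau, r, rate=0.4):
    """Example hazard: constant while the source is infectious, zero otherwise."""
    return rate if 0 < tau < r else 0.0


def infector_probabilities(t_k, infectors, hazard=constant_hazard, external_hazard=0.0):
    """Return {j: p_jk} for a person infected at time t_k.

    infectors maps each candidate j to (t_j, r_j), the infection time and
    recovery period of j; external_hazard is the hazard of infectious
    contact from outside the population at time t_k.
    """
    hazards = {j: hazard(t_k - t_j, r_j) for j, (t_j, r_j) in infectors.items()}
    total = external_hazard + sum(hazards.values())
    return {j: h / total for j, h in hazards.items()}


if __name__ == "__main__":
    candidates = {1: (0.0, 1.0), 2: (0.5, 1.0), 3: (2.0, 1.0)}
    print(infector_probabilities(0.8, candidates))
```

In this example, person 3 is infected after time 0.8 and therefore has zero hazard and zero probability of being the infector, while persons 1 and 2 share the probability equally because their hazards at time 0.8 are equal.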

### 5.2 Partial likelihood for epidemic data

A partial likelihood for epidemic data can be derived using the same logic as that used to derive *p*_{jk} in equation (5). For each person *k* such that *t*_{k} < ∞, the probability that the failure at time *t*_{k} occurred in person *k* given *H*(*t*_{k}) is

$$\frac{\sum_{i=0}^{n} \lambda_{ik}(t_k - t_i \mid r_i)}{\sum_{j:\, t_j \ge t_k} \sum_{i=0}^{n} \lambda_{ij}(t_k - t_i \mid r_i)}, \tag{6}$$

where the numerator is the hazard of infection (from all sources) in person *k* at time *t*_{k} and the denominator is the total hazard of infection for all persons at risk of infection at time *t*_{k}.

If there is a vector of covariates **x**_{ij} for each pair *ij* (which may include individual-level covariates for *i* and *j* as well as pairwise covariates for the ordered pair *ij*) and a vector of parameters *θ* such that λ_{ij}(*τ*|*r*_{i}) = λ(*τ*|*r*_{i}, **x**_{ij}, *θ*), then a partial likelihood for *θ* can be obtained by multiplying equation (6) over all *m* observed failure times. If (*k*) denotes the index of the *k*^{th} person infected, **t** = (*t*_{1}, ..., *t*_{n}), and **X** = {**x**_{ij} : *i*, *j* = 1, ..., *n*}, then the partial likelihood is

$$L_p(\theta \mid \mathbf{t}, \mathbf{X}) = \prod_{k=1}^{m} \frac{\sum_{i=0}^{n} \lambda\left(t_{(k)} - t_i \mid r_i, \mathbf{x}_{i(k)}, \theta\right)}{\sum_{j:\, t_j \ge t_{(k)}} \sum_{i=0}^{n} \lambda\left(t_{(k)} - t_i \mid r_i, \mathbf{x}_{ij}, \theta\right)}. \tag{7}$$

This is very similar to partial likelihoods that arise in survival analysis, so many techniques from survival analysis may be adaptable for use in the analysis of epidemic data.
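A partial likelihood of this kind can be evaluated directly from infection times and recovery periods. The following is a minimal sketch under simplifying assumptions that are not part of the text: hazards from outside the population are ignored, imported cases are skipped, and the risk set at each infection time is taken to be everyone not yet infected just before that time. The hazard signature `hazard(i, j, tau, r_i)` is hypothetical.

```python
import math


def constant_hazard(i, j, tau, r_i, rate=0.4):
    """Example pairwise hazard: constant while i is infectious, zero otherwise."""
    return rate if 0 < tau < r_i else 0.0


def log_partial_likelihood(t, r, hazard=constant_hazard, imported=()):
    """Sum over infections of log(hazard on the case / total hazard on the risk set).

    t[j] is the infection time of j (math.inf if never infected) and r[i] is
    the recovery period of i. Each non-imported case must have a positive
    hazard of infection at its infection time.
    """
    n = len(t)

    def total_hazard(j, time):
        # Hazard of infection for j at `time` from everyone infected before it.
        return sum(hazard(i, j, time - t[i], r[i])
                   for i in range(n) if i != j and t[i] < time)

    loglik = 0.0
    for k in range(n):
        if t[k] == math.inf or k in imported:
            continue
        numerator = total_hazard(k, t[k])
        denominator = sum(total_hazard(j, t[k]) for j in range(n) if t[j] >= t[k])
        loglik += math.log(numerator / denominator)
    return loglik


if __name__ == "__main__":
    times = [0.0, 0.3, 0.6, math.inf]
    recovery = [1.0, 1.0, 1.0, math.inf]
    print(log_partial_likelihood(times, recovery, imported=(0,)))
```

With a hazard that is identical for all pairs, as in the example, the rate cancels from every term, just as the baseline hazard cancels in survival analysis; covariate effects in the hazard would be needed for the partial likelihood to carry information about parameters.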

The goal of such methods would be to allow statistical inference about the effects of individual and pairwise covariates on the hazard of infection in ordered pairs of individuals. In the ordered pair *ij*, the effects of individual covariates for *i* and *j* on λ_{ij}(*τ*|*r*_{i}) would reflect the infectiousness of *i* and the susceptibility of *j*, respectively. Pairwise covariates could include such information as whether *i* and *j* are in the same household, the distance between their households, whether they are sexual partners, and any other aspects of their relationship to each other that may affect the hazard of infection from *i* to *j*.

This approach has several advantages over any approach based on a distribution of generation or serial intervals. First, it is not necessary to determine who infected whom in any subset of observed infections. If *v*_{j} is known for some *j*, this knowledge can be incorporated in the partial likelihood by replacing the term for the failure time of person *j* in (7) with *p*_{v_{j}j} from equation (5). Second, this approach allows the use of individual-level and pairwise covariates for inference in a flexible and intuitive way. The resulting estimated hazard functions have a straightforward interpretation and can be incorporated naturally into a stochastic SIR model. Third, this approach allows theory and methods from survival analysis to be applied to the analysis of epidemic data.

_{vjj}## 6 Discussion

Generation and serial interval distributions are not stable characteristics of an infectious disease. When multiple infectious persons compete to infect a given susceptible person, infection is caused by the first person to make infectious contact. In Section 3, we showed that the mean infectious contact interval *τ*_{ij} given that *i* actually infected *j* is less than or equal to the mean *τ*_{ij} given that *i* made infectious contact with *j*. That is,

$$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty],$$

with strict inequality when *τ*_{ij} is non-constant and *j* is at risk of infectious contact from any source other than *i* (more precise conditions are given in Section 3). This result holds for all time-homogeneous stochastic SIR models.
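The "race" behind this result can be illustrated with a small Monte Carlo sketch in Python. It is not one of the simulations of Section 4. It assumes, purely for illustration, that a susceptible person *j* is exposed to `n_sources` infectious persons who were all infected at the same time, each with an independent exponential infectious contact interval of rate `rate`. All function and parameter names are hypothetical. Because exponential intervals are always finite, conditioning on infectious contact is trivial here.

```python
import random

def mean_intervals(n_sources, rate=1.0, n_trials=100_000, seed=1):
    """Return (mean contact interval of source 0,
               mean contact interval of source 0 given that it infected j)."""
    rng = random.Random(seed)
    all_sum = 0.0
    win_sum, win_count = 0.0, 0
    for _ in range(n_trials):
        # One infectious contact interval per potential infector of j.
        taus = [rng.expovariate(rate) for _ in range(n_sources)]
        all_sum += taus[0]
        # Source 0 infects j only if its contact is the first to occur.
        if taus[0] == min(taus):
            win_sum += taus[0]
            win_count += 1
    return all_sum / n_trials, win_sum / win_count
```

With a single source the two means coincide. With three competing sources under these assumptions, the unconditional mean stays near 1/`rate`, while the mean given that source 0 won the race falls to roughly 1/(3 × `rate`), because the minimum of three independent exponential intervals is itself exponential with three times the rate.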

In an epidemic, the mean generation (and serial) intervals contract as the prevalence of infection increases and susceptible persons are at risk of infectious contact from multiple sources. In the simulations of Section 4, we saw that the degree of contraction increases with *R*_{0}. For models with clustering of contacts, generation interval contraction can occur even when the global prevalence of infection is low because susceptibles are at risk of infectious contact from multiple sources within their own clusters. In all of the simulations, the greatest serial interval contraction coincided with the peak prevalence of infection, when the risk of infectious contacts from multiple sources was highest. The mean generation interval increases again as the epidemic wanes, but this rebound may be small when *R*_{0} is high.

The reason that generation and serial intervals contract during an epidemic is that their definition applies to pairs of individuals *ij* such that *i* actually transmitted infection to *j*. If we don’t require that an infectious contact leads to the transmission of infection, we are led naturally to the concept of the infectious contact interval, which has a well-defined distribution throughout an epidemic. Similarly, we can define *R*_{0} as the mean number of infectious contacts (i.e., finite infectious contact intervals) made by a primary case without reference to a completely susceptible population. Generation and serial intervals and the effective reproductive number can then be defined in terms of infectious contacts that actually lead to the transmission of infection. Many fundamental concepts in infectious disease epidemiology can be simplified usefully by defining them in terms of infectious contact rather than infection transmission.

Infectious contact hazards for ordered pairs of individuals can be used for many of the same types of analysis that have been attempted using generation or serial interval distributions. In Section 5, we derived a hazard-based estimator of *R*(*t*) very similar to that developed by Wallinga and Teunis [4]. This derivation led naturally to a partial likelihood for epidemic data very similar to those that arise in survival analysis. We believe that the adaptation of methods and theory from survival analysis to infectious disease epidemiology will yield flexible and powerful tools for epidemic data analysis.
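A hazard-based estimator of this kind can be sketched in a few lines of Python. The sketch assumes that the probability that *i* infected *j* is the hazard of infectious contact from *i* to *j* at the infection time of *j*, divided by the total hazard on *j* at that time, and that the reproductive number of case *i* is the sum of these probabilities over *j*. The constant hazard during the infectious period, the value of `beta`, and all function names are hypothetical choices for illustration, so the details may differ from the estimator of Section 5.

```python
def hazard(tau, r, beta=0.5):
    """Assumed hazard of infectious contact: constant `beta` while the
    infector is infectious (0 < tau <= r), zero otherwise."""
    return beta if 0.0 < tau <= r else 0.0

def infector_probabilities(t, r, hazard_fn=hazard):
    """p[i][j]: hazard from i on j at time t[j], divided by the total
    hazard on j at time t[j]."""
    n = len(t)
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        lam = [hazard_fn(t[j] - t[i], r[i]) if i != j else 0.0
               for i in range(n)]
        total = sum(lam)
        if total > 0.0:  # zero for cases with no possible infector in the data
            for i in range(n):
                p[i][j] = lam[i] / total
    return p

def case_reproductive_numbers(t, r, hazard_fn=hazard):
    """Expected number of persons infected by each case: the row sums of p."""
    return [sum(row) for row in infector_probabilities(t, r, hazard_fn)]
```

For example, `case_reproductive_numbers([0, 1, 2, 5], [3, 3, 3, 3])` attributes the second case entirely to the first, splits the third case equally between the first two, and attributes the fourth case entirely to the third. Averaging these values over cases infected at about the same time would give an estimate of *R*(*t*).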

## Acknowledgements

This work was supported by the US National Institutes of Health cooperative agreement 5U01GM076497 “Models of Infectious Disease Agent Study” (E.K. and M.L.) and Ruth L. Kirschstein National Research Service Award 5T32AI007535 “Epidemiology of Infectious Diseases and Biodefense” (E.K.). We also wish to thank Jacco Wallinga and the anonymous reviewers of Mathematical Biosciences for useful comments and suggestions.
