Math Biosci. Author manuscript; available in PMC May 1, 2009.
Published in final edited form as:
Published online Feb 29, 2008.
PMCID: PMC2365921
NIHMSID: NIHMS45802

# Generation interval contraction and epidemic data analysis

## Abstract

The generation interval is the time between the infection time of an infected person and the infection time of his or her infector. Probability density functions for generation intervals have been an important input for epidemic models and epidemic data analysis. In this paper, we specify a general stochastic SIR epidemic model and prove that the mean generation interval decreases when susceptible persons are at risk of infectious contact from multiple sources. The intuition behind this is that when a susceptible person has multiple potential infectors, there is a “race” to infect him or her in which only the first infectious contact leads to infection. In an epidemic, the mean generation interval contracts as the prevalence of infection increases. We call this global competition among potential infectors. When there is rapid transmission within clusters of contacts, generation interval contraction can be caused by a high local prevalence of infection even when the global prevalence is low. We call this local competition among potential infectors. Using simulations, we illustrate both types of competition. Finally, we show that hazards of infectious contact can be used instead of generation intervals to estimate the time course of the effective reproductive number in an epidemic. This approach leads naturally to partial likelihoods for epidemic data that are very similar to those that arise in survival analysis, opening a promising avenue of methodological research in infectious disease epidemiology.

## 1 Introduction

In infectious disease epidemiology, the serial interval is the difference between the symptom onset time of an infected person and the symptom onset time of his or her infector [1]. This is sometimes called the “generation interval.” However, we find it more useful to adopt the terminology of Svensson [2] and define the generation interval as the difference between the infection time of an infected person and the infection time of his or her infector. By these definitions, the serial interval is observable while the generation interval usually is not. We define infectious contact from i to j to be a contact that is sufficient to infect j if i is infectious and j is susceptible, and we define a potential infector of person i to be an infectious person who has positive probability of making infectious contact with i. Finally, we use the term hazard rather than force of infection to highlight the similarities between epidemic data analysis and survival analysis.

The generation interval has been an important input for epidemic models used to investigate the transmission and control of SARS [3, 4] and pandemic influenza [5,6]. More recently, generation interval distributions have been used to calculate the incubation period distribution of SARS [7] and to estimate R0 from the exponential growth rate at the beginning of an epidemic [8]. It is generally assumed that the generation interval distribution is characteristic of an infectious disease. In this paper, we show that this is not true. Instead, the expected generation interval decreases as the number of potential infectors of susceptibles increases. During an epidemic, generation intervals tend to contract as the prevalence of infection increases. This effect was described by Svensson [2] for an SIR model with homogeneous mixing. In this paper, we extend this result to all time-homogeneous stochastic SIR models.

A simple thought experiment illustrates the intuition behind our main result. Imagine a susceptible person j in a room. Place m other persons in the room and infect them all at time t = 0. For simplicity, assume that infectious contact from i to j occurs with probability one, i = 1, ..., m. Let tij be a continuous nonnegative random variable denoting the first time at which i makes infectious contact with j. Person j is infected at time tj = min(t1j, ..., tmj). Since all infectious persons were infected at time zero, tj is the generation interval. If we repeat the experiment with larger and larger m, the expected value of min(t1j, ..., tmj) will decrease.

When a susceptible person is at risk of infectious contact from multiple sources, there is a “race” to infect him or her in which only the first infectious contact leads to infection. Generation interval contraction is an example of a well-known phenomenon in epidemiology: The expected time to an outcome, given that the outcome occurs, decreases in the presence of competing risks. In our thought experiment, the outcome is the infection of j by a given i and the competing risks are infectious contacts from all sources other than i.

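Adapting our thought experiment slightly, we see that the contraction of the generation interval is a consequence of the fact that the hazard of infection for j increases as the number of potential infectors increases. Let λ(t) be the hazard of infectious contact from any potential infector to j at time t, let $\Lambda(t) = \int_0^t \lambda(u)\,du$ be the corresponding cumulative hazard, and let E[tj|m] be the expected infection time of j given m potential infectors. If the m infectious contact times are independent, the probability that j is still uninfected at time t is $e^{-m\Lambda(t)}$, and the expected value of a nonnegative random variable is the integral of its survival function. Then

$E[t_j \mid m] = \int_0^\infty e^{-m\Lambda(t)}\,dt > \int_0^\infty e^{-(m+1)\Lambda(t)}\,dt = E[t_j \mid m+1],$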

so the expected generation interval decreases as the number of potential infectors increases. A hazard of infection that increases with the number of potential infectors is a defining feature of most epidemic models, so generation interval contraction is a very general phenomenon. We note that a very similar phenomenon occurs in endemic diseases, where increased force of infection results in a decreased average age at first infection [9].
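
The thought experiment can be checked numerically. The sketch below is an illustration only: it assumes that every potential infector has the same constant hazard `lam` (a hypothetical choice), so that the first of m independent contact times is exponential with rate m·`lam` and has mean 1/(m·`lam`). The function name and the number of replicates are likewise assumptions.

```python
import random

def mean_first_contact_time(m, lam=1.0, n_rep=20000, seed=0):
    """Monte Carlo estimate of E[min(t_1j, ..., t_mj)] when each of the m
    contact times is exponential with constant hazard lam (an assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        # Person j is infected by whichever infector makes contact first.
        total += min(rng.expovariate(lam) for _ in range(m))
    return total / n_rep

estimates = {m: mean_first_contact_time(m) for m in (1, 2, 5, 10)}
for m, est in estimates.items():
    # With lam = 1 the exact mean is 1/m.
    print(f"m={m:2d}  simulated={est:.3f}  exact={1.0 / m:.3f}")
```

Under these assumptions the simulated mean infection time should fall roughly in proportion to 1/m, which is the "race" described above.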

The rest of the paper is organized as follows: In Section 2, we describe a general stochastic SIR epidemic model. In Section 3, we use this model to show that the mean generation interval decreases as the number of potential infectors increases. As a corollary, we find that the mean serial interval also decreases. In Section 4, we consider the role of the population contact structure in generation interval contraction and illustrate the effects of global and local competition among potential infectors with simulations. In Section 5, we argue that hazards of infectious contact should be used instead of generation or serial interval distributions in the analysis of epidemic data. Section 6 summarizes our main results and conclusions.

## 2 General stochastic SIR model

We start with a very general stochastic “Susceptible-Infectious-Removed” (SIR) epidemic model. This model includes fully-mixed and network-based models as special cases, and it has been used previously to define a mapping from the final outcomes of stochastic SIR models to the components of semi-directed random networks [10, 11].

Each person i is infected at his or her infection time ti, with ti = ∞ if i is never infected. Person i recovers from infectiousness or dies at time ti + ri, where the recovery period ri is a positive random variable with the cumulative distribution function (cdf) Fi(r). The recovery period ri may be the sum of a latent period, during which i is infected but not infectious, and an infectious period, during which i can transmit infection. We assume that all infected persons have a finite recovery period. If person i is never infected, let ri = ∞. Let Sus(t) = {i : ti > t} be the set of susceptibles at time t.

When person i is infected, he or she makes infectious contact with person j after an infectious contact interval τij. Each τij is a positive random variable with cdf Fij(τ|ri) and survival function Sij(τ|ri) = 1 - Fij(τ|ri). Let τij = ∞ if person i never makes infectious contact with person j, so the infectious contact interval distribution may have probability mass at ∞. Define

$S_{ij}(\infty \mid r_i) = \lim_{\tau \to \infty} S_{ij}(\tau \mid r_i),$

which is the conditional probability that i never makes infectious contact with j given ri. Since a person cannot transmit disease before being infected or after recovering from infectiousness, Sij(τ|ri) = 1 for all τ ≤ 0 and Sij(τ|ri) = Sij(∞|ri) for all τ ≥ ri. Since a person cannot infect himself (or herself), τii = ∞ with probability one and Sii(τ|ri) = 1 for all τ.

The infectious contact time tij = ti + τij is the time at which person i makes infectious contact with person j. If person j is susceptible at time tij, then i infects j and tj = tij. If tij < ∞, then tj ≤ tij because person j avoids infection at time tij only if he or she has already been infected. If person i never makes infectious contact with person j, then tij = ∞ because τij = ∞. Figure 1 shows a schematic diagram of the relationships among ri, τij, and tij.

Figure 1: Schematic diagram of variables in the general stochastic SIR model for the ordered pair ij. Recall that tj ≤ tij. As discussed in Section 3.2, person i develops symptoms at time $t_i^{\mathrm{sym}} = t_i + q_i$, where qi is the incubation period.

The importation time t0i of person i is the earliest time at which he or she receives infectious contact from outside the population. The importation time vector is t0 = (t01, ..., t0n).

We assume that each infected person has a unique infector. Following [4], we let vi represent the index of the person who infected person i, with vi = 0 for imported infections and vi = ∞ if i is never infected. If tied infectious contact times have nonzero probability, then vi can be chosen from all j such that tji = ti < ∞.

### 2.1 Epidemics

Let t(1) ≤ t(2) ≤ ... ≤ t(m) be the order statistics of all t1, ..., tn less than infinity, and let (k) be the index of the kth person infected. Before the epidemic begins, an importation time vector t0 is chosen. The epidemic begins at time t(1) = mini(t0i). Person (1) is assigned a recovery period r(1). Every person j ∈ Sus(t(1)) is assigned an infectious contact time t(1)j = t(1) + τ(1)j. The second infection occurs at $t_{(2)} = \min_{j \in Sus(t_{(1)})} \min(t_{0j}, t_{(1)j})$, which is the first infectious contact time after t(1). Person (2) is assigned a recovery period r(2). After k infections, the next infection occurs at $t_{(k+1)} = \min_{j \in Sus(t_{(k)})} \min(t_{0j}, t_{(1)j}, \ldots, t_{(k)j})$. The epidemic stops after m infections if and only if t(m+1) = ∞.
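
The construction above can be sketched as an event-driven simulation. This is a minimal illustration under stated assumptions, not the implementation used for the simulations in Section 4: the function names, the 0-based person indices, the use of `-1` (rather than 0) to mark an imported infection, and the example parameters (n = 500, R0 = 2, fixed recovery period of 1) are all choices made here.

```python
import heapq
import math
import random

def simulate_epidemic(n, draw_recovery, draw_contact_interval, importation_times):
    """One realization of the general stochastic SIR model.

    draw_recovery(i) returns r_i; draw_contact_interval(i, j, r_i) returns
    tau_ij (math.inf if i never makes infectious contact with j).
    Returns infection times t and infectors v, where v[j] is -1 for an
    imported infection and None if j is never infected.
    """
    t = [math.inf] * n
    v = [None] * n
    # Pending infectious contacts: (contact time, source, target).
    events = [(t0, -1, j) for j, t0 in enumerate(importation_times) if t0 < math.inf]
    heapq.heapify(events)
    while events:
        time, source, j = heapq.heappop(events)
        if t[j] < math.inf:
            continue  # j is already infected, so this contact has no effect
        t[j], v[j] = time, source
        r_j = draw_recovery(j)
        # Assign infectious contact times from j to everyone still susceptible.
        for k in range(n):
            if k != j and t[k] == math.inf:
                tau = draw_contact_interval(j, k, r_j)
                if tau < math.inf:
                    heapq.heappush(events, (time + tau, j, k))
    return t, v

def fully_mixed_interval(r0, n, rng):
    """Exponential contact interval with hazard r0/(n-1), truncated at r_i."""
    rate = r0 / (n - 1)
    def draw(i, j, r_i):
        tau = rng.expovariate(rate)
        return tau if tau < r_i else math.inf
    return draw

rng = random.Random(1)
n = 500
t, v = simulate_epidemic(n, lambda i: 1.0, fully_mixed_interval(2.0, n, rng),
                         [0.0] + [math.inf] * (n - 1))
print("number infected:", sum(1 for x in t if x < math.inf))
```

The heap holds every scheduled infectious contact, so popping the smallest entry whose target is still susceptible reproduces the rule that the next infection occurs at the first infectious contact time. The loop ends when no finite contact times remain, which corresponds to t(m+1) = ∞.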

## 3 Generation interval contraction

In this section, we show that the mean infectious contact interval τij given that i infects j is shorter than the mean infectious contact interval given that i makes infectious contact with j. In the notation from the previous section,

$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty]$

(note that vj = i implies τij < ∞ but not vice versa). In general, this inequality is strict when j is at risk of infectious contact from any source other than i. This inequality implies the contraction of generation and serial intervals during an epidemic. For background on the probability theory used in this section, please see Ref. [12] or any other probability text.

### Lemma 1

$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty]$.

#### Proof

We first show that E[τij|ri, vj = i] ≤ E[τij|ri, τij < ∞] and then use the law of iterated expectation. If person i was infected at time ti and has recovery period ri, then the probability that τij < ∞ is Fij(∞|ri) = 1 - Sij(∞|ri). Let

$F^*_{ij}(\tau \mid r_i) = \frac{F_{ij}(\tau \mid r_i)}{F_{ij}(\infty \mid r_i)}$

be the conditional cdf of τij given ri and τij < ∞. Then

$E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] = \int_0^{r_i} \tau \, dF^*_{ij}(\tau \mid r_i).$
(1)

If person j is susceptible at time ti and τij < ∞, then vj = i if and only if j escapes infectious contact from all other infectious people during the time interval (ti, ti + τij). Let S*j(ti + τ) be the probability that j escapes infectious contact from all sources other than i in the interval (ti, ti + τ). Given ri and τij < ∞, the conditional probability density for an infectious contact from i to j at time ti + τ that leads to the infection of j is proportional to

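$S^*_j(t_i + \tau) \, dF^*_{ij}(\tau \mid r_i).$

If we let

$\psi = \int_0^{r_i} S^*_j(t_i + \tau) \, dF^*_{ij}(\tau \mid r_i),$

then

$E[\tau_{ij} \mid r_i, v_j = i] = \int_0^{r_i} \tau \, \frac{S^*_j(t_i + \tau)}{\psi} \, dF^*_{ij}(\tau \mid r_i).$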

Since S*j (ti + τij) is a monotonically decreasing function of τij,

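$\begin{aligned} E[\tau_{ij} \mid r_i, v_j = i] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] &= E\left[\tau_{ij} \frac{S^*_j(t_i + \tau_{ij})}{\psi} \,\middle|\, r_i, \tau_{ij} < \infty\right] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] \, E\left[\frac{S^*_j(t_i + \tau_{ij})}{\psi} \,\middle|\, r_i, \tau_{ij} < \infty\right] \\ &= \operatorname{Cov}\left(\tau_{ij}, \frac{S^*_j(t_i + \tau_{ij})}{\psi} \,\middle|\, r_i, \tau_{ij} < \infty\right) \le 0. \end{aligned}$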

Therefore,

$E[\tau_{ij} \mid r_i, v_j = i] \le E[\tau_{ij} \mid r_i, \tau_{ij} < \infty].$
(2)

Since the same inequality holds for all ri,

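$E[\tau_{ij} \mid v_j = i] = E\big[E[\tau_{ij} \mid r_i, v_j = i]\big] \le E\big[E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]\big] = E[\tau_{ij} \mid \tau_{ij} < \infty]$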
(3)

by the law of iterated expectation.

Equality holds in equation (2) if and only if τij and S*j(ti + τij) have covariance zero given ri and τij < ∞. Since S*j(ti + τij) is a monotonically decreasing function of τij, this will occur if and only if τij or S*j(ti + τij) is constant given ri and τij < ∞. Equality holds in equation (3) if and only if equality holds in (2) with probability one in ri. If τij is constant, then clearly S*j(ti + τij) is constant and their covariance is zero. If j is not at risk of infectious contact from any source other than i, then S*j(ti + τij) will be constant even when τij is not. In the thought experiment from the Introduction, the expected infection time of the susceptible j would remain constant in the following two scenarios: (i) all infectious persons make infectious contact with j at a fixed time t0, or (ii) j is only at risk of infectious contact from a single person. Scenario (i) corresponds to a constant τij and scenario (ii) corresponds to a constant S*j(ti + τij).

The expected generation interval from i to j given vj = i will be shortest when the risk of infectious contact to j from sources other than i is greatest. More specifically,

$E[\tau_{ij} \mid r_i, v_j = i] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]$

will be minimized when S*j(ti + τij) decreases fastest in τij. In general, the risk of infectious contact from other sources will be greatest when the prevalence of infection is highest, so we expect the greatest contraction of the serial interval during an epidemic to coincide with the peak prevalence of infection.
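
A simple special case makes the size of this difference concrete. The sketch below is an illustration under assumptions that are not part of the general model: the contact interval from i to j is exponential with hazard `lam` truncated at ri = 1, and all other sources together expose j to a constant hazard `mu`, so that S*j(ti + τ) = exp(-mu·τ). Weighting the truncated exponential density by exp(-mu·τ) gives another truncated exponential with rate `lam + mu`, so both conditional means have the same closed form. The values of `lam` and `mu` are hypothetical.

```python
import math

def truncated_exp_mean(rate, r=1.0):
    """Mean of an exponential(rate) variable conditioned on being less than r."""
    return 1.0 / rate - r / math.expm1(rate * r)

lam = 0.5  # hypothetical hazard of infectious contact from i to j
baseline = truncated_exp_mean(lam)  # E[tau_ij | r_i, tau_ij < infinity]
for mu in (0.0, 0.5, 2.0, 10.0):
    # Competing hazard mu from other sources: E[tau_ij | r_i, v_j = i]
    contracted = truncated_exp_mean(lam + mu)
    print(f"mu={mu:5.1f}  E[tau | v_j = i]={contracted:.4f}  "
          f"difference={contracted - baseline:+.4f}")
```

With `mu = 0` the two expectations coincide, matching the equality condition in which j is at risk from i alone. As `mu` grows the difference becomes more negative, which is the contraction described above.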

In general, we expect to see the following pattern over the course of an epidemic: The mean generation interval decreases as the prevalence of infection increases, reaches a minimum as the prevalence of infection peaks, and increases again as the prevalence of infection decreases.

### 3.1 Types of generation intervals

In [2], Svensson discussed two types of generation intervals that are consistent with the verbal definition given in the Introduction. Tp (p for “primary”) denotes τij where i is chosen at random from all persons who infect at least one other person and j is chosen randomly from the set of persons i infects. Ts (s for “secondary”) denotes τij where j is chosen at random from all persons infected from within the population and i = vj. Tp and Ts differ only in the sampling procedure used to obtain the ordered pair ij; Tp samples primary cases (infectors) at random while Ts samples secondary cases at random. Equation (3) implies that both E[Tp] and E[Ts] decrease when susceptible persons are at risk of infectious contact from multiple sources. This contraction occurs because the definitions of Tp and Ts include only τij such that i actually infected j.

### 3.2 Serial interval contraction

In an epidemic, infection times are generally unobserved. Instead, symptom onset times are observed. Recall that the time between the onset of symptoms in an infected person and the onset of symptoms in his or her infector is called the serial interval. Contraction of the mean generation interval implies contraction of the mean serial interval as well. The incubation period is the time from infection to the onset of symptoms [1]. Let qi be the incubation period in person i, and let $t_i^{\mathrm{sym}} = t_i + q_i$ be the time of his or her onset of symptoms. If vj = i, then the serial interval associated with person j is

$t_j^{\mathrm{sym}} - t_i^{\mathrm{sym}} = \tau_{ij} + q_j - q_i.$

Therefore,

$E[t_j^{\mathrm{sym}} - t_i^{\mathrm{sym}} \mid v_j = i] = E[\tau_{ij} \mid v_j = i] + E[q_j] - E[q_i] \le E[\tau_{ij} \mid \tau_{ij} < \infty] + E[q_j] - E[q_i],$

with strict inequality whenever strict inequality holds for the corresponding generation interval. Over the course of an epidemic, we expect the mean serial interval to follow a pattern very similar to that of the mean generation interval.

## 4 Simulations

We refer to the “race” to infect a susceptible person as competition among potential infectors. In this section, we illustrate two types of competition among potential infectors: Global competition among potential infectors results from a high global prevalence of infection. Local competition among potential infectors results from rapid transmission within clusters of contacts, which causes susceptibles to be at risk of infectious contact from multiple sources within their clusters even if the global prevalence of infection is low. In real epidemics, the prevalence of infection is usually low but there is clustering of contacts within households, hospital wards, schools, and other settings.

In this section, we use simulations to illustrate generation interval contraction under global and local competition among potential infectors. Each simulation is a single realization of a stochastic SIR model in a population of 10,000. We keep track of the infection times of the primary and secondary case in each infector/infectee pair and the prevalence of infection at the infection time of the secondary case, which is a proxy for the amount of competition to infect the secondary case. We then calculate a smoothed mean of the generation interval as a function of the infection time of the primary case in each pair. Another valid approach would be to calculate the smoothed means from the results of many simulations. We did not take this approach for the following reasons: (i) Because of variation in the time course of different realizations of the same stochastic SIR model, many simulations would be required to obtain a curve that reliably approximates the asymptotic limit. (ii) The smoothed mean over many simulations would show a pattern similar to that obtained in any single simulation. (iii) Generation interval contraction was proven in Section 3, so the simulations are intended primarily as illustrations.

All simulations were implemented in Mathematica 5.0.0.0 [© 1988-2003 Wolfram Research, Inc.]. All data analysis was done using Intercooled Stata 9.2 [© 1985-2007 StataCorp LP]. All smoothed means are running means with a bandwidth of 0.8 (the default for the Stata command lowess with the option mean). Similar results were obtained for larger and smaller bandwidths.

### 4.1 Global competition

To illustrate global competition among potential infectors, we use a fully-mixed model with population size n = 10,000 and basic reproductive number R0. The infectious period is fixed, with ri = 1 with probability one for all i. The infectious contact intervals τij have an exponential distribution with hazard $R_0(n-1)^{-1}$ truncated at ri, so $S_{ij}(\tau \mid r_i) = e^{-R_0(n-1)^{-1}\tau}$ when 0 < τ < 1 and τij = ∞ with probability $e^{-R_0(n-1)^{-1}}$. The epidemic starts with a single imported infection and no other imported infections occur.

From equation (1), the mean infectious contact interval given that contact occurs is

$E[\tau_{ij} \mid \tau_{ij} < \infty] = \int_0^1 \frac{e^{-R_0 \tau (n-1)^{-1}} - e^{-R_0 (n-1)^{-1}}}{1 - e^{-R_0 (n-1)^{-1}}} \, d\tau.$

For n = 10,000, Table 1 shows this expected value at each R0. For all R0, E[τij|τij < ∞] ≈ 0.5.
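
The integral for the mean infectious contact interval can be evaluated numerically. The sketch below is an illustration, not the calculation behind Table 1: it applies a midpoint rule to the conditional survival function (S(τ) − S(∞)) / (1 − S(∞)) on (0, 1) with hazard R0/(n − 1). The function name and the number of integration steps are assumptions.

```python
import math

def mean_contact_interval(r0, n=10_000, steps=10_000):
    """Midpoint-rule estimate of E[tau_ij | tau_ij < infinity] for the
    fully-mixed model with hazard r0/(n-1) truncated at r_i = 1."""
    a = r0 / (n - 1)
    s_inf = math.exp(-a)  # probability that no infectious contact occurs
    h = 1.0 / steps
    total = sum(math.exp(-a * (i + 0.5) * h) - s_inf for i in range(steps))
    return total * h / (1.0 - s_inf)

for r0 in (1.25, 1.5, 2, 3, 4, 5, 10):
    print(f"R0={r0:5.2f}  E[tau | tau < inf] = {mean_contact_interval(r0):.5f}")
```

Because the hazard R0/(n − 1) is tiny when n = 10,000, the truncated exponential is nearly uniform on (0, 1), which is why the mean stays close to one half for every R0 considered.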

Table 1: Expected infectious contact interval given that infectious contact occurs in the models illustrating global competition among potential infectors. If the generation interval were constant, this would be the mean generation interval throughout an epidemic ...

This model was run once at R0 = 1.25, 1.5, 2, 3, 4, 5, and 10. For each simulation, we recorded ti, vi, $t_{v_i}$, and the prevalence of infection at time ti in each infector/infectee pair. Figure 2 shows smoothed mean curves for the generation interval versus the source infection time for R0 = 2, 3, 4, 5. There is a clear tendency for the mean generation interval to contract, with greater contraction at higher R0. Figure 3 shows smoothed mean curves for the generation interval and the prevalence of infection versus the source infection time at each R0; in each case, the greatest contraction of the serial interval coincides with the peak prevalence of infection (i.e., the greatest competition among potential infectors). Figure 4 shows the same curves for R0 = 1.25 and 1.50; in these cases, the generation interval stays relatively constant. These results are exactly in line with the argument of Section 3.

Figure 2: The smoothed mean generation interval as a function of the source infection time for R0 = 2, 3, 4, 5. There is a clear tendency to contract, with greater contraction for higher R0.
Figure 3: The smoothed mean generation interval (solid lines) and prevalence (dotted lines) as a function of the source infection time for R0 = 2, 3, 4, 5. In all cases, the greatest contraction of the serial interval coincides with the peak prevalence of infection ...
Figure 4: The smoothed mean generation intervals (solid lines) and prevalence (dotted lines) as a function of the source infection time for R0 = 1.25 and 1.50. For R0 near one, the mean generation interval stays relatively constant.

### 4.2 Local competition

To illustrate local competition among potential infectors, we grouped a population of n = 9,000 individuals into clusters of size k. As before, the infectious period is fixed at ri = 1 for all i. When i and j are in the same cluster, the infectious contact interval τij has an exponential distribution with hazard λwithin truncated at ri, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{within}\tau}$ when 0 < τ < 1 and τij = ∞ with probability $e^{-\lambda_{within}}$. When i and j are in different clusters, τij has an exponential distribution with hazard λbetween truncated at ri, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{between}\tau}$ when 0 < τ < 1 and τij = ∞ with probability $e^{-\lambda_{between}}$.

We fixed the hazard of infectious contact between individuals in the same cluster at λwithin = .4. We tuned the hazard of infectious contact between individuals in different clusters to obtain R mean infectious contacts by infectious individuals; specifically,

$\lambda_{between} = \frac{R - (k-1)(1 - e^{-0.4})}{n - k}.$

We chose λwithin = .4 to obtain rapid transmission within clusters while retaining sufficient transmission between clusters to sustain an epidemic. Note that when $k > R(1 - e^{-0.4})^{-1} + 1$, we get the implausible result that λbetween < 0. Clearly, R and k must be chosen so that an infectious person makes an average of R or fewer infectious contacts within his or her cluster, which guarantees that λbetween ≥ 0.

At a given R, the mean infectious contact interval given that infectious contact occurs depends on the cluster size. If the entire population is infectious and the cluster size is k, then a given individual will receive an average of R infectious contacts, of which $(k-1)(1 - e^{-0.4})$ come from within his or her cluster. The mean infectious contact interval for within-cluster contacts is

$\frac{1}{1 - e^{-0.4}} \int_0^1 0.4\,\tau e^{-0.4\tau} \, d\tau,$

and the mean infectious contact interval for between-cluster contacts is approximately .5 (as in the models for global competition). Therefore, the mean infectious contact interval given that contact occurs and the cluster size is k is

$E[\tau_{ij} \mid \tau_{ij} < \infty, k] \approx \left(1 - \frac{(k-1)(1 - e^{-0.4})}{R}\right) 0.5 + \frac{k-1}{R} \int_0^1 0.4\,\tau e^{-0.4\tau} \, d\tau.$

To compare generation interval contraction for different cluster sizes, we calculated scaled generation intervals by dividing the observed generation intervals at each cluster size by E[τij|τij < ∞, k]. If the mean generation interval remained constant, we would expect the mean scaled generation interval to be approximately one throughout an epidemic.
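
The between-cluster hazard and the scaling constant can be tabulated directly. The sketch below is an illustration under the stated model (λwithin = 0.4, n = 9,000, ri = 1); the function names are assumptions, and the integral of 0.4·τ·exp(−0.4τ) over (0, 1) is evaluated in closed form as (1 − e^{−0.4})/0.4 − e^{−0.4}.

```python
import math

LAMBDA_WITHIN = 0.4
N = 9_000

def lambda_between(R, k, n=N, lam_w=LAMBDA_WITHIN):
    """Between-cluster hazard that gives R mean infectious contacts per case."""
    return (R - (k - 1) * (1.0 - math.exp(-lam_w))) / (n - k)

def mean_contact_interval(R, k, lam_w=LAMBDA_WITHIN):
    """Approximate E[tau_ij | tau_ij < infinity, k] used to scale the intervals."""
    p_within = 1.0 - math.exp(-lam_w)  # P(contact) for a within-cluster pair
    # Closed form of the integral of lam_w * tau * exp(-lam_w * tau) on (0, 1).
    integral = p_within / lam_w - math.exp(-lam_w)
    weight_within = (k - 1) * p_within / R  # share of contacts made within the cluster
    return (1.0 - weight_within) * 0.5 + (k - 1) / R * integral

for R, sizes in ((2.0, range(1, 7)), (3.0, range(2, 9))):
    for k in sizes:
        print(f"R={R:.0f} k={k}  lambda_between={lambda_between(R, k):.3e}  "
              f"E[tau | contact]={mean_contact_interval(R, k):.4f}")
```

For k = 1 there are no within-cluster contacts and the scaling constant reduces to 0.5, as in the fully-mixed model. A negative value of `lambda_between` signals a combination of R and k that the model cannot support.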

For R = 2, we ran the model with cluster sizes of 1 through 6. For R = 3, we ran the model with cluster sizes of 2 through 8. For each simulation, we recorded ti, vi, $t_{v_i}$, and the prevalence of infection at time ti in each infector/infectee pair. Figure 5 shows smoothed mean curves for the generation interval and prevalence versus the source infection time for several cluster sizes at each R. As before, there is a clear tendency of the mean generation interval to contract. The degree of contraction is roughly the same for all cluster sizes, but this contraction is maintained at a lower global prevalence of infection in models with larger cluster sizes. Similar results were obtained for cluster sizes not shown. Again, these results are exactly in line with the argument of Section 3.

Figure 5: The smoothed mean scaled generation interval (SGI) and prevalence as a function of the source infection time for R = 2 and R = 3. With increasing cluster size, the degree of generation interval contraction is roughly the same even though the peak prevalence ...

## 5 Consequences for estimation

The effect of generation interval contraction on parameter estimates obtained from models that assume a constant generation or serial interval distribution is difficult to assess. The assumption of a constant serial or generation interval distribution may be reasonable in the early stages of an epidemic with little clustering of contacts, in an epidemic with R0 near one, or in an endemic situation. However, this ignores the more fundamental issue that estimates of these distributions are obtained from transmission events where the infector/infectee pairs are known (often because of transmission from a known patient within a household or hospital ward). Even in the early stages of an epidemic, the generation interval distribution in these settings may differ substantially from the generation interval distribution for transmission in the general population.

In this section, we argue that hazards of infectious contact can be used instead of generation or serial intervals in the analysis of epidemic data. As an example, we look at the estimator of R(t) (the effective reproductive number at time t) derived by Wallinga and Teunis [4] and applied to data on the SARS outbreaks in Hong Kong, Vietnam, Singapore, and Canada in 2003. In their paper, the available data was the “epidemic curve” t = (t(1), ..., t(m)), where t(i) is the infection time of the ith person infected. They assume a probability density function (pdf) w(τ|θ) for the serial interval given a vector θ of parameters (note that this parameter vector applies to the population, not to individuals). The infector of person (i) is denoted by v(i), with v(i) = 0 for imported infections. The “infection network” is a vector v = (v(1), ..., v(m)) specifying the source of infection for each infected person. With these assumptions, the likelihood of v and θ given t is

$L(v, \theta \mid t) = \prod_{i : v_{(i)} \neq 0} w(t_{(i)} - t_{v_{(i)}} \mid \theta).$

The sum of this likelihood over the set V of all infection networks consistent with the epidemic curve t is

$L(\theta \mid t) = \prod_{i : v_{(i)} \neq 0} \sum_{j \neq i} w(t_i - t_j \mid \theta).$

Taking a likelihood ratio, Wallinga and Teunis argue that the relative likelihood that person k was infected by person j is

$p_{jk}^{(WT)} = \frac{w(t_k - t_j \mid \theta)}{\sum_{i \neq k} w(t_k - t_i \mid \theta)}.$
(4)

The number Rj of secondary infections generated by person j is a sum of Bernoulli random variables with expectation

$E[R_j] = \sum_{k=1}^{n} p_{jk}^{(WT)}.$

An estimate of the effective reproductive number R(t) can be obtained by calculating a smoothed mean for a scatterplot of (tj, E[Rj]). This analysis is ingenious, but it can be only approximately correct because the distribution of serial intervals varies systematically over the course of an epidemic.
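
The calculation in equation (4) and the expectation E[Rj] can be sketched in a few lines. This is an illustration only: the serial interval density `w` below (a gamma density with shape 2 and scale 2) and the example times are hypothetical choices, not values from Ref. [4]. The one requirement assumed here is that `w` returns zero for non-positive intervals, so that a case cannot be infected by a later case.

```python
import math

def wt_expected_secondary_cases(times, w):
    """E[R_j] for each case j, using the relative likelihoods of equation (4)."""
    n = len(times)
    expected = [0.0] * n
    for k in range(n):
        weights = [w(times[k] - times[i]) if i != k else 0.0 for i in range(n)]
        total = sum(weights)
        if total == 0.0:
            continue  # no admissible infector: treat case k as imported
        for j in range(n):
            expected[j] += weights[j] / total  # p_jk^(WT)
    return expected

def w(tau):
    """Hypothetical serial interval pdf: gamma with shape 2 and scale 2."""
    return tau * math.exp(-tau / 2.0) / 4.0 if tau > 0.0 else 0.0

times = [0.0, 3.0, 4.0, 7.0, 8.0, 12.0]
for t_j, r_j in zip(times, wt_expected_secondary_cases(times, w)):
    print(f"t_j={t_j:5.1f}  E[R_j]={r_j:.3f}")
```

Each case with at least one admissible infector contributes total probability one, so the values of E[Rj] sum to the number of non-imported cases. Smoothing the pairs (tj, E[Rj]) then gives the estimate of R(t) described above.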

### 5.1 Hazard-based estimator

A very similar result can be derived by applying the theory of order statistics (see Ref. [12]) to the general stochastic SIR model from Section 2. Specifically, we use the following results: If X1, ..., Xn are independent non-negative random variables, then their minimum X(1) has the hazard function

$\lambda_{(1)}(t) = \sum_{i=1}^{n} \lambda_i(t).$

Given that the minimum is x(1), the probability that Xj = x(1) (i.e. that the minimum was observed in the jth random variable) is

$\frac{\lambda_j(x_{(1)})}{\sum_{i=1}^{n} \lambda_i(x_{(1)})}.$

For simplicity, we assume that the infectious contact intervals τij are absolutely continuous random variables.

Let λij(τ|ri) be the conditional hazard function for τij given ri and let λ0i(t) be the hazard function for infectious contact to i from outside the population at time t. Since τij is nonnegative, λij(τ|ri) = 0 whenever τ < 0. Let H(t) denote the set of infection times and recovery periods for all i such that ti ≤ t. If person k is susceptible at time t, his or her total hazard of infection at time t given H(t) is $\sum_{i=0}^{n} \lambda_{ik}(t - t_i \mid r_i)$, where we let λ0k(t - t0|r0) = λ0k(t) for simplicity of notation. If an infection occurs in person k at time tk < ∞, then the conditional probability that person j infected person k given H(tk) is

$p_{jk} = \frac{\lambda_{jk}(t_k - t_j \mid r_j)}{\sum_{i=0}^{n} \lambda_{ik}(t_k - t_i \mid r_i)},$
(5)

which is the probability that tjk = min(t0k, t1k, ..., tnk). This has the same form as equation (4) except that it uses hazards of infectious contact instead of a pdf for the serial interval. If the hazards of infectious contact in the underlying SIR model do not change over the course of an epidemic, then pjk can be estimated accurately throughout an epidemic. Unlike the assumption of a stable generation or serial interval distribution, this assumption is unaffected by competition among potential infectors. The rest of the estimation of R(t) could proceed exactly as in Ref. [4], replacing $p_{jk}^{(WT)}$ with pjk.
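
Equation (5) can be sketched as follows. This is an illustration under assumptions: the function names, the 0-based indices, and the separate `import_hazard` argument are choices made here, and the example hazard is the constant hazard R0/(n − 1) on (0, ri] from the fully-mixed model of Section 4.1 with hypothetical infection times.

```python
def infector_probabilities(k, t, r, hazard, import_hazard=None):
    """Return ({j: p_jk}, p_import) from equation (5) for an infected person k.

    hazard(j, k, tau, r_j) is the hazard of infectious contact from j to k at
    time tau after j's infection; import_hazard(k, time) is the hazard of
    infectious contact from outside the population (zero if omitted).
    """
    t_k = t[k]
    h_import = import_hazard(k, t_k) if import_hazard is not None else 0.0
    h = {}
    for j in range(len(t)):
        if j != k and t[j] < t_k:  # only earlier infections can be infectors
            h[j] = hazard(j, k, t_k - t[j], r[j])
    total = h_import + sum(h.values())
    if total <= 0.0:
        raise ValueError("total hazard is zero at t_k; equation (5) is undefined")
    return {j: h_j / total for j, h_j in h.items()}, h_import / total

def constant_hazard(beta):
    """Hazard beta while the source is infectious (0 < tau <= r_j), else zero."""
    def hazard(j, k, tau, r_j):
        return beta if 0.0 < tau <= r_j else 0.0
    return hazard

t = [0.0, 0.4, 0.9, 1.3]   # hypothetical infection times
r = [1.0, 1.0, 1.0, 1.0]   # fixed recovery periods
probs, p_import = infector_probabilities(3, t, r, constant_hazard(2.0 / 9999))
print(probs, p_import)
```

In this example person 0 has already recovered when person 3 is infected, so the two persons who are still infectious share the probability equally. The result depends only on the hazards at time tk, not on how many other susceptibles those persons were competing to infect.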

### 5.2 Partial likelihood for epidemic data

A partial likelihood for epidemic data can be derived using the same logic as that used to derive pjk in equation (5). For each person k such that tk < ∞, the probability that the failure at time tk occurred in person k given H(tk) is

$\frac{\sum_{i=0}^{n} \lambda_{ik}(t_k - t_i \mid r_i)}{\sum_{j=1}^{n} \sum_{i=0}^{n} \lambda_{ij}(t_k - t_i \mid r_i)},$
(6)

where the numerator is the hazard of infection (from all sources) in person k at time tk and the denominator is the total hazard of infection for all persons at risk of infection at time tk.

If there is a vector of covariates xij for each pair ij (which may include individual-level covariates for i and j as well as pairwise covariates for the ordered pair ij) and a vector of parameters θ such that λij(τ|ri) = λ(τ|ri, xij, θ), then a partial likelihood for θ can be obtained by multiplying equation (6) over all m observed failure times. If (k) denotes the index of the kth person infected, t = (t1, ..., tn), and X = {xij : i, j = 1, ..., n}, then the partial likelihood is

$L_p(\theta \mid t, X) = \prod_{k=1}^{m} \frac{\sum_{i=0}^{n} \lambda(t_{(k)} - t_i \mid r_i, x_{i(k)}, \theta)}{\sum_{j=1}^{n} \sum_{i=0}^{n} \lambda(t_{(k)} - t_i \mid r_i, x_{ij}, \theta)}.$
(7)

This is very similar to partial likelihoods that arise in survival analysis, so many techniques from survival analysis may be adaptable for use in the analysis of epidemic data.
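
A log partial likelihood of this form can be sketched for a simple hazard model. Everything specific below is an assumption made for illustration: the hazard baseline·exp(θ·xij) on (0, ri], a single binary pairwise covariate (same household), the absence of an importation hazard (so the index case contributes no term), and the restriction of the denominator to persons still susceptible just before each failure time, following the verbal description of equation (6).

```python
import math

def log_partial_likelihood(theta, t, r, x, baseline=1.0):
    """Log partial likelihood for the hypothetical hazard
    lambda_ij(tau | r_i) = baseline * exp(theta * x[i][j]) for 0 < tau <= r_i."""
    n = len(t)

    def hazard(i, j, time):
        tau = time - t[i]
        return baseline * math.exp(theta * x[i][j]) if 0.0 < tau <= r[i] else 0.0

    loglik = 0.0
    for k in range(n):
        if not math.isfinite(t[k]):
            continue  # never infected: no failure time to contribute
        at_risk = [j for j in range(n) if t[j] >= t[k]]  # susceptible just before t_k
        num = sum(hazard(i, k, t[k]) for i in range(n) if i != k)
        den = sum(hazard(i, j, t[k]) for j in at_risk for i in range(n) if i != j)
        if num > 0.0:
            loglik += math.log(num / den)
        # Cases with zero internal hazard (the imported index case) are skipped.
    return loglik

# Hypothetical data: households {0, 1} and {2, 3}; person 3 is never infected.
x = [[1 if i // 2 == j // 2 else 0 for j in range(4)] for i in range(4)]
t = [0.0, 0.5, 0.8, math.inf]
r = [1.0, 1.0, 1.0, 1.0]
for theta in (0.0, 0.5, 1.0):
    print(f"theta={theta:.1f}  log partial likelihood={log_partial_likelihood(theta, t, r, x):.4f}")
```

The baseline hazard cancels from every ratio, as in a Cox partial likelihood, so only θ is identified. In this toy data set the first transmission occurs within a household, so the log partial likelihood increases with θ.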

The goal of such methods would be to allow statistical inference about the effects of individual and pairwise covariates on the hazard of infection in ordered pairs of individuals. In the ordered pair ij, the effects of individual covariates for i and j on λij(τ|ri) would reflect the infectiousness of i and the susceptibility of j, respectively. Pairwise covariates could include such information as whether i and j are in the same household, the distance between their households, whether they are sexual partners, and any other aspects of their relationship to each other that may affect the hazard of infection from i to j.

This approach has several advantages over any approach based on a distribution of generation or serial intervals. First, it is not necessary to determine who infected whom in any subset of observed infections. If vj is known for some j, this knowledge can be incorporated in the partial likelihood by replacing the term for the failure time of person j in (7) with $p_{v_j j}$ from equation (5). Second, this approach allows the use of individual-level and pairwise covariates for inference in a flexible and intuitive way. The resulting estimated hazard functions have a straightforward interpretation and can be incorporated naturally into a stochastic SIR model. Third, this approach allows theory and methods from survival analysis to be applied to the analysis of epidemic data.

## 6 Discussion

Generation and serial interval distributions are not stable characteristics of an infectious disease. When multiple infectious persons compete to infect a given susceptible person, infection is caused by the first person to make infectious contact. In Section 3, we showed that the mean infectious contact interval τij given that i actually infected j is less than or equal to the mean τij given i made infectious contact with j. That is,

$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty],$

with strict inequality when τij is non-constant and j is at risk of infectious contact from any source other than i (more precise conditions are given in Section 3). This result holds for all time-homogeneous stochastic SIR models.
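The inequality can be checked numerically in a deliberately simple special case. The sketch below assumes that the infectious contact interval from i to j and the earliest infectious contact time from all other sources are independent exponential random variables that are always finite; the rates, the sample size, and the function name `race_means` are hypothetical choices for illustration and are not part of the general model.

```python
import random


def race_means(rate_i=1.0, rate_other=2.0, n=100_000, seed=0):
    """Compare the mean contact interval from i when i wins the race with the overall mean.

    Assumption: tau_ij ~ Exponential(rate_i) and the earliest contact time from
    all other sources ~ Exponential(rate_other), independently, so every
    contact interval is finite.
    """
    rng = random.Random(seed)
    all_tau = []      # every contact interval from i to j
    winning_tau = []  # intervals in which i makes infectious contact first
    for _ in range(n):
        tau_ij = rng.expovariate(rate_i)
        tau_other = rng.expovariate(rate_other)
        all_tau.append(tau_ij)
        if tau_ij < tau_other:
            winning_tau.append(tau_ij)
    mean_win = sum(winning_tau) / len(winning_tau)
    mean_all = sum(all_tau) / len(all_tau)
    return mean_win, mean_all


mean_win, mean_all = race_means()
```

In this special case the contact interval from i, given that i wins the race, is exponential with rate `rate_i + rate_other`, so the two means should be close to 1/3 and 1 with the default rates. Increasing `rate_other` mimics a higher prevalence of competing infectors and shortens the winning intervals further.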

In an epidemic, the mean generation (and serial) intervals contract as the prevalence of infection increases and susceptible persons are at risk of infectious contact from multiple sources. In the simulations of Section 4, we saw that the degree of contraction increases with R0. For models with clustering of contacts, generation interval contraction can occur even when the global prevalence of infection is low because susceptibles are at risk of infectious contact from multiple sources within their own clusters. In all of the simulations, the greatest serial interval contraction coincided with the peak prevalence of infection, when the risk of infectious contacts from multiple sources was highest. The mean generation interval increases again as the epidemic wanes, but this rebound may be small when R0 is high.

The reason that generation and serial intervals contract during an epidemic is that their definition applies to pairs of individuals ij such that i actually transmitted infection to j. If we don’t require that an infectious contact leads to the transmission of infection, we are led naturally to the concept of the infectious contact interval, which has a well-defined distribution throughout an epidemic. Similarly, we can define R0 as the mean number of infectious contacts (i.e., finite infectious contact intervals) made by a primary case without reference to a completely susceptible population. Generation and serial intervals and the effective reproductive number can then be defined in terms of infectious contacts that actually lead to the transmission of infection. Many fundamental concepts in infectious disease epidemiology can be simplified usefully by defining them in terms of infectious contact rather than infection transmission.

Infectious contact hazards for ordered pairs of individuals can be used for many of the same types of analysis that have been attempted using generation or serial interval distributions. In Section 5, we derived a hazard-based estimator of R(t) very similar to that developed by Wallinga and Teunis [4]. This derivation led naturally to a partial likelihood for epidemic data very similar to those that arise in survival analysis. We believe that the adaptation of methods and theory from survival analysis to infectious disease epidemiology will yield flexible and powerful tools for epidemic data analysis.

## Acknowledgements

This work was supported by the US National Institutes of Health cooperative agreement 5U01GM076497 “Models of Infectious Disease Agent Study” (E.K. and M.L.) and Ruth L. Kirschstein National Research Service Award 5T32AI007535 “Epidemiology of Infectious Diseases and Biodefense” (E.K.). We also wish to thank Jacco Wallinga and the anonymous reviewers of Mathematical Biosciences for useful comments and suggestions.

## References

[1] Giesecke J. Modern Infectious Disease Epidemiology. Edward Arnold; London: 1994.
[2] Svensson Å. A note on generation times in epidemic models. Mathematical Biosciences. 2007;208:300–311.
[3] Lipsitch M, Cohen T, Cooper B, et al. Transmission dynamics and control of severe acute respiratory syndrome. Science. 2003;300:1966–1970.
[4] Wallinga J, Teunis P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology. 2004;160(6):509–516.
[5] Mills CE, Robins J, Lipsitch M. Transmissibility of 1918 pandemic influenza. Nature. 2004;432:904.
[6] Ferguson NM, Cummings DAT, Cauchemez S, Fraser C, Riley S, Meeyai A, Iamsirithaworn S, Burke D. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature. 2005;437:209–214.
[7] Kuk AY, Ma S. Estimation of SARS incubation distribution from serial interval data using a convolution likelihood. Statistics in Medicine. 2005;24(16):2525–2537.
[8] Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B. 2007;274:599–604.
[9] Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press; New York: 1991.
[10] Kenah E, Robins J. Second look at the spread of epidemics on networks. Physical Review E. 2007;76:036113.
[11] Kenah E, Robins J. Network-based analysis of stochastic SIR epidemic models with random and proportionate mixing. Journal of Theoretical Biology. 2007;249(4):706–722.
[12] Gut A. An Intermediate Course in Probability. Springer-Verlag; New York: 1995.
