• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of pnasPNASInfo for AuthorsSubscriptionsAboutThis Article
Proc Natl Acad Sci U S A. Oct 19, 2004; 101(42): 15124–15129.
Published online Oct 11, 2004. doi:  10.1073/pnas.0308344101
PMCID: PMC524041

Forecast and control of epidemics in a globalized world


The rapid worldwide spread of severe acute respiratory syndrome demonstrated the potential threat an infectious disease poses in a closely interconnected and interdependent world. Here we introduce a probabilistic model that describes the worldwide spread of infectious diseases and demonstrate that a forecast of the geographical spread of epidemics is indeed possible. This model combines a stochastic local infection dynamics among individuals with stochastic transport in a worldwide network, taking into account national and international civil aviation traffic. Our simulations of the severe acute respiratory syndrome outbreak are in surprisingly good agreement with published case reports. We show that the high degree of predictability is caused by the strong heterogeneity of the network. Our model can be used to predict the worldwide spread of future infectious diseases and to identify endangered regions in advance. The performance of different control strategies is analyzed, and our simulations show that a quick and focused reaction is essential to inhibiting the global spread of epidemics.

The application of mathematical modeling to the spread of epidemics has a long history and was initiated by Daniel Bernoulli's (1) work on the effect of cowpox inoculation on the spread of smallpox in 1760. Most studies concentrate on the local temporal development of diseases and epidemics. Their geographical spread is less well understood, although important progress has been achieved in a number of case studies (24). The key question, as well as difficulty, is how to include spatial effects and quantify the dispersal of individuals. This problem has been studied with some effort in various ecological systems, for instance, in plant dispersal by seeds (5). Today's volume, speed, and nonlocality of human travel (Fig. 1), as well as the rapid worldwide spread of severe acute respiratory syndrome (SARS) (Fig. 2), demonstrate that modern epidemics cannot be accounted for by local diffusion models that are applicable only as long as the mean distance traveled by individuals is small compared to geographical distances. These local reaction–diffusion models generically lead to epidemic wavefronts, which were observed, for example, in the geotemporal spread of the Black Death in Europe from 1347 to 1350 (610).

Fig. 1.
Global aviation network. A geographical representation of the civil aviation traffic among the 500 largest international airports in >100 different countries is shown. Each line represents a direct connection between airports. The color encodes ...
Fig. 2.
Global spread of SARS. (A) Geographical representation of the global spreading of probable SARS cases on May 30, 2003, as reported by the WHO and Centers for Disease Control and Prevention. The first cases of SARS emerged in mid-November 2002 in Guangdong ...

Here we focus on mechanisms of the worldwide spread of infectious diseases. Our model consists of two parts: a local infection dynamics and the global traveling dynamics of individuals similar to the models investigated in ref. 11. However, both constituents of our model are treated on a stochastic level, taking full account of fluctuations of disease transmission, latency, and recovery on the one hand and of the geographical dispersal of individuals on the other. Furthermore, we incorporate nearly the entire civil aviation network.

Local Infection Dynamics

In the standard deterministic systemic inflammatory response (SIR) model for infectious diseases, a population with N individuals is categorized according to its infection status: susceptibles (S), infectious (I), or recovered and immune (R) (6, 12). The dynamics that specifies the flow among these categories is given by

equation M1

where s = S/N and j = I/N denote the relative number of susceptibles and infecteds, respectively. The relative number of recovered individuals r = R/N is obtained by conservation of the entire population, i.e., r(t) = 1 – j(t) – s(t), and τ = β–1 is the average infectious period. The key quantity describing the infection is the basic reproduction number ρ0 = α/β, which is the average number of secondary infections transmitted by an infectious individual in an otherwise uninfected population. If ρ0 > 1 and the initial relative number of susceptibles are greater than a critical value sc = 1/ρ0, an epidemic develops (dj/dt > 0). As the number of infected individuals increases, the fraction of susceptibles s decreases, and thus the number of contacts of infected individuals with susceptibles decreases until s = sc, when the epidemic reaches its maximum and subsequently decays.

The above SIR model incorporates the underlying mechanism of transmission and recovery dynamics and has been able to account for experimental data in a number of cases. However, transmission of and recovery from an infection are intrinsically stochastic processes, and the deterministic SIR model does not account for fluctuations. These fluctuations are particularly important at the beginning of an epidemic when the number of infecteds is very small.

In this regime, a probabilistic description must be used. Schematically, the stochastic infection dynamics is given by

equation M2

The first reaction reflects the fact that an encounter of an infected individual with a susceptible results in two infecteds at a probability rate α; the second indicates that infecteds are removed (recover) at a rate β and effectively disappear from the population. The quantity of interest is the probability p(S, I; t) of finding a number S of susceptibles and I infecteds in a population of size N at time t. Assuming that the process is Markovian on the relevant time scales, the dynamics of this probability is governed by the master equation (13)

equation M3

In addition to this dynamics, one must specify the initial condition p(S, I; t = t0), which is typically assumed to be a small but fixed number of infecteds I0, i.e., p(S, I; t = t0) = δI,I0δS,NI0.

The relation of the probabilistic master equation 3 to the deterministic SIR model (1) can be made in the limit of a large but finite population, i.e., N [dbl greater-than sign] 1. In this limit, one can approximate the master equation by a Fokker–Planck equation by means of an expansion in terms of conditional moments [Kramers–Moyal expansion (13); see supporting information, which is published on the PNAS web site]. The associated description in terms of stochastic Langevin equations reads

equation M4

equation M5

Here, the independent Gaussian white noise forces ξ1(t) and ξ2(t) reflect the fluctuations of transmission and recovery, respectively. Note that the magnitude of the fluctuations are equation M6 and disappear in the limit N → ∞, in which case Eqs. 1 are recovered. However, for large but finite N, a crucial difference is apparent: Eqs. 4 and 5 contain fluctuating forces, and N is a parameter of the system. A careful analysis shows that even for very large populations (i.e., N [dbl greater-than sign] 1), fluctuations play a prominent role in the initial phase of an epidemic outbreak and cannot be neglected. For instance, even when ρ0 > 0, a small initial number of infecteds in a population may not necessarily lead to an outbreak that cannot be accounted for by the deterministic model.

Dispersal on the Aviation Network

As individuals travel around the world, the disease may spread from one place to another. To quantify the traveling behavior of individuals, we have analyzed all national and international civil flights among the 500 largest airports by passenger capacity. This analysis yields the global aviation network shown in Fig. 1; further details of the data collection are compiled in the supporting information. The strength of a connection between two airports is given by passenger capacity, i.e., the number of passengers that travel a given route per day.

We incorporate the global dispersion of individuals into our model by dividing the population into M local urban populations labeled i containing Ni individuals. For each i, the number of susceptible and infected individuals is given by Si and Ii, respectively. In each urban area, the infection dynamics is governed by master equation 3.

Stochastic dispersal of individuals is defined by a matrix γij of transition probability rates among populations

equation M7

where γii = 0. Along the same lines as presented above, one can formulate a master equation for the pair of vectors X = {S1, I1,..., SM, IM}, which defines the stochastic state of the system. This master equation is provided explicitly in Eq. 3, supporting information.

To account for the global spread of an epidemic via the aviation network, one needs to specify the matrix γij. Because the global exchange of individuals among urban areas is carried out by airborne travel, one can estimate the probability rate matrix γij by t. We assume that an individual remains in an urban area for some time before traveling to another region. A flight ji is chosen according to the weights

equation M8

where Mij is the number of passengers per unit time that depart from an airport in region j and arrive at an airport in region i. The matrix w accounts for the overall connectivity of the aviation network as well as for the heterogeneity in the strength of the connections. Denoting the typical time period, individuals remain at i by τi; the matrix γij is expressed in terms of wij according to γij = wijj. If we assume that each airport is surrounded by a catchment area with a population Ni, the typical time individuals remain at i is given by τi = NijMji. If the capacity of airport i reflects the need of the associated catchment area (i.e., Ni[proportional, variant]ΣjMji), the waiting times τi are identical for all i, i.e., τi = τ = γ–1, which implies γij = γwij. In our model, the global rate γ is a free parameter. To verify its validity, we apply our model to the SARS outbreak. The rate γ can be computed from the ratio of the number of infected individuals in Hong Kong to the number of infected individuals outside Hong Kong, which is provided by the World Health Organization (WHO) data.§ For the local infection dynamics, we use a simple extension of the above stochastic SIR model: The categories S, I, and R are completed by a category L of latent individuals who have been infected but are not yet infectious themselves, accounting for the latency of the disease. In our simulations, individuals remain in the latent or infectious stage for periods drawn from the delay distribution provided in figure 2 in ref. 14. In our simulation, we chose random infection times, the distribution of which is known for SARS (14). In a realistic simulation, the basic reproduction number ρ0 cannot be assumed to be constant over time. Successful control measures, for instance, generally decrease ρ0. We chose a time-dependent ρ0(t), as provided by refs. 15 and 16.

Results of Simulations

Fig. 2 depicts a geographical representation of the results of our simulations. Initially, an infected individual was placed in Hong Kong. For this initial condition, we simulated 1,000 realizations of the stochastic model and computed the mean value left angle bracketI(t)right angle bracket of the number of infecteds at each node i = 1,..., M of the network. Because the size of catchment areas varies on many scales, the fluctuation range is best quantified by the means of the relative variance of z = log I, i.e., equation M9. In our simulations, we computed this measure for every i of the network. Fig. 2 shows the prediction of our model for the spread of SARS at t = 90 days after the initial outbreak in Hong Kong (February 19, 2003), corresponding to the May 20, 2003 outbreak. The results of our simulations are in remarkable agreement with the worldwide spread of SARS as reported by the WHO (compare Fig. 2): There is an almost one-to-one correspondence between infected countries as predicted by the simulations and the WHO data.

Also, the orders of magnitude of the numbers of infected individuals in a country agree (Table 1). Although for most countries the cases reported by the WHO lie within the fluctuation range, two deviations between the reported cases and the predictions of the simulation are apparent: Our simulations predict a relatively high number of SARS cases in Japan (between 26.6 and 137.0). However, the Japanese government reported no confirmed case (only five suspected cases) of SARS in Japan as of May 30, 2003. How a single realization may deviate from the expectation can be seen from the difference between the simulation and the reported cases in the U.S. and Canada. The simulations show that on average the U.S. should have a higher number of SARS cases than Canada, although the opposite was reported by the WHO. The impact of the inherent stochasticity of the infection and traveling dynamics is discussed below.

Table 1.
A comparison of the SARS case reports provided by the WHO and the results of our simulation for all countries with a reported case number ≥4

The Impact of Fluctuations

Bearing in mind the low number of infections and the small value of ρ0 for SARS, the high degree of predictability, i.e., the low impact of fluctuations on the network level, is rather surprising, especially because our simulations take into account the full spectrum of fluctuations of disease transmission, recovery, and dispersal, and that the system evolves on a highly complex network. Naively, one expects that dispersal fluctuations between two given populations are amplified as the epidemic spreads globally, and that no prediction can be made. To clarify this important point, consider the system of two confined populations A and B, which exchange individuals as depicted in Fig. 3. For simplicity, we assume that both populations have the same size (i.e., NA = NB = N), and individuals traverse at a rate γ. Now assume that initially a small number of infected I0 is introduced to population A without any infecteds contained in B. For a sufficiently high number of infecteds in A, an epidemic occurs. For γ > 0, infecteds are introduced to B, and a subsequent outbreak may occur in B after a time lag T. Fig. 3 depicts the results of simulations for two populations with n = 10,000 and ρ0 = 4. Various realizations of the time course IA(t) and IB(t) of the epidemic in both populations were computed. The initial number of infecteds in the population was IA(t = 0) = I0 = 20. Fig. 3 Left depicts the probability p(γ) of an outbreak occurring in population B as a function if the transition rate γ. For large enough rates, the probability is nearly unity, because a sufficient number of infecteds is introduced to B. For very low rates γ, no infecteds are introduced to B during the time span of the epidemic in A, and thus p(γ) → 0as γ → 0. For intermediate values of γ, the probability p(γ) is neither one nor unity, and the time course in population B cannot be predicted with certainty. The function p(γ) is given by

equation M10

where the critical rate γ* is a function of the parameters ρ0 and N. Fig. 3 Insets depict histograms of the time lag T for those realizations for which an outbreak occurred in B. Each histogram corresponds to a different transition rate γ. The smaller γ, the higher the variability in T. Note that even in a range in which p(γ) ≈ 1, the time lag T is still a stochastic quantity with a high degree of variance.

Fig. 3.
Demonstration of the impact of fluctuations on inhomogeneous networks. Two confined populations with exchange of individuals are shown. In each population, the dynamics is governed by the SIR reaction scheme (6). Individuals travel from one population ...

Consequently, the introduction of stochastic exchange of infected individuals leads to a lack of predictability in the time of onset of the initially uninfected population. In light of the analysis of two populations, the predictability in the case of SARS on the aviation network seems even more puzzling.

The situation changes drastically in networks that exhibit a high degree of variability in the rate matrix γij. Clearly, this is the case for the aviation network. Consider the simple network depicted in Fig. 3. Each population contains N individuals. A central population A is coupled to a set of M–1 surrounding populations B1,..., BM–1. Assume that initially a number of infecteds I0 is introduced to the central population A, such that an outbreak occurs. The entire set of rates {γj}j=1,...,M–1 determines the behavior in the surrounding populations. If all rates γj are identical and very small, we expect no infection to occur in the Bj; for large enough γj, an outbreak will occur in every Bj. In the aviation network, however, transition rates are distributed on many scales, and the response of the network to a central outbreak depends on the statistical properties of this distribution denoted by q(γ). To quantify the reaction of the network, we introduce for each surrounding population a binary number ξj with j = 1,..., M – 1, which is unity if an outbreak occurs in Bj and zero if it does not. According to Eq. 8, for a given rate γ, this quantity is a random number with a conditional probability density pi|γ) = (1 – p(γ)) δ(ξi) + p(γ) δ(ξi – 1). The variability of the network is thence quantified by the cumulative variance per population, and we define

equation M11

as a measure for the uncertainty of the network response. If, for example, equation M12, i.e., all transition rates are identical and equal to equation M13, then equation M14(equation M15, which is unity for equation M16. Comparing with Eq. 8, we see that when equation M17 log 2, the system with identical transition rates equation M18 exhibits the highest degree of unpredictability when the rates are of the order of the critical rate defined by Eq. 8. The function equation M19 is shown in Fig. 3.

Now, assume rates γj are drawn from a distribution

equation M20

which implies a high degree of variance within the interval [γmin, γmax] (i.e., γj is distributed uniformly on a logarithmic scale). This high variability in rates drastically changes the predictability of the system. Inserting into Eq. 9 yields equation M21 for strongly distributed rates. In Fig. 3, this function is compared to a system of identical transition rates. On the one hand, for intermediate values of γ ≈ γ*, the predictability is much higher than in the system of identical rates. This is a rather counterintuitive result. Despite the additional randomness in transition rates, the degree of determinism is increased.

Control Strategies

Fig. 4 exemplifies how our model can be used to predict endangered regions if the origin of a future epidemic is located quickly. Fig. 4 depicts simulations of the global spread of SARS at t = 90 days after hypothetical outbreaks in New York and London, respectively. Despite the worldwide spread of the epidemic in each case, the degree of infection of each country differs considerably, which has important consequences for control strategies.

Fig. 4.
Inhomogeneity of the aviation network and control strategies. (A and B) Geographical representation of the results of two simulations of hypothetical SARS outbreaks 90 days after an initial infection in New York (A) and London (B) for the same parameters ...

Vaccination of a fraction of the population reduces the fraction of susceptibles and thus yields a smaller effective reproduction number ρ. If a sufficiently large fraction is vaccinated, ρ drops below 1, and the epidemic becomes extinct. The global aviation network can be used to estimate the fraction of the global population that needs to be vaccinated to prevent the epidemic from spreading. Fig. 4 demonstrates that a quick response to an initial outbreak is necessary if global vaccination is to be avoided. Also depicted is the probability pn(v) of having to vaccinate a fraction v of the population if an infected individual is randomly placed in one of the cities and permitted to travel N = 1, 2, or 3 times. For the majority of originating cities, the initial spread is regionally confined, and thus a quick response to an outbreak requires only a vaccination of a small fraction of the population. However, if the infected individual travels twice, the expected fraction left angle bracketvright angle bracket of the population that needs to be vaccinated is considerable (74.58%). For n = 3, global vaccination is necessary.

As a reaction to a new epidemic outbreak, it might be advantageous to impose travel restrictions to inhibit the spread. Here we compare two strategies: (i) the shutdown of individual connections and (ii) the isolation of cities. Our simulations show that an isolation of only 2% of the largest cities already drastically reduces left angle bracketvright angle bracket (with N = 2) from 74.58% to 37.50% (compare the blue and light-blue curves in Fig. 4). In contrast, a shutdown of the strongest connections in the network is not nearly as effective. To obtain a similar reduction of left angle bracketvright angle bracket, the top 27.5% of connections would need to be taken off the network. Thus, our analysis shows that a remarkable success is guaranteed if the largest cities are isolated as a response to an outbreak.

In a globalized world, with millions of passengers traveling around the world week by week, infectious diseases may spread rapidly around the world. We believe that a detailed analysis of the aviation network represents a cornerstone for the development of efficient quarantine strategies to prevent diseases from spreading. Because our model is based on a microscopic description of traveling individuals, our approach may be considered a reference point for the development and simulation of control strategies for future epidemics.

Supplementary Material

Supporting Information:


We thank E. Bodenschatz for critical reading of the manuscript and stimulating discussions. This research was supported in part by National Science Foundation Grant PHY99-07949.


This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: SARS, severe acute respiratory syndrome; SIR, systemic inflammatory response; WHO, World Health Organization.


Data on flight schedules and airport information are available from OAG Worldwide Limited, London (www.oag.com), and the International Air Transport Association, Geneva (www.iata.org).

§Data on the SARS outbreak are available from the WHO, Geneva (www.who.int/csr/sars/en), and the Center for Disease Control and Prevention, Atlanta (www.cdc.gov/ncidod/sars).


1. Bernoulli, D. (1760) Mém. Math. Phys. Acad. R. Sci. Paris, 1–45.
2. Keeling, M. J., Woolhouse, M. E. J., Shaw, D. J., Matthews, L, Chase-Topping, M, Haydon, D. T., Cornell, S. J., Kappey, J, Wilesmith, J & Grenfell, B. T. (2001) Science 294, 813–817. [PubMed]
3. Smith, D. L., Lucey, B, Waller, L. A., Childs, J. E & Real, L. A. (2002) Proc. Natl. Acad. Sci. USA 99, 3668–3672. [PMC free article] [PubMed]
4. Keeling, M. J., Woolhouse, M. E. J., May, R. M., Davies, G & Grenfell, B. T. (2003) Nature 421, 136–142. [PubMed]
5. Bullock, J. M., Kenward, R. E & Hails, R. S., eds. (2002) Dispersal Ecology, 42nd Symposium of the British Ecological Society (Blackwell, London).
6. Murray, J. D. (1993) Mathematical Biology (Springer, Berlin).
7. Langer, W. L. (1964) Sci. Am. 2, 114–121. [PubMed]
8. Noble, J. V. (1974) Nature 250, 726–729. [PubMed]
9. Grenfell, B. T., Bjornstadt, O. N & Kappey, J. (2001) Nature 414, 695–696.
10. Mollison, D. (1991) Math. Biosci. 107, 255–287. [PubMed]
11. Rvachev, L. A. & Longini, I. M., Jr. (1985) Math. Biosci. 75, 3–22.
12. Anderson, R. M & May, R. M. (1991) Infectious Diseases of Humans (Oxford Univ. Press, Oxford).
13. Gardiner, C. W. (1985) Handbook of Stochastic Methods (Springer, Berlin).
14. Donnelly, C. A., Ghani, A. C., Leung, G. M., Hedley, A. J., Fraser, C, Riley, S, Abu-Raddad, L. J., Ho, L.-M, Thach, T.-Q, Chau, P, et al. (2003) Lancet 361, 1761–1766. [PubMed]
15. Riley, S., Fraser, C., Donnelly, C. A., Ghani, A. C., Abu-Raddad, L. J., Hedley, A. J., Leung, G. M., Ho, L.-M, Lam, T.-H, Thach, T. Q., et al. (2003) Science 300, 1961–1966. [PubMed]
16. Lipsitch, M., Cohen, T., Cooper, B., Robins, J. M., Ma, S., James, L., Gopalakrishna, G., Chew, S. K., Tan, C. C., Samore, M. H., et al. (2003) Science 300, 1966–1970. [PMC free article] [PubMed]
17. Centers for Disease Control (2003) Morbid. Mortal. Wkly. Rep. 52, 241. [PubMed]

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
PubReader format: click here to try


Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...


Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...