- PMC3056392

# Modeling the spatial spread of infectious diseases: the GLobal Epidemic and Mobility computational model

Duygu Balcan^{a,b}, Bruno Gonçalves^{a,b}, Hao Hu^{c}, José J. Ramasco^{d}, Vittoria Colizza^{d}, and Alessandro Vespignani^{*,a,b,d}

^{a}Center for Complex Networks and Systems Research (CNetS), School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA

^{b}Pervasive Technology Institute, Indiana University, Bloomington, IN 47406, USA

^{c}Department of Physics, Indiana University, Bloomington, IN 47406, USA

^{d}Computational Epidemiology Laboratory, Institute for Scientific Interchange (ISI), Torino, Italy

^{*}Corresponding author.

## Abstract

Here we present the Global Epidemic and Mobility (GLEaM) model that integrates sociodemographic and population mobility data in a spatially structured stochastic disease approach to simulate the spread of epidemics at the worldwide scale. We discuss the flexible structure of the model that is open to the inclusion of different disease structures and local intervention policies. This makes GLEaM suitable for the computational modeling and anticipation of the spatio-temporal patterns of global epidemic spreading, the understanding of historical epidemics, the assessment of the role of human mobility in shaping global epidemics, and the analysis of mitigation and containment scenarios.

**Keywords:** Computational epidemiology, complex networks, multiscale phenomena, human mobility, infectious diseases

## 1. Introduction

The increasing computational and data integration capabilities witnessed in recent years have enabled the development of computational epidemic models of great complexity and realism [36]. Generally accepted methodologies are represented by very detailed agent-based models [17, 33, 18, 19, 24, 8, 34] and large-scale spatial metapopulation models [38, 21, 25, 29, 12, 16, 9, 1, 2]. These two major classes of computational models have different resolutions and limitations. Agent-based models are stochastic, spatially explicit, discrete-time simulation models in which the agents represent single individuals. The infection can spread among individuals through contacts among household members, among school and workplace colleagues, and through random contacts in the general population. One of the key features of these models is the characterisation of the network of contacts among individuals based on a realistic model of the sociodemographic structure of the population (see for instance [27] for a comparison between several models based on this approach). The second scheme relies on structured metapopulation models that consider the system divided into geographical regions defining a subpopulation network, where connections among subpopulations represent the individual fluxes due to the transportation and mobility infrastructures [1, 2, 3, 10, 11]. Infection dynamics occurs inside each subpopulation and is described by compartmental schemes that depend on the specific etiology of the disease and the containment interventions considered [38, 21]. Agent-based models provide a very rich data scenario, but their computational cost and, most importantly, their need for very detailed input data have so far limited their use to a few country-level scenarios [27], up to the continent level [34].
In contrast, structured metapopulation models are fairly scalable and can be conveniently used to provide worldwide scenarios and patterns with thousands of stochastic realisations [29, 12, 16, 9, 1, 2, 22]. While the level of information that can be extracted from structured metapopulation models is less detailed than that of agent-based models, their computational scalability allows the simulation of disease spreading on the worldwide scale and the use of statistical approaches that leverage Monte Carlo techniques based on the analysis of a large number of simulation runs exploring the parameter space.

In this paper, we provide a detailed presentation of the Global Epidemic and Mobility (GLEaM) model [2], which uses a structured metapopulation scheme integrating the stochastic modeling of the disease dynamics, high resolution census data worldwide and human mobility patterns at the global scale. GLEaM makes use of high resolution population data [6, 7] that allow for the definition of subpopulations according to a Voronoi decomposition of the world surface centered on the locations of major transportation hubs. This procedure leads to the construction of a metapopulation model consisting of more than 3,300 subpopulations across the world connected through a network of more than 16,800 mobility fluxes describing the daily patterns of travel and mobility among subpopulations. In particular, GLEaM integrates data obtained from the International Air Transport Association (IATA [30]) and Official Airline Guide (OAG [35]) databases and multimodal mobility data collected and analysed from more than 30 countries in 5 different continents. This integration results in a worldwide multiscale mobility network spanning several orders of magnitude in intensity and spatio-temporal scales. The disease dynamics is simulated by a fully stochastic compartmental approach defining the temporal equations for each subpopulation [1]. The equations of different subpopulations are then coupled through effective interactions and mechanistic schemes accounting for the mobility of individuals encoded in the multiscale mobility network.

The GLEaM computational model trades off the high realism of agent-based models for the computational scalability of the algorithm implementation and the relatively small amount of input data needed to initialize the model. This allows detailed analysis of epidemic patterns at the worldwide scale. This feature is extremely relevant in evaluating the time pattern of emerging infectious diseases, and cannot be accounted for by agent-based models restricted to the country or continent level. For instance, given a set of initial conditions for a local outbreak of a new strain of influenza, the timeline of the arrival of the epidemic in each country and the ensuing activity peak are mainly determined by the human mobility network that couples different regions of the world. When individual countries or a given continent are studied in isolation, any estimate of the epidemic timeline is based on assumptions about imported cases from the rest of the world. Such an estimate is obtained without an explicit coupling to, or knowledge of, the propagation of the disease outside the boundaries of the country or continent that is the focus of the model. GLEaM instead explicitly integrates human mobility patterns that allow us to consistently simulate the mobility of infectious individuals on the global scale, thus providing ab initio estimates of the epidemic timeline in each country or urban area without assumptions on case importation.

Unlike agent-based models, GLEaM is scalable enough to allow the use of statistical methods such as Monte Carlo likelihood analysis to fit epidemic parameters, which are usually not known in the case of newly emerging diseases, with the aim of understanding the observed pattern and simulating its possible future spread [1]. This is enabled by the possibility of generating large numbers of in-silico epidemics to allow the self-consistent estimate of all the parameters needed for the simulation of the future propagation of the disease. A large number of computational runs is indeed needed to systematically explore the space of parameters and, for each point in such space, to build a robust statistical ensemble and reduce the fluctuations induced by stochastic effects. The intensive CPU requirements of agent-based models limit the feasibility of large explorations of the parameter space aimed at estimation procedures, or at sensitivity analyses that assess how the simulated results respond to changes in the model parameters [27]. This constraint becomes particularly relevant when computational models are used as risk-assessment tools for scenario evaluations of an epidemic emergency in real time.

Here we specify the definition and integration of the different data layers composing the model, and also provide a detailed explanation of the Voronoi tessellation used for the subpopulation definition. The construction of the mobility network and the derivation of the stochastic mobility equations among different subpopulations are described in detail as well. We illustrate the time-scale separation technique that allows for the integration of the mobility processes occurring on small time scales as effective coupling terms. This method reduces the computational cost by simulating in an explicit way only mobility processes occurring on the long time scales. The metapopulation structure and the mobility processes are then integrated in the basic equations describing the time behavior of the disease process within each population. We detail the structure of the equations in the specific case of an influenza-like-illness compartmentalization, although the equations can be generalized to generic compartmental structures according to the disease of interest. The second part of the paper is devoted to the algorithmic implementation of the model. We describe the algorithm structure, inputs and outputs that allow GLEaM to perform the simulation of stochastic realizations of the worldwide unfolding of the epidemic. From these *in silico* epidemics a variety of information can be gathered, such as prevalence, morbidity, number of secondary cases, number of imported cases, hospitalized patients, amounts of drugs used, and other quantities for each subpopulation with a minimal time resolution of 1 day. Finally we provide an example of the results that can be obtained with GLEaM by simulating the 2001–2002 seasonal influenza spreading and comparing the computational results with real data from different surveillance infrastructures.

## 2. Related work

Many data-driven epidemic models have been proposed; however, only a few, mostly based on metapopulation schemes, tackle the spatio-temporal behaviour of diseases at the global scale. Agent-based models are able to consider individually targeted interventions for the mitigation of an epidemic, as well as changes of behavior at the individual level that reproduce the adaptation of individuals to the disease spread. This is performed by tracking each agent of the artificial society considered in the model, and applying rules for the behavior of individuals in their virtual space. Therefore, most agent-based models can be very accurate in describing the spread of a disease in time and space, provided that high quality data can be integrated at the individual agent level. The difficulties in gathering high quality data worldwide and the limits imposed by high performance computing, however, have restricted the application of agent-based models to local populations or a few countries, such as the US [24, 19, 27], the UK [19], Italy [8], and Thailand [33, 18], up to the continent of Europe [34]. Among the metapopulation schemes at the global level available in the literature [29, 12, 16, 9, 1, 2, 22], the main differences lie in the accuracy and completeness of the demographic and mobility layers. Indeed, since these models are based on simple homogeneous assumptions inside each subpopulation, their accuracy and realism rest on their ability to capture the distribution of population and the travel flows of individuals from one subpopulation to another. With the airline transportation system being the main and fastest means of connection between different parts of the world, previous works have included an ever-increasing portion of the worldwide airport network in the metapopulation approaches considered.
Indeed, even in continental Europe, which possesses one of the most structured and modern railway networks, long-range railway traffic across countries is just one tenth of the corresponding airline traffic [14]. The samples considered range from 52 airports in Refs. [38, 22], 105 airports in Ref. [12], 155 in Ref. [16], and 500 in Ref. [29], up to the complete International Air Transport Association (IATA) [30] and Official Airline Guide (OAG [35]) databases incorporated in GLEaM [9, 2]. Samples of the worldwide airport network usually correspond to the largest airports, the most connected cities, or the most central ones, and therefore they may include a large portion of the total commercial traffic. While including the largest flows of real-world mobility, these samples are limited in their ability to capture the entire network information needed for a detailed description of the geotemporal evolution of the disease on a city-by-city basis. The overall paths of spreading may be fairly well reproduced [4], but models based on samples would fail if the question under study focuses on the description of the epidemic behavior at a higher level of detail, such as the country or city level, due to the lack of data on connections and travel fluxes. In addition, the accuracy in reproducing the spreading pattern of diseases is largely challenged by the absence of the large fluctuations in the topology of the airline network and in the traffic volumes, and of the correlations and non-trivial loops, that are responsible for the definition of the geotemporal propagation in the real world [9]. The increase of resolution imposes different requirements on the definition of the population distribution and of additional means of transportation that may become relevant at this level of detail.
Previous works considered cities with no geographical reference whose population was obtained from national and international city population databases [29, 12, 16, 9, 22], and did not consider coupling effects other than air transportation. The GLEaM computational model presented here also takes into account short-range mobility, to capture the daily population displacements from a given geographical census area to its neighboring ones. In addition, the model already integrates long-range railway connections indexed by the OAG database, and we are progressively introducing detailed railway networks in specific countries. By integrating a multi-scale mobility layer, GLEaM is therefore the worldwide model that considers a finer description of the evolution of the epidemic behavior, with air travel dictating the pathways of the disease through large geographical areas, whereas the daily short-range displacements control the timing of spreading within localized regions [2].

## 3. GLEaM computational model definition

The global epidemic and mobility structured metapopulation (GLEaM) model is based on a metapopulation approach in which the world is divided into geographical regions defining a subpopulation network where connections among subpopulations represent the individual fluxes due to the transportation and mobility infrastructure. GLEaM integrates three different data layers (see Fig. 1). The population layer is based on the high-resolution population database of the “Gridded Population of the World” project of Columbia University [6, 7] that estimates the population with a granularity given by a lattice of cells covering the whole planet at a resolution of 15 × 15 minutes of arc. The transportation mobility layer integrates air travel mobility obtained from the International Air Transport Association (IATA) [30] and OAG [35] databases, which contain the list of worldwide airport pairs connected by direct flights and the number of available seats on any given connection, together with commuting patterns obtained from data collected and analyzed from more than 30 countries in 5 continents. The combination of the population and mobility layers allows for the subdivision of the world into georeferenced census areas defined with a Voronoi tessellation procedure around transportation hubs. GLEaM simulates the mobility of individuals from one subpopulation to another by a stochastic procedure in which the number of passengers of each compartment traveling from a subpopulation *j* to a subpopulation *ℓ* is an integer random variable generated by a stochastic process defined on the basis of real mobility data. Short-range commuting between subpopulations is modeled with a time scale separation approach that defines the effective force of infection in connected subpopulations. Superimposed on the worldwide population and mobility layers is the epidemic model that defines the disease and population dynamics.
The infection dynamics takes place within each subpopulation and assumes the classic compartmentalization in which each individual is classified by one of the discrete states such as susceptible, latent, infectious symptomatic, infectious non-symptomatic or permanently recovered/removed. In the following sections we provide a detailed presentation of each data layer and of the basic equations that define the computational model.


### 3.1. Population layer

The dataset of the “Gridded Population of the World” and the “Global Urban-Rural Mapping” projects [6, 7] run by the Socioeconomic Data and Application Center (SEDAC) of Columbia University divides the surface of the world into a grid of cells that can have different resolution levels. Each of these cells is assigned an estimated population value. Out of the possible resolutions, we have opted for cells of 15 × 15 minutes of arc to constitute the basis of our model. Each cell thus covers an area approximately equivalent to a rectangle of 25 × 25 km along the Equator. The dataset comprises 823,680 cells, of which 250,206 are populated. In order to define the subpopulations that constitute the metapopulation structure of our model we have performed a Voronoi-like tessellation of the Earth surface centered around the airports of the IATA database. In particular, we identify 3,362 subpopulations centered around indexed IATA airports in 220 different countries. Since the coordinates of each cell center and those of the airports are known, the distance between the cells and the airports can be calculated. We assign each cell to the subpopulation associated with the closest airport, subject to the following two conditions: (*i*) the airport is located in the same country as the cell, and (*ii*) the distance between the airport and the cell does not exceed 200 km. This cutoff naturally emerges from the distribution of distances between cells and closest airports, and it is introduced to avoid generating, in barely populated areas such as Siberia, geographical census areas thousands of kilometers wide but with almost no population. It also corresponds to a reasonable upper cutoff for the ground traveling distance expected to be covered to reach an airport before traveling by plane.
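The assignment rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the GLEaM implementation: the dictionary layout of cells and airports, the airport codes, and the use of the haversine great-circle distance are hypothetical choices; only the same-country condition and the 200 km cutoff come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_DISTANCE_KM = 200.0  # cutoff stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assign_cell(cell, airports, max_km=MAX_DISTANCE_KM):
    """Return the code of the closest same-country airport within max_km, else None."""
    best_code, best_dist = None, max_km
    for airport in airports:
        if airport["country"] != cell["country"]:
            continue  # condition (i): same country only
        d = haversine_km(cell["lat"], cell["lon"], airport["lat"], airport["lon"])
        if d <= best_dist:  # condition (ii): within the cutoff, keep the closest
            best_code, best_dist = airport["code"], d
    return best_code


# Hypothetical data: two airports in different countries and one cell.
airports = [
    {"code": "AAA", "country": "X", "lat": 0.0, "lon": 0.0},
    {"code": "BBB", "country": "Y", "lat": 0.0, "lon": 0.6},
]
cell = {"country": "X", "lat": 0.0, "lon": 0.5}
subpopulation = assign_cell(cell, airports)  # BBB is nearer but lies in another country
```

Cells for which the function returns `None` lie farther than the cutoff from every airport of their country; how such cells are treated is not specified in this section.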

In addition, the tessellation procedure needs to take into account that there exist urban areas served by more than one airport. Examples include London with up to six airports, Paris with two, New York City with three and others. This condition is relevant in the tessellation, as the aim of the procedure is to provide geographical census areas that will correspond to the subpopulation of the metapopulation model, where homogeneous mixing is going to be assumed. Given that the mixing between individuals in a given urban area is expected to be high, independently from their choice of the airport for mobility reasons, we first need to proceed to the aggregation of the groups of airports that serve the same urban area, prior to tessellation. We have searched for groups of airports located close to each other and manually processed the identified groups to select those belonging to the same urban area. The airports of the same group are then aggregated in a single “super-hub”. An example with the final result of the Voronoi tessellation procedure with cells and airports can be seen in Figure 2.

### 3.2. Mobility Layers

The geographical census areas obtained with the tessellation procedure define the basic subpopulations of the GLEaM metapopulation structure. The spatio-temporal patterns of the disease spreading are however associated with the mobility flows that couple different subpopulations. These flows constitute the mobility data layer, which is represented as a network of connections among subpopulations that identifies the number of individuals that go from one subpopulation to the others. The mobility network is made up of different kinds of mobility processes, from short-range commuting to intercontinental flights, with time scales and traffic volumes that span several orders of magnitude. In the following we discuss the data integration process and the construction of this multiscale mobility network.

#### 3.2.1. Worldwide Airport Network

The Worldwide Airport Network (WAN) is composed of 3,362 commercial airports indexed by the IATA located in 220 different countries. The database contains the number of available seats per year for each direct connection between a pair of these airports. The coverage of the dataset is estimated to be 99% of the global commercial traffic. The WAN can be seen as a weighted graph comprising 16,846 edges whose weight, $\omega_{j\ell}$, represents the passenger flow between airports $j$ and $\ell$. The network shows a high degree of heterogeneity both in the number of destinations per airport and in the number of passengers per connection [9, 3, 10, 11].

#### 3.2.2. Commuting Networks

Our commuting databases have been collected from the Offices of Statistics of 30 countries in 5 continents. The full dataset comprises more than 80,000 administrative regions and over five million commuting flow connections between them (see [2]). The definition of administrative unit and the granularity level at which the commuting data are provided vary enormously from country to country. For example, most European countries adhere to a practice that ranks administrative divisions in terms of geocoding for statistical purposes, the so-called Nomenclature of Territorial Units for Statistics (NUTS), which goes from level 1 to 3, plus the Local Administrative Units (LAU), which correspond to the municipalities and can be further subdivided into wards (LAU 2). In most cases, we obtained the commuting data at the LAU level 1 or 2. The US and Canada, on the other hand, have different standards and report commuting at the level of counties. Not only are there clear differences across countries in the definition of the administrative divisions, but even within the same country the actual extension, shape, and population of the administrative divisions can be strongly heterogeneous, as a result of historical and administrative reasons.

In order to overcome the differences in spatial resolution of the commuting data across different countries, we define a worldwide homogeneous standard for GLEaM. We used the geographical census areas obtained from the Voronoi tessellation as the elementary units to define the centers of gravity for the process of commuting. This allows us to deal with units that are self-similar across the world with respect to mobility, as they emerge from the tessellation rather than from country-specific administrative boundaries. We have therefore mapped the different levels of commuting data onto the geographical census areas formed by the Voronoi-like tessellation procedure described above. The mapped commuting flows can be seen as a second transport network connecting subpopulations that are geographically close. This second network can be overlaid on the WAN in a multi-scale fashion to simulate realistic scenarios for disease spreading. The network exhibits important variability in the number of commuters on each connection as well as in the total number of commuters per geographical census area. Since the census areas are statistically homogeneous, we can also extract a general statistical law that allows for the synthetic generation of commuting networks in countries where real data are not available. A full account of the commuting data obtained across different continents and their statistical analysis can be found in Ref. [2].
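As a rough illustration of the mapping step, the following Python sketch aggregates commuting flows between administrative regions into flows between census areas. The details are assumptions made for the example and are not taken from the paper: each administrative region is taken to fall entirely within one census area (the hypothetical `region_to_area` lookup), and flows whose origin and destination fall in the same census area are dropped because they do not couple distinct subpopulations.

```python
from collections import defaultdict


def aggregate_flows(admin_flows, region_to_area):
    """Sum commuter counts between census areas.

    admin_flows    : iterable of (origin_region, destination_region, commuters)
    region_to_area : dict mapping each administrative region to a census area
    """
    area_flows = defaultdict(int)
    for origin, destination, commuters in admin_flows:
        a, b = region_to_area[origin], region_to_area[destination]
        if a != b:  # assumption: intra-area commuting is not kept as a network edge
            area_flows[(a, b)] += commuters
    return dict(area_flows)


# Hypothetical example: three regions mapped onto two census areas.
flows = [("r1", "r2", 10), ("r1", "r3", 5), ("r2", "r3", 7), ("r3", "r1", 2)]
mapping = {"r1": "A", "r2": "A", "r3": "B"}
census_flows = aggregate_flows(flows, mapping)
```

A region that straddles several census areas would need its commuters to be split among them, for instance in proportion to population; that refinement is left out of this sketch.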

### 3.3. Disease model

Each geographical census area corresponds to a subpopulation in the metapopulation model. The infection dynamics within each subpopulation is governed by a disease specific compartmental model in which we assume homogeneous mixing in the population. Although the model can use any compartmental structure, for the sake of clarity we will carry on our discussion by using the explicit example of a typical influenza-like illness (ILI) where we consider a Susceptible-Latent-Infectious-Recovered (SLIR) compartmental scheme. In Figure 3, a diagram of the compartmental structure with transitions between compartments is shown. The contagion process, i.e., generation of new infections, is the only transition mechanism which is altered by short-range mobility, whereas all the other transitions between compartments are spontaneous and remain unaffected by the commuting. The rate at which a susceptible individual in subpopulation $j$ acquires the infection, the so-called force of infection $\lambda_j$, is determined by interactions with infectious persons either in the home subpopulation $j$ or in its neighboring subpopulations on the commuting network. In general, the force of infection is assumed to follow the mass action principle for which the infection rate is $\lambda = \beta I/N$, where $\beta$ is the infection transmission rate and $I/N$ is the density of infected individuals in the population. In the case of asymptomatic individuals the force of infection is usually reduced by a factor $r_\beta$, so that a susceptible individual contracts the infection from a symptomatic or an asymptomatic infectious person at rate $\beta$ or $r_\beta \beta$, respectively. In the case of multiple interacting subpopulations and different classes of infectives the force of infection will be the sum of different contributions as reported in subsection 4.3.

Given the force of infection $\lambda_j$ in subpopulation $j$, each person in the susceptible compartment ($S_j$) contracts the infection with probability $\lambda_j \Delta t$ and enters the latent compartment ($L_j$), where $\Delta t$ is the time interval considered. Latent individuals exit the compartment with probability $\varepsilon \Delta t$, and transit to the asymptomatic infectious compartment ($I_j^a$) with probability $p_a$ or, with the complementary probability $1 - p_a$, become symptomatic infectious. Infectious persons with symptoms are further divided between those who can travel ($I_j^t$), with probability $p_t$, and those who are travel-restricted ($I_j^{nt}$), with probability $1 - p_t$. All the infectious persons permanently recover with probability $\mu \Delta t$, entering the recovered compartment ($R_j$) in the next time step. All transitions and corresponding rates are summarized in Table 2 and in Figure 3.

## 4. Epidemic and mobility dynamics

Once the mobility data layers and the disease dynamics have been defined, the number of individuals in each compartment $[m]$ and subpopulation $j$ follows a discrete and stochastic dynamical equation that reads as

$$X_j^{[m]}(t + \Delta t) - X_j^{[m]}(t) = \Delta X_j^{[m]} + \Omega_j([m]), \tag{1}$$

where the term $\Delta X_j^{[m]}$ represents the change due to the compartment transitions induced by the disease dynamics and the transport operator $\Omega_j([m])$ represents the variations due to the traveling and mobility of individuals. The latter operator takes into account the long-range airline mobility and sets the minimal time scale of integration at 1 day. The mobility due to the commuting flows is included in the model by an effective force of infection obtained using a time scale separation approximation as detailed in the following sections. The term $\Delta X_j^{[m]}$ can be written as a combination of a set of operators $\mathcal{D}_j([m],[n])$. Each $\mathcal{D}_j([m],[n])$ determines the number of transitions from compartment $[m]$ to $[n]$ occurring in $\Delta t$ and is simulated as a random variable extracted from a multinomial distribution. The change $\Delta X_j^{[m]}$ is then given by the sum

$$\Delta X_j^{[m]} = \sum_{[n]} \left\{ -\mathcal{D}_j([m],[n]) + \mathcal{D}_j([n],[m]) \right\}. \tag{2}$$

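As a concrete example let us consider the evolution of the latent compartment. There are three possible transitions from the compartment: transitions to the asymptomatic infectious, the traveling and the non-traveling symptomatic infectious compartments. The elements of the operator $\mathcal{D}_j$ acting on $L_j$ are extracted from the multinomial distribution

$$\mathrm{Pr}^{\mathrm{Multin}}\left(L_j(t),\; p_{L_j \to I_j^a},\; p_{L_j \to I_j^t},\; p_{L_j \to I_j^{nt}}\right) \tag{3}$$

determined by the transition probabilities

$$p_{L_j \to I_j^a} = \varepsilon\, p_a\, \Delta t, \qquad p_{L_j \to I_j^t} = \varepsilon\, (1 - p_a)\, p_t\, \Delta t, \qquad p_{L_j \to I_j^{nt}} = \varepsilon\, (1 - p_a)(1 - p_t)\, \Delta t \tag{4}$$

and by the number of individuals in the compartment $L_j(t)$ (its size). All these transitions cause a reduction in the size of the compartment. The increase in the compartment population is due to the transitions from susceptibles into latents. This is also a random number, extracted from a binomial distribution

$$\mathrm{Pr}^{\mathrm{Bin}}\left(S_j(t),\; p_{S_j \to L_j}\right) \tag{5}$$

given by the chance of contagion

$$p_{S_j \to L_j} = \lambda_j\, \Delta t \tag{6}$$

and a number of attempts equal to the number of susceptibles $S_j(t)$. After extracting these numbers from the appropriate multinomial distributions, we can calculate the change $\Delta L_j(t)$ as

$$\Delta L_j(t) = \mathcal{D}_j(S, L) - \left[ \mathcal{D}_j(L, I^a) + \mathcal{D}_j(L, I^t) + \mathcal{D}_j(L, I^{nt}) \right]. \tag{7}$$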
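A minimal Python sketch of one stochastic update of a single, isolated subpopulation may help make the compartment transitions concrete. It is not the GLEaM code: the helper names, the state dictionary, the parameter values in the usage lines, and the simple standard-library samplers are all illustrative assumptions. The force of infection used in the usage lines is the single-population mass action form with asymptomatic infectiousness reduced by $r_\beta$, which ignores the commuting contributions.

```python
import random


def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials (simple O(n) sampler)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def multinomial(n, probs, rng):
    """Distribute n individuals over outcomes with probabilities probs.

    Uses sequential conditional binomials; individuals not assigned to any
    outcome (probability 1 - sum(probs)) remain in their compartment.
    """
    counts, remaining, p_left = [], n, 1.0
    for p in probs:
        k = binomial(remaining, min(1.0, p / p_left), rng) if p_left > 0 else 0
        counts.append(k)
        remaining -= k
        p_left -= p
    return counts


def slir_step(state, lam, eps, mu, p_a, p_t, dt, rng):
    """Advance one subpopulation by dt and return the new state dictionary."""
    s_to_l = binomial(state["S"], lam * dt, rng)  # new infections
    l_to_ia, l_to_it, l_to_int = multinomial(     # exits from the latent class
        state["L"],
        [eps * p_a * dt, eps * (1 - p_a) * p_t * dt, eps * (1 - p_a) * (1 - p_t) * dt],
        rng,
    )
    rec = {k: binomial(state[k], mu * dt, rng) for k in ("Ia", "It", "Int")}
    return {
        "S": state["S"] - s_to_l,
        "L": state["L"] + s_to_l - (l_to_ia + l_to_it + l_to_int),
        "Ia": state["Ia"] + l_to_ia - rec["Ia"],
        "It": state["It"] + l_to_it - rec["It"],
        "Int": state["Int"] + l_to_int - rec["Int"],
        "R": state["R"] + sum(rec.values()),
    }


# Hypothetical parameter values, chosen only to exercise the function.
rng = random.Random(42)
state = {"S": 990, "L": 5, "Ia": 2, "It": 2, "Int": 1, "R": 0}
beta, r_beta = 0.6, 0.5
n_total = sum(state.values())
lam = beta * (state["It"] + state["Int"] + r_beta * state["Ia"]) / n_total
new_state = slir_step(state, lam, eps=0.5, mu=0.33, p_a=0.33, p_t=0.5, dt=1.0, rng=rng)
```

Because every individual leaving one compartment enters another, the total population is conserved at each step, which is a convenient sanity check for any implementation.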

### 4.1. The integration of the transport operator

The transport operator is defined by the airline transportation data, which provide the number of available seats $\omega_{j\ell}$ between each pair of airports $(j, \ell)$. The operator is in general affected by fluctuations coming from the fact that the occupancy rate of the airplanes is not 100%. To take into account such fluctuations, we assume that on each connection $(j, \ell)$ the flux of passengers at time $t$ is given by a stochastic variable

$$\tilde{\omega}_{j\ell} = \omega_{j\ell} \left[ \alpha + \eta (1 - \alpha) \right], \tag{8}$$

where $\alpha$ denotes the average occupancy rate, of the order of 70% to 90%, provided by IATA and $\eta$ is a random number drawn uniformly in the interval $[-1, 1]$ at each time step. The number of individuals in the compartment $[m]$ traveling from the subpopulation $j$ to the subpopulation $\ell$ is an integer random variable, in that each of the $X_j^{[m]}$ potential travelers has a probability $p_{j\ell} = \tilde{\omega}_{j\ell} \Delta t / N_j$ to go from $j$ to $\ell$. In each subpopulation $j$ the numbers of individuals $\xi_{j\ell}$ traveling on each connection $j \to \ell$ at time $t$ define a set of stochastic variables $\{\xi_{j\ell}\}$, which follows the multinomial distribution

$$P(\{\xi_{j\ell}\}) = \frac{X_j^{[m]}!}{\left(X_j^{[m]} - \sum_\ell \xi_{j\ell}\right)! \prod_\ell \xi_{j\ell}!} \left(1 - \sum_\ell p_{j\ell}\right)^{X_j^{[m]} - \sum_\ell \xi_{j\ell}} \prod_\ell p_{j\ell}^{\xi_{j\ell}}, \tag{9}$$

where $(1 - \sum_\ell p_{j\ell})$ is the probability of not traveling, and $(X_j^{[m]} - \sum_\ell \xi_{j\ell})$ stands for the number of non-traveling individuals of the compartment $[m]$. The multinomial distribution provides the correct probability for traveling individuals leaving $j$ to distribute across the possible connections according to $\{p_{j\ell}\}$. We use standard numerical subroutines to generate random numbers of travelers following these distributions. The transport operator in each subpopulation $j$ is therefore written as

$$\Omega_j([m]) = \sum_\ell \left( \xi_{\ell j}(X_\ell^{[m]}) - \xi_{j\ell}(X_j^{[m]}) \right), \tag{10}$$

where the mean and variance of the stochastic variables are $\langle \xi_{j\ell}(X_j^{[m]}) \rangle = p_{j\ell} X_j^{[m]}$ and $\text{Var}(\xi_{j\ell}(X_j^{[m]})) = p_{j\ell}(1 - p_{j\ell}) X_j^{[m]}$. Direct flights as well as connecting flights with up to two legs can be considered. It is worth remarking that on average the airline network flows are balanced so that the subpopulation sizes $N_j$ are constant in time, i.e., $\sum_{[m]} \Omega_j([m]) = 0$.
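The sampling of travelers can be sketched in a few lines of Python. This is an illustrative sketch, not the routine used in GLEaM: the function name, the dictionary of seats, and the numbers in the usage lines are hypothetical, and the fluctuating passenger flux is assumed to take the form $\omega_{j\ell}[\alpha + \eta(1-\alpha)]$ with $\eta$ uniform in $[-1, 1]$ and $\alpha \geq 0.5$ so that the flux stays non-negative. The multinomial draw is done with sequential conditional binomials built on the standard library.

```python
import random


def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials (simple O(n) sampler)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def sample_travelers(x_jm, seats, n_j, alpha, dt, rng):
    """Draw the number of travelers from compartment [m] on each connection.

    x_jm  : individuals of compartment [m] currently in subpopulation j
    seats : dict {destination: available seats per day on the connection}
    n_j   : total population of subpopulation j
    alpha : average occupancy rate (assumed >= 0.5)
    """
    travelers, remaining, p_left = {}, x_jm, 1.0
    for dest, omega in seats.items():
        eta = rng.uniform(-1.0, 1.0)
        omega_tilde = omega * (alpha + eta * (1.0 - alpha))  # fluctuating flux
        p = omega_tilde * dt / n_j                           # per-person travel probability
        k = binomial(remaining, min(1.0, p / p_left), rng) if p_left > 0 else 0
        travelers[dest] = k
        remaining -= k   # individuals still available for the other connections
        p_left -= p
    return travelers


# Hypothetical example: 50 infectious travelers-to-be in a city of 100,000.
rng = random.Random(7)
xi = sample_travelers(50, {"B": 2000, "C": 500}, n_j=100_000, alpha=0.8, dt=1.0, rng=rng)
```

Applying the same draw to every compartment and every subpopulation, and then subtracting outgoing from incoming travelers, gives the net change that the transport operator contributes to each compartment.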

### 4.2. Time-scale separation and the integration of the commuting flows

The GLEaM model combines the infection dynamics with long- and short-range human mobility. Each of these dynamical processes operates at a different time scale. The inverses of the rates of the disease dynamics define the time scales of the stochastic process, which can be seen as the average permanence of an individual in a given compartment. For ILIs there are two important intrinsic time scales, given by the latency period $\varepsilon^{-1}$ and the duration of infectiousness $\mu^{-1}$, both larger than 1 day. The long-range mobility given by the airline network has a time scale of the order of 1 day, while commuting takes place on a time scale of approximately $\tau^{-1} \sim 1/3$ day. The explicit implementation of the commuting in the model thus requires a time interval shorter than the minimal time of airline transportation data. To overcome this problem, we use a time-scale separation technique, in which the short-time dynamics is integrated into an effective force of infection in each subpopulation.

We start by considering the temporal evolution of subpopulations linked only by commuting flows and evaluate the relaxation time to an equilibrium configuration. Consider the subpopulation $j$ coupled by commuting to $n$ other subpopulations. The commuting rate between the subpopulation $j$ and each of its neighbors $i$ is given by $\sigma_{ji}$. The return rate of commuting individuals is set to be $\tau$. Following the work of Sattenspiel and Dietz [39], we can divide the individuals originally from the subpopulation $j$, $N_j$, between those, $N_{jj}(t)$, who are from $j$ and are located in $j$ at time $t$ and those, $N_{ji}(t)$, that are from $j$ and are located in a neighboring subpopulation $i$ at time $t$. Note that by consistency

The rate equations for the subpopulation size evolution are then

By using condition (11), we can derive the closed expression

where $\sigma_j$ denotes the total commuting rate of population $j$, $\sigma_j = \sum_i \sigma_{ji}$. $N_{jj}(t)$ can be expressed as

where the constant $C_{jj}$ is determined from the initial conditions, $N_{jj}(0)$. The solution for $N_{jj}(t)$ is then

We can similarly solve the differential equation for the time evolution of $N_{ji}(t)$

The relaxation to equilibrium of $N_{jj}$ and $N_{ji}$ is thus controlled by the characteristic times $[\tau(1+\sigma_j/\tau)]^{-1}$ and $\tau^{-1}$ in the exponentials, respectively. The former term is dominated by $1/\tau$ if the relation $\tau \gg \sigma_j$ holds. In our case, $\sigma_j = \sum_i \omega_{ji}/N_j$, which equals the daily total rate of commuting for the population $j$. Such a rate is always smaller than one since only a fraction of the local population is commuting, and it is typically much smaller than $\tau \simeq 3\ \text{day}^{-1}$ to $10\ \text{day}^{-1}$. Therefore the relaxation characteristic time can be safely approximated by $1/\tau$. This time is considerably smaller than the typical time for the air connections of one day and hence we can approximate the subpopulations $N_{jj}(t)$ and $N_{ji}(t)$ with their equilibrium values,
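The displayed equations of this derivation can be written out from the definitions given in the text. The block below is a reconstruction under those definitions rather than a quotation: the labels (11) and (17) follow the in-text references, and the intermediate steps are derived here from the commuting rate $\sigma_{ji}$ and the return rate $\tau$.

```latex
\begin{align}
N_j &= N_{jj}(t) + \sum_i N_{ji}(t) \tag{11} \\
\partial_t N_{jj} &= -\sum_i \sigma_{ji} N_{jj}(t) + \tau \sum_i N_{ji}(t), \qquad
\partial_t N_{ji} = \sigma_{ji} N_{jj}(t) - \tau N_{ji}(t) \\
\partial_t N_{jj} + (\tau + \sigma_j)\, N_{jj}(t) &= \tau N_j \\
N_{jj}(t) &= \frac{N_j}{1+\sigma_j/\tau}
  + \left( N_{jj}(0) - \frac{N_j}{1+\sigma_j/\tau} \right)
    e^{-\tau (1+\sigma_j/\tau)\, t} \\
N_{jj} &= \frac{N_j}{1+\sigma_j/\tau}, \qquad
N_{ji} = \frac{N_j\, \sigma_{ji}/\tau}{1+\sigma_j/\tau} \tag{17}
\end{align}
```

The third line follows from substituting $\sum_i N_{ji} = N_j - N_{jj}$ from condition (11) into the first rate equation, and the last line is the $t \to \infty$ limit of the solution together with $N_{ji} = \sigma_{ji} N_{jj}/\tau$.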

This approximation, originally introduced by Keeling and Rohani [32], allows us to consider each subpopulation $j$ as having an effective number of individuals $N_{ji}$ in contact with the individuals of the neighboring subpopulation $i$. In practice, this amounts to separating the commuting time scale from the other time scales in the problem (disease dynamics, traveling dynamics, etc.). While the approximation holds exactly only in the limit $\tau \to \infty$, it is good enough as long as $\tau$ is much larger than the typical transition rates of the disease dynamics. In the case of ILIs, the typical time scale separation between $\tau$ and the compartment transition rates is close to one order of magnitude or even larger. Eq. (17) can then be generalized in the time scale separation regime to all traveling compartments [$m$], obtaining the general expression

while ${X}_{jj}^{[m]}={X}_{j}^{[m]}$ and ${X}_{ji}^{[m]}=0$ for all the other compartments which are restricted from traveling. These expressions will be used to obtain the effective force of infection taking into account the interactions generated by the commuting flows.
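The time-scale separation argument can be checked numerically. The sketch below integrates the commuting rate equations with a simple explicit Euler scheme and compares the result with the equilibrium values; the function names, the Euler integrator and the example rates ($\tau = 3\ \text{day}^{-1}$, two neighbors) are illustrative assumptions, not part of the GLEaM code.

```python
import numpy as np

def commuting_equilibrium(N_j, sigma_ji, tau):
    """Equilibrium split of the residents of j between home and neighbors."""
    sigma_ji = np.asarray(sigma_ji, dtype=float)
    denom = 1.0 + sigma_ji.sum() / tau
    return N_j / denom, N_j * (sigma_ji / tau) / denom

def integrate_commuting(N_j, sigma_ji, tau, t_end, dt=1e-3):
    """Explicit Euler integration, starting with everybody at home."""
    sigma_ji = np.asarray(sigma_ji, dtype=float)
    N_jj = float(N_j)
    N_ji = np.zeros_like(sigma_ji)
    for _ in range(int(round(t_end / dt))):
        out = sigma_ji * N_jj   # residents leaving j towards each neighbor i
        back = tau * N_ji       # commuters returning to j
        N_jj += dt * (back.sum() - out.sum())
        N_ji += dt * (out - back)
    return N_jj, N_ji

sigma_example = [0.05, 0.02]    # hypothetical commuting rates (per day)
tau_example = 3.0               # return rate (per day)
eq_home, eq_away = commuting_equilibrium(1.0e6, sigma_example, tau_example)
num_home, num_away = integrate_commuting(1.0e6, sigma_example, tau_example, t_end=5.0)
print(eq_home, num_home)
```

With these rates the deviation from equilibrium decays as $e^{-\tau t}$ or faster, so within a single day it is already reduced by roughly a factor of twenty.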

### 4.3. Effective force of infection

The force of infection $\lambda_j$ that a susceptible individual of a subpopulation $j$ sees can be decomposed into two terms: $\lambda_{jj}$ and $\lambda_{ji}$. The component $\lambda_{jj}$ refers to the part of the force of infection which is due to interactions among individuals in $j$, while $\lambda_{ji}$ indicates the force of infection acting on susceptibles of $j$ during their commuting travels to a neighboring subpopulation $i$. The effective force of infection can be estimated by summing these two terms weighted by the probabilities of finding a susceptible from $j$ in the different locations, $S_{jj}/S_j$ and $S_{ji}/S_j$, respectively. Using the time-scale separation approximation that establishes the equilibrium populations of Eq. (18), we can write

We will focus now on the calculation of each term of the previous expression. The force of infection (see Table 2) occurring in a subpopulation $j$ is due to the local infectious persons staying at $j$ or to infectious individuals from a neighboring subpopulation $i$ visiting $j$, and so we can write

where $\beta_j$ is introduced to account for the seasonality in the infection transmission rate (if the seasonality is not considered, it is a constant), and ${N}_{j}^{\ast}$ stands for the total effective population in the subpopulation $j$. By definition, ${I}_{jj}^{nt}={I}_{j}^{nt}$ and ${I}_{ji}^{nt}=0$ for $j \neq i$. If we use the equilibrium values of the other infectious compartments (see Eq. (18)), we obtain

The derivation of $\lambda_{ji}$ follows from a similar argument yielding:

where $\upsilon(i)$ represents the set of neighbors of $i$, and therefore the terms under the sum are due to the visits of infectious individuals from the subpopulations $\ell$, neighbors of $i$, to $i$. By plugging the equilibrium values of the compartments into the above expression, we obtain

Finally, in order to have an explicit form of the force of infection we need to evaluate the effective population size
${N}_{j}^{\ast}$ in each subpopulation *j*, *i.e.*, the actual number of people at the location *j*. The effective population is
${N}_{j}^{\ast}={N}_{jj}+{\sum}_{i}{N}_{ij}$, that in the time-scale separation approximation reads

Note that in these equations all the terms corresponding to compartments have an implicit time dependence.

By inserting $\lambda_{jj}$ and $\lambda_{ji}$ into Eq. (19), it can be seen that the expression for the force of infection includes terms of zeroth, first and second order in the commuting ratios (i.e., $\sigma_{ij}/\tau$). These three term types have a straightforward interpretation. The zeroth order terms represent the usual force of infection of the compartmental model with a single subpopulation. The first order terms account for the effective contribution generated by neighboring subpopulations, and are due to the contacts between susceptible individuals of subpopulation $j$ and infectious individuals of neighboring subpopulations $i$. This can occur in two ways: either susceptible individuals of $j$ visit $i$, or infectious individuals of $i$ visit $j$. The second order terms correspond to an effective force of infection generated by the contacts of susceptible individuals of subpopulation $j$ meeting infectious individuals of subpopulation $\ell$ (neighbors of $i$) when both are visiting subpopulation $i$ (see Figure 4). This last term is very small in comparison with the zeroth and first order terms, typically around two orders of magnitude smaller, and in general can be neglected.
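The structure of the effective force of infection can be illustrated with a deliberately simplified sketch. It assumes a single infectious compartment whose members all commute and a constant transmission rate `beta`, whereas the model in the text distinguishes several infectious compartments, some of which do not travel. The names `location_probabilities` and `effective_force` and all numerical values are hypothetical.

```python
import numpy as np

def location_probabilities(sigma, tau):
    """P[j, i]: equilibrium probability that a resident of j is found in i.

    sigma[j, i] is the commuting rate from j to i (zero diagonal assumed).
    """
    sigma = np.asarray(sigma, dtype=float)
    denom = 1.0 + sigma.sum(axis=1) / tau
    P = (sigma / tau) / denom[:, None]
    P[np.diag_indices_from(P)] = 1.0 / denom
    return P

def effective_force(beta, I, N, sigma, tau):
    """Force of infection felt by the residents of each subpopulation."""
    I = np.asarray(I, dtype=float)
    N = np.asarray(N, dtype=float)
    P = location_probabilities(sigma, tau)
    I_present = P.T @ I      # infectious individuals physically present in i
    N_present = P.T @ N      # effective population N*_i
    local_force = beta * I_present / N_present
    # Weight each location by the chance that a susceptible of j is there.
    return P @ local_force

sigma_example = np.array([[0.00, 0.05, 0.01],
                          [0.03, 0.00, 0.02],
                          [0.00, 0.04, 0.00]])
N_example = np.array([1.0e6, 5.0e5, 2.0e5])
I_example = np.array([100.0, 0.0, 0.0])
lam = effective_force(0.5, I_example, N_example, sigma_example, tau=3.0)
print(lam)
```

Expanding `P @ (beta * P.T @ I / N_present)` in powers of $\sigma/\tau$ gives the zeroth, first and second order contributions discussed above: residents infected at home, cross-infection with one commuting party, and two commuters meeting in a third location.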

### 4.4. Seasonality modeling

To model seasonal variations we follow the approach of Cooper *et al* [12] and scale the basic reproduction ratio $R_0$ by a seasonal function, $s_i(t)$,

where $i$ stands for the Northern or Southern hemisphere. This function is identically equal to 1.0 in the tropical regions. $t_{max,i}$ is the time corresponding to the maximum seasonal effect, Jan 15 in the North and six months later in the South. Seasonality has a dual effect: it increases the value of $R_0$ up to $R_{max} = \alpha_{max} R_0$ with $\alpha_{max} = 1.1$ [26] and reduces it down to $R_{min} = \alpha_{min} R_0$.
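As an illustration of such a rescaling, the sketch below uses a plain sinusoid that oscillates between $\alpha_{min}$ and $\alpha_{max}$ with a one-year period and peaks at $t_{max,i}$. The functional form is an assumption made for this example and is not necessarily the one used by Cooper *et al* or by GLEaM; the default $\alpha_{min} = 0.1$ is the value quoted later in the calibration section.

```python
import math

def seasonal_factor(t, t_max, alpha_min=0.1, alpha_max=1.1,
                    tropical=False, period=365.0):
    """Multiplicative seasonal factor applied to R0 on day t.

    Assumed sinusoidal form: equals alpha_max at t = t_max and alpha_min
    half a period later; tropical regions are left unscaled.
    """
    if tropical:
        return 1.0
    mid = 0.5 * (alpha_max + alpha_min)
    amp = 0.5 * (alpha_max - alpha_min)
    return mid + amp * math.cos(2.0 * math.pi * (t - t_max) / period)

T_MAX_NORTH = 15.0                  # Jan 15, counted in days from Jan 1
T_MAX_SOUTH = T_MAX_NORTH + 182.5   # six months later
north_peak = seasonal_factor(T_MAX_NORTH, T_MAX_NORTH)
south_same_day = seasonal_factor(T_MAX_NORTH, T_MAX_SOUTH)
print(north_peak, south_same_day)
```

On the day of maximum transmission in the North, the factor in the South is at its minimum, which is the hemispheric opposition described in the text.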

### 4.5. Age structure

To refine the analysis and capture the impact of an epidemic on different age groups, the basic formalism can be generalized to account for different contact rates among individuals belonging to different age brackets or, more generally, to specific population groups. We distinguish among age groups with varying contact rates by using the results of Wallinga *et al* [43], who in 2006 measured contact rates in a group of 1,813 Dutch survey participants. With such data it is possible to write a contact matrix *M*, describing how many interactions an individual in one class has with individuals in a different age group. The main characteristic of the contact matrix is its asymmetry. This is easily explained if, for example, one considers children and adults. Children almost always live with adults, but adults do not always live with children. In order to obtain the effective rate of infection, we must multiply the probability of infection by appropriately rescaled rates describing the contacts between different age groups. A full description of the generalization of the formalism is reported in the Appendix. While the theoretical and computational formalisms are ready to be generalized to include age classes, the main obstacle to proceeding in this direction is the lack of data. Reliable information can be obtained on the age structure of most countries in the world; however, detailed data on the contact matrix are limited to specific countries or settings, so a data-driven generalization to the whole world is not yet available.

## 5. Algorithms, the simulator and its implementation

The GLEaM simulation toolbox is implemented in a modular way. Each module performs a single function, and they can be combined in different ways to include or remove specific features. In Algorithm 1 we outline the general program flow of a basic GLEaM run.

### 5.1. Long distance travel

Each time step represents a full day. At the start of the time step, we use the flight network to move travelers to their destination using Algorithm 2. Travel is assumed to be instantaneous, with no transitions being possible en route. Performing this step at the start of the “day” guarantees that incoming travelers come into contact with the local inhabitants during that day. As a consequence, the arrival time for the infection is the day on which the first infected traveler arrives, and this seed individual is considered to have a full day's chance of infecting others. The probability of traveling changes from day to day through fluctuations in the occupancy rate of flights, as shown in Algorithm 2, where *α* represents the average occupancy rate of the plane, and *η* is a random variable uniformly distributed in [−1, 1]. The Flight module can be customized in order to consider the effects of generalized or location-specific airline traffic reductions.

### 5.2. Compartment transitions

The GLEaM framework is conceived in a generic way that facilitates the simulation of an arbitrary compartmental model that is given as part of the input. The infection module is completely separated from the other modules (like Flight and Aggregation). The module can be customized in order to simulate the effect of policy measures that modify the transmission rates during a specific period of time.

The epidemic model description is processed to generate a directed multigraph, where each node represents a compartment and each edge a transition, following the representation of Figure 3. Each edge is given a type, a weight and several other attributes. The type identifies whether the edge corresponds to a contagion or a spontaneous transition and the weight is the rate of transition. In the case of contagion transitions, the infectious agent is also identified, as there may be multiple infectious compartments as shown by Figure 3. This structure provides a convenient way of internally representing arbitrarily complex models as well as facilitating an efficient implementation. The edges contain all the information necessary to calculate the transition probabilities that can then be used directly as arguments of the multinomial function that calculates the number of individuals making the transition.
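A minimal sketch of this edge-based representation is given below. The edge tuples, the SLIR toy model, the transmission rate and the conversion of rates into probabilities through $1 - e^{-r\,\Delta t}$ are all assumptions made for illustration; the text only states that the edge attributes feed a multinomial draw.

```python
import numpy as np

# Each edge: (source, target, type, rate per day, infectious agent or None).
# A list of edges allows several transitions between the same pair of nodes.
EDGES = [
    ("S", "L", "contagion", 0.5, "I"),          # hypothetical transmission rate
    ("L", "I", "spontaneous", 1.0 / 1.1, None),
    ("I", "R", "spontaneous", 1.0 / 2.95, None),
]

def step(pop, edges, rng, dt=1.0):
    """Advance one subpopulation by one time step (total population > 0)."""
    total = sum(pop.values())
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)
    new_pop = dict(pop)
    for source, out_edges in outgoing.items():
        # Contagion rates are weighted by the prevalence of the agent.
        rates = np.array([
            rate * pop[agent] / total if kind == "contagion" else rate
            for (_, _, kind, rate, agent) in out_edges
        ])
        rate_sum = rates.sum()
        if pop[source] == 0 or rate_sum == 0.0:
            continue
        p_leave = 1.0 - np.exp(-rate_sum * dt)
        probs = p_leave * rates / rate_sum
        # The last category holds the individuals who stay in the compartment.
        draws = rng.multinomial(pop[source], np.append(probs, 1.0 - p_leave))
        for (_, target, _, _, _), n in zip(out_edges, draws[:-1]):
            new_pop[source] -= int(n)
            new_pop[target] += int(n)
    return new_pop

rng = np.random.default_rng(7)
state = {"S": 9_990, "L": 0, "I": 10, "R": 0}
for _ in range(30):
    state = step(state, EDGES, rng)
print(state)
```

Because every compartment is processed from the same snapshot `pop`, the order in which the edges are listed does not affect the result, and adding a compartment only requires adding edges.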

### 5.3. Aggregation and post-processing

The output produced by each run includes the population of each compartment for each census area at each time step and the number of transitions along each of the edges in the transition graph. The final step performed after each simulated day is a partial aggregation of the results, in order both to simplify the post-processing required to obtain useful results and to reduce the already considerable amount of output generated for each run. At this point in the simulation, the populations of each census area and each compartment have already been updated and several quantities of interest can be calculated. In particular, we calculate the number of secondary cases generated during this specific time step and the current incidence at each of the following aggregation levels:

- Census area
- Country
- Region
- Continent
- Hemisphere
- Globe

In the case of some countries, we also consider within-country divisions, such as US states and Australian provinces.
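The aggregation step can be sketched as a roll-up of per-area counts through a membership table. The area names, the table layout and the function `aggregate` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

def aggregate(values, membership):
    """Sum per-census-area counts at every aggregation level.

    values     : {census_area: count}
    membership : {census_area: {level_name: group_name}}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for area, count in values.items():
        totals["census_area"][area] += count
        for level, group in membership[area].items():
            totals[level][group] += count
        totals["globe"]["world"] += count
    return {level: dict(groups) for level, groups in totals.items()}

# Hypothetical daily counts of new cases in three census areas.
new_cases = {"area_A": 12, "area_B": 3, "area_C": 7}
membership = {
    "area_A": {"country": "France", "continent": "Europe", "hemisphere": "North"},
    "area_B": {"country": "France", "continent": "Europe", "hemisphere": "North"},
    "area_C": {"country": "Australia", "continent": "Oceania", "hemisphere": "South"},
}
daily = aggregate(new_cases, membership)
print(daily["country"])
```

Within-country divisions such as US states would simply be one more key in the membership table of the areas concerned.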

After the run is finished, the output data files are post-processed by a series of Python scripts to generate the analysis, figures and animations that are finally used. The advantage of decoupling simulation and analysis is the flexibility it gives in tailoring the whole process. While some post-processing steps (like the generation of epidemic profiles, arrival times and ArcGIS illustrations) are almost always considered, others can be added, removed or customized for specific situations. The full simulation process, containing all the steps described above, is illustrated schematically in Figure 5.

## 6. GLEaM at work: Simulation of 2001–2002 Seasonal Influenza A

In order to present a case study for the use of the GLEaM simulator we consider the spreading of seasonal influenza worldwide. Here we want to show how the model calibration may proceed by using real data from the surveillance and monitoring systems and what parameters are crucial in the description of the disease spread. Every year, seasonal influenza circulates globally and infects from 5% to 15% of the population, resulting in 3–5 million severe cases and ~500,000 deaths worldwide [42, 45]. For the sake of simplicity, we focus on one influenza season with one dominant strain, in order to neglect complications arising from the interplay of different strains. Among all the seasons from 1998 to 2006, the 2001–2002 season satisfies these criteria and is therefore a good candidate. In the Northern hemisphere, the 2000–2001 season has a mean proportion of annual A/H3N2 isolates below 5%, while in 2001–2002 this proportion is above 60% [20].

### 6.1. Model calibration and simulation

The main issue in the simulation of influenza is the parametrization of the model in terms of the transmission rate and the initial condition for the circulation of a given strain at the global level. The origin of annual influenza circulation is still an open question [37]; however, past experience indicates that new variants of influenza often originate in East-Southeast Asia [37], or Southeast China [13, 40, 41]. For season 2001–2002, according to the epidemiological records [44], Hong Kong is the only country/region in SE Asia having sporadic A/H3 influenza activity during June and July 2001. We therefore choose Hong Kong as the source of the influenza strain and explore possible starting dates between June and July. We further assume that a fraction equal to 10^{−5} of the city’s population is latent, consistent with the literature and with the specific choice for the same season in Ref. [26]. In the case of influenza, we can implement the compartmental structure reported in Fig. 3. For the parameters of the model, we consider a latent period of *ε*^{−1} = 1.1 days, and an infectious period of *μ*^{−1} = 2.95 days. The average generation interval for our choice is around 4 days, a value close to published estimates for A/H3N2 [5]. Also in agreement with the literature, we assume that only a fraction *γ* = 60% of the world population is susceptible to the circulating strain [26]. For the seasonality, we use the same seasonal rescaling as in Ref. [1]. We fix $\alpha_{max}$ and $\alpha_{min}$ at 1.1 and 0.1, respectively, to reflect the seasonal variability of influenza transmission.

The transmissibility of the disease is measured by the basic reproduction number $R_0$, which is defined as the average number of infected cases generated by the introduction of a single infectious individual into a fully susceptible population. For the compartmentalization used here, $R_0$ can be obtained in each subpopulation by evaluating the largest eigenvalue of the Jacobian or next generation matrix of the infection dynamics in a disease-free state [15, 28], yielding

Given the parameters $p_a$ and $r_\beta$, the value of $R_0$ depends on the transmission rate $\beta$ that fixes the reference reproductive number in each subpopulation. For seasonal influenza, however, since the fraction of initially susceptible population is not one, the reproductive number must be rescaled by the proportion of susceptible individuals and we define an effective reproductive number $R_{eff} = \gamma R_0$.

In order to find a best estimate of the transmissibility and initial start date $t_0$, we perform simulations of the model for varying values of these two parameters and compare the results with the empirical data on the influenza activity peak in the French regions. The French Sentinelles Network is a surveillance system based on reports from voluntary and unpaid general practitioners (GPs), which has kept a weekly record of ILI consultations since 1984 [23]. From the data, we can obtain for each French region the time of the activity peak ${t}^{\mathit{emp}\_\mathit{peak}}$. We then perform a Latin square sampling in the phase space of the parameters $R_{eff}$ and $t_0$, constructing the surface representing the $\chi^2$ values obtained by comparing the empirical peak times with the average simulated activity peak times ${t}_{i}^{\mathit{sim}\_\mathit{peak}}$ obtained by analyzing 2,000 stochastic GLEaM realizations for each sampled point. This Monte Carlo Latin sampling procedure is computationally intensive, as for each sampled point 2,000 realizations of the epidemic propagation worldwide must be generated. We have opted for a trade-off between accuracy and computational cost, sampling the phase space with a resolution $\Delta R_{eff} = 0.03$ and $\Delta t_0 = 7$ days. The best fit for the initial condition and the transmissibility is associated with the minimum of the $\chi^2$ surface. Figure 6 reports the $\chi^2$ surface as a function of $R_{eff}$ and seeding date $t_0$. The best fit range for $R_{eff}$ is between 1.47 and 1.53 with the initial date between late June and early July, depending on $R_{eff}$. From the analysis of the surface, we find a best estimate corresponding to $R_{eff} = 1.50$ and $t_0$ = July 11. A more accurate analysis with confidence intervals is needed in order to provide a full discussion of these epidemiological results. This is however beyond the scope of this paper, where we want only to provide a practical example of the GLEaM implementation.
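The grid exploration can be sketched as follows. The $\chi^2$ definition used here (squared peak-time differences scaled by the spread of the simulated peaks) and the toy stand-in for the simulator are assumptions made for illustration; in the actual procedure each grid point requires 2,000 stochastic GLEaM runs, and the text does not spell out the exact form of the statistic.

```python
import numpy as np

def chi_square(t_emp, t_sim_mean, t_sim_std):
    """Assumed statistic: sum over regions of squared, scaled differences."""
    return float(np.sum(((t_emp - t_sim_mean) / t_sim_std) ** 2))

def best_fit(r_values, t0_values, t_emp, simulate_peaks):
    """Scan the (R_eff, t0) grid and return the minimum of the surface."""
    surface = np.empty((len(r_values), len(t0_values)))
    for a, r_eff in enumerate(r_values):
        for b, t0 in enumerate(t0_values):
            mean, std = simulate_peaks(r_eff, t0)
            surface[a, b] = chi_square(t_emp, mean, std)
    a, b = np.unravel_index(np.argmin(surface), surface.shape)
    return r_values[a], t0_values[b], surface

# Toy stand-in for the simulator: regional peak times shrink as R_eff grows.
DELAYS = np.array([140.0, 150.0, 160.0, 170.0])   # hypothetical, one per region

def toy_peaks(r_eff, t0):
    return t0 + DELAYS / (r_eff - 1.0), np.full(DELAYS.shape, 7.0)

r_grid = np.round(np.arange(1.41, 1.60, 0.03), 2)  # resolution 0.03
t0_grid = np.arange(0, 35, 7)                      # resolution 7 days
t_emp_example, _ = toy_peaks(1.50, 14)             # synthetic "empirical" peaks
r_best, t0_best, chi2_surface = best_fit(r_grid, t0_grid, t_emp_example, toy_peaks)
print(r_best, t0_best)
```

The synthetic peaks are generated at a grid point, so the scan recovers that point exactly; with real surveillance data the minimum is broader, which is why the text reports a best-fit range rather than a single value.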

Figure 6. $\chi^2$ values as functions of the effective reproduction ratio ($R_{eff}$) and seeding date ($t_0$) of simulated epidemics obtained by 2,000 stochastic runs for each pair of parameter values. Activity peak times of ILI consultations **...**

The best estimate of the parameters is obtained by using data only from a single country, in this case France. In order to provide an example of the accuracy of the GLEaM model in reproducing the spatio-temporal patterns of the disease spreading, we can compare the numerical results obtained with the parameters fitted in France with empirical data in several countries where reliable surveillance data is available. We have chosen a set of countries for which the reported dominant strain is A/H3N2 with a sufficient number of reported cases. Data is obtained from either the national public health agencies or the regional organizations. The full list of selected countries is shown in Table 3.

In Figure 7, we report the activity peaks for the selected countries and compare our predictions with the 2001–2002 weekly surveillance data. The simulation and empirical data show a good agreement in most of the countries and regions. All data are normalized to 1, which guarantees that activities are shown on the same scale. For the simulated data, the activity peaks are reported with median values from 2,000 stochastic simulations, along with the 95% reference range. For the empirical data, in addition to the number of laboratory confirmed cases, we also refer to additional indicators, such as the ILI or Acute Respiratory Infection (ARI) consultation rate (per 100,000 population or per 1,000 patient visits), which is usually reported by physicians. For selected countries having only one type of dominant strain, the percentage of ILI is also a good indicator of seasonal influenza activity. Table 3 shows the dominant virus type and the data source used for individual countries. While the analysis reported here must be considered only as a simple illustration of the GLEaM implementation, the results appear to recover with good agreement the main spatio-temporal pattern of the 2001–2002 season. We want to stress that the timing of the epidemic spreading across different regions of the world is mostly determined by the human mobility patterns that are integrated in the GLEaM model with great accuracy. The best fit of the parameters obtained from the timeline of the epidemic in one or more countries allows the model to self-consistently capture the mobility of infected individuals and the case importation that set the epidemic timeline worldwide.

## 7. Conclusions

Here we have provided a detailed description of the GLEaM simulator, a discrete stochastic epidemic computational model based on a metapopulation approach in which the world is divided into geographical census areas connected in a network of interactions by human travel fluxes corresponding to transportation infrastructures and mobility patterns. Given the multitude of scales and mobility layers existing in the GLEaM model, the process of interest can be studied on a wide range of scales, from small administrative units (counties, municipalities) to the whole world. Although the GLEaM model has been used in the past in the analysis of realistic scenarios and in comparison with real data, also in relation to the H1N1 pandemic, here we have presented for the first time all the data integration details, models and algorithm implementations that are under the hood of the GLEaM simulator. It is also worth noticing that while the model is being developed and tested in the context of emerging diseases such as new pandemic strains, it considers different transportation and interaction layers and distinguishes the mobility modeling from the dynamical process mediated by the human dynamics. This allows the integration of different processes of social contagion that are not necessarily of biological origin but that occur by taking advantage of individuals' mobility, such as information spreading, social behavior, etc. GLEaM has proved to be very flexible and we are working to make the GLEaM platform available to the scientific community at large. In particular we are developing an easy-to-use interface to the software that allows for the simulation and visualization of the spread of epidemics at a global scale.

## Acknowledgments

We are grateful to the International Air Transport Association for making the airline commercial flight database available to us. This work has been partially funded by the NIH R21-DA024259 award, the Lilly Endowment grant 2008 1639-000 and the DTRA-1-0910039 award to AV; the EC-ICT contract no. 231807 (EPIWORK), and the EC-FET contract no. 233847 (DYNANETS) to AV and VC; the ERC Ideas contract n.ERC-2007-Stg204863 (EPIFOR) to VC. The work has been also partly sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

## Biographies

Duygu Balcan is a Research Associate at the Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, Bloomington. Her current research interests involve mathematical and computational modeling of contagion processes with a specific focus on spreading of emergent infectious diseases. She obtained her PhD in Physics from Istanbul Technical University, Turkey, in 2007.

Bruno Gonçalves completed his joint PhD in Physics and MSc in C.S. at Emory University in Atlanta, GA in 2008, following which he joined the Center for Complex Networks and Systems Research at Indiana University as a post-doctoral research associate. His research activity focuses on using computational, visualization and data analysis methods for the study of Complex Systems in a multidisciplinary context. Current projects include detailed epidemic modeling in structured populations; knowledge diffusion on large technological networks; and the study of human behavior through the analysis of proxy social network dynamics.

Hao Hu completed his undergraduate studies at the Department of Physics, University of Science and Technology of China (USTC) in July 2005. He then went to Indiana University and obtained his master’s degree in physics in February 2007. Currently he is a Ph.D. student in the physics department and the biocomplexity institute. During his studies he joined the complex systems group. His research interests involve the study of complex networks, especially the mathematical modeling of dynamical processes on networks, such as the spreading of diseases and malware.

José J. Ramasco completed his PhD at the “Universidad de Cantabria” in Santander (Spain). After this, he moved to Oporto (Portugal) for a two-year postdoc in the “Centro de Fisica do Porto”, an institute of the University of Oporto. Later he held a two-year postdoc fellowship at the Physics Department of Emory University in Atlanta, GA. Since 2006, he has been a research scientist at the ISI Foundation in Turin, Italy. His research activity focuses on several aspects of complex networks, from theoretical issues to real world applications including realistic modeling of epidemic spreading or of user Web traffic.

Vittoria Colizza is a Research Scientist at the Institute for Scientific Interchange (ISI Foundation) in Turin, Italy, where she leads the Computational Epidemiology Lab. Her research focuses on the characterization and modeling of the spread of emerging infectious diseases, through an integrated approach that includes methods of complex systems, statistical physics techniques, computational sciences, and GIS. After obtaining her PhD in Physics at SISSA in Trieste, Italy, in 2004, she held a research position at Indiana University in Bloomington, IN, USA, and joined the ISI Foundation in 2007. She was awarded in 2008 a Career Grant by the European Research Council.

Alessandro Vespignani is currently James H. Rudy Professor of Informatics and Computing and adjunct professor of Physics and Statistics at Indiana University, where he is also the director of the Center for Complex Networks and Systems Research (CNetS) and associate director of the Pervasive Technology Institute. Vespignani’s recent research activity focuses on the interdisciplinary application of statistical and simulation methods in the analysis of epidemic spreading phenomena and the study of biological, social and technological networks. Vespignani is an elected fellow of the American Physical Society and serves on the board or in the leadership of a variety of professional associations and journals.

## Appendix A. Generalization including age structure

We now introduce the formalisms that allow for the inclusion of different contact rates among individuals in different age groups.

While we still make the fundamental assumption that the epidemic is governed by a single transmission rate *β*, we must now rescale it to take into account the different contact rates among different age groups. The contact matrix *M*, shown in Table A.4, describes how many contacts an individual in one class has with individuals in a different age group. Columns correspond to survey participants, and rows to the people they interacted with. As an example, we use the data gathered in 2006 by Wallinga *et al* [43], who measured the contact rates using a group of 1,813 Dutch survey participants. For self-consistency, we require that the total number of interactions between two age groups be the same when counted from either group. In other words, we must have

Symmetrized matrix values are then given by $C_{ab} = m_{ab} \cdot N / N_a$, where $N_a$ is the number of individuals in age group $a$ and $N$ is the total number of individuals. Values of $N_a$ for both the survey participants and the entire Dutch population are given in Table A.5 and the full symmetric matrix $C$ is shown in Table A.6.
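The symmetrization can be illustrated with a short sketch. It assumes the convention stated above (columns are participants, so $m_{ab} N_b$ is the total number of contacts reported by group $b$ with group $a$) and enforces reciprocity by averaging the two reported totals, which is one common choice and not necessarily the one used for Table A.6. The function name and the three-group example are hypothetical.

```python
import numpy as np

def symmetrized_contacts(m, n_group):
    """Return C_ab = m_ab * N / N_a after enforcing reciprocity.

    m[a, b]    : mean number of contacts a participant of group b reports
                 with members of group a
    n_group[a] : number of individuals in group a
    """
    m = np.asarray(m, dtype=float)
    n_group = np.asarray(n_group, dtype=float)
    n_total = n_group.sum()
    totals = m * n_group[None, :]          # m_ab * N_b
    totals = 0.5 * (totals + totals.T)     # average the two reports (assumption)
    return n_total * totals / np.outer(n_group, n_group)

# Hypothetical three-group example, not the Wallinga data.
m_example = np.array([[10.0, 2.0, 1.0],
                      [4.5, 8.0, 2.0],
                      [1.5, 3.0, 5.0]])
n_example = np.array([200.0, 500.0, 300.0])
C_example = symmetrized_contacts(m_example, n_example)
print(C_example)
```

When the reported matrix already satisfies reciprocity, the averaging step changes nothing and the result reduces to $C_{ab} = m_{ab} \cdot N / N_a$.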

While Wallinga considers only 6 age groups, our demographic data for each county, as provided by the US Census Bureau [31] is more fine grained. We make the simplest choice and assume that people are uniformly distributed within each 5 year compartment, thus combining the age groups so that they fit the Wallinga picture.

A change in the way the different populations interact with each other necessarily implies a change in the way the epidemic spreads, requiring modifications to the *R*_{0} calculation. We apply the techniques described in [15, 28] to the general age structure case of interest.

Let us define $\vec{x} = (x_1, \ldots, x_n)$ to be a vector containing the number of individuals in each infected compartment. We have 4 such compartments, $L = x_1$, $I^{t} = x_2$, $I^{nt} = x_3$ and $I^{a} = x_4$. The matrix $F$, defining the rate of creation of new infected cases, is then:

with a simple meaning: Latent cases (first row) are created (from susceptible) with rate $\beta$ ($r_\beta \beta$) through interaction with $I^{t,nt}$ ($I^{a}$). Since these are the only ways in which the disease can spread through a Susceptible population, all other entries in the matrix are null. After infection, the disease progresses through several stages as described by the matrix $V = (v_{ab})$, where element $v_{ab}$ is the number of individuals leaving compartment $a$ to compartment $b$, minus the number of individuals following the opposite path. For seasonal flu, we have:

Using these two matrices we can calculate the next generation matrix,

that describes the complete epidemic process and whose interpretation is relatively simple: *F* is the rate at which new infections are created and *V*^{−1} is the average duration of each infected compartment. The basic reproductive ratio, *R*_{0} is finally given by the maximum eigenvalue of this matrix that in a model without age structure reads as

Adding age structure results in a proliferation of infected compartments. In the case of Wallinga’s age grouping, we have 6 times as many infected compartments. Fortunately, the fact that we do not consider aging implies that individuals never move between compartments corresponding to different age groups, thus greatly simplifying the analysis. We define the new vector $\vec{x}^{\dagger}$ to be a concatenation of 6 vectors $\vec{x}$, each corresponding to a different age cohort. Mixing between the different groups results in a susceptible individual becoming latent by interacting with an infectious person from any other group. In matrix notation, and using the previous definitions, the new infection matrix $F^{\dagger}$ is given by:

where × represents the Kronecker product. After the initial infection, the disease progresses as before with each age group being isolated from all others. The progression matrix *V*^{†} is then:

where is the 6 × 6 identity matrix. The next generation matrix can now be written as:

Therefore, the new basic reproductive number can be written as a function of the previous one:

This formulation is completely generic and can be extended to any number of age groups with only a very small numerical effort. A specific value of $R_0$ can be set by inverting this expression and calculating the appropriate value of $\beta(R_0)$.
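A minimal Python sketch of the age-structured case is given below. It assumes that the infection matrix is the Kronecker product of an age-mixing matrix with $F$ and that the progression matrix is the Kronecker product of the identity with $V$, so that the age-structured next generation matrix is the Kronecker product of the mixing matrix with $F V^{-1}$. The names `mixing` and `ngm_unit_beta`, the three age groups and all numerical values are hypothetical and chosen only for illustration.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def scale(A, s):
    """Multiply every entry of A by the scalar s."""
    return [[s * v for v in row] for row in A]


def spectral_radius(A, iters=200):
    """Dominant eigenvalue of a non-negative matrix by power iteration."""
    x = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        y = [sum(a * v for a, v in zip(row, x)) for row in A]
        lam = max(abs(v) for v in y)
        if lam == 0.0:
            return 0.0
        x = [v / lam for v in y]
    return lam


def beta_for_target_r0(r0_target, mixing, ngm_unit_beta):
    """Transmission rate giving r0_target.

    Relies on F, and therefore R0, being linear in beta, so that
    R0(beta) = beta * R0(beta = 1).
    """
    return r0_target / spectral_radius(kron(mixing, ngm_unit_beta))


# Hypothetical symmetric mixing matrix for three age groups.
mixing = [[2.0, 1.0, 0.5],
          [1.0, 1.5, 0.8],
          [0.5, 0.8, 1.0]]

# Hypothetical single-population next generation matrix evaluated at beta = 1;
# only the first row is non-zero because only latent cases are newly created.
ngm_unit_beta = [[2.505, 3.0, 3.0, 1.5],
                 [0.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0]]

r0_single = spectral_radius(ngm_unit_beta)
r0_age = spectral_radius(kron(mixing, ngm_unit_beta))
beta = beta_for_target_r0(1.5, mixing, ngm_unit_beta)
print(r0_single, r0_age, beta)
```

Since the eigenvalues of a Kronecker product are the products of the eigenvalues of its factors, `r0_age` equals the dominant eigenvalue of `mixing` times `r0_single`, and the calibration of `beta` needs only a single eigenvalue computation.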

Before we can use this formulation in our global simulation, we must take into account the different demographics of each country or census area and their change over time. Using the definitions above, we can write:

to describe the increase in the number of people in compartment $I_i$ in a basic SI model. Defining the fraction of individuals in compartment $I_a$ as $\rho_{I_a} = I_a / N$, we rewrite this expression as:

where $C_{ab}$ is the symmetric matrix defined above. Since this expression depends only on the relative fraction of individuals in each compartment and not on the details of how many people are actually in each compartment, we can safely conclude that $C_{ab}$ is the matrix that must be kept constant for every population. We can now identify:

or, in other words:

as the matrix that we must use in Eq. A-1 and that will differ from country to country. Substituting in Eq. A-2 we obtain:

where *N* is the total population for the subpopulation considered and *C _{ab}* is the same for every population. The resulting force of infection is then:

During the derivation of this expression, and for the sake of clarity, we considered only a *single population*. The expression for the full force of infection including the mobility dynamics Eq. A-4 can be obtained after the application of the prescription of Sec. 4. This can be easily done by replacing every term of the form *β _{i}I_{i}* by
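For a single population, the demographic rescaling discussed above can be sketched in Python as follows. The sketch assumes a Wallinga-style normalisation in which the country-specific matrix is obtained from the population-independent matrix as `M[a][b] = C[a][b] * N_b / N`, and a force of infection of the form `beta * sum_b M[a][b] * I_b / N_b`. The symbol `M`, this exact normalisation, the function names and the numbers are assumptions made for illustration rather than expressions taken from the text.

```python
def country_contact_matrix(C, age_counts):
    """Rescale the population-independent matrix C to a country-specific one.

    Assumed form: M[a][b] = C[a][b] * N_b / N, with N the total population.
    """
    total = sum(age_counts)
    return [[c_ab * n_b / total for c_ab, n_b in zip(row, age_counts)] for row in C]


def force_of_infection(beta, C, infectious, age_counts):
    """Per-age-group force of infection, beta * sum_b M[a][b] * I_b / N_b."""
    M = country_contact_matrix(C, age_counts)
    return [beta * sum(m_ab * i_b / n_b
                       for m_ab, i_b, n_b in zip(row, infectious, age_counts))
            for row in M]


# Hypothetical symmetric matrix and demographics for three age groups.
C = [[2.0, 1.0, 0.5],
     [1.0, 1.5, 0.8],
     [0.5, 0.8, 1.0]]
age_counts = [3000, 5000, 2000]   # N_b, individuals per age group
infectious = [10, 25, 5]          # I_b, infectious individuals per age group

M = country_contact_matrix(C, age_counts)
lam = force_of_infection(0.5, C, infectious, age_counts)
print(M)
print(lam)
```

With this normalisation the force of infection reduces to `beta * sum_b C[a][b] * I_b / N`, so only the age counts change from one population to another, and the rescaled matrix satisfies the reciprocity condition `M[a][b] * N_a == M[b][a] * N_b`.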
