Chapter 5 Progression Dynamics


Progression depends on various rate processes, such as the rate of somatic mutation and the time for a solid tumor to build a blood supply. To link rate processes to the observed age-onset curves of cancer incidence, one must understand how the processes combine to determine the speed of progression. This chapter introduces the quantitative theory that links carcinogenic processes and incidence.

The first section provides background on mathematical theories of progression. The general approach begins with the assumption that cancer develops through a series of stages. This assumption of multistage progression sets the framework in which to build particular models of progression dynamics. Within this framework, I argue in favor of simple theories that make comparative predictions. If one understands how a particular process affects progression, then one should be able to predict how altering that process changes progression dynamics.

The second section lists some of the observations on cancer incidence that a theory should seek to explain. These observations set the target for mathematical theory and emphasize the need to link progression dynamics to incidence.

The third section introduces the classical model of multistage progression. This model predicts an approximately linear relation between incidence and age when plotted on log-log scales. The observed patterns match this prediction for several cancers. However, the fit of observations to theory is not by itself particularly informative. To make further progress, I emphasize the need for comparative theories. I briefly mention one comparative theory that follows from the classical multistage model: the ratio of incidence rates between two groups depends on the difference in the number of rate-limiting steps in progression. I develop that theory in later chapters.

The fourth section discusses why one should bother with abstract theories that often run ahead of empirical understanding. The main reason is that we are not likely to have much luck in understanding real systems if we cannot understand with simple logic how various processes could in principle combine to influence progression. In addition, it helps to have a toolbox of possible explanations that one thoroughly understands. Such understanding prevents the common tendency to latch onto the first available explanation that seems to fit the data, without full consideration of reasonable alternatives.

The fifth section presents the equations for a simple model of progression through a series of stages. I emphasize that the equations are completely equivalent to a simple diagram that illustrates the flow between stages of progression. The equations introduce the notation and structure of a formal model, paving the way for more detailed analysis in the following chapters.

The sixth section develops technical definitions for incidence and acceleration that follow from the formal specification of the model in the previous section. Incidence provides the key measure of occurrence for cancer: the cases of cancer per year, at each age, for a given population of individuals. Incidence is a rate—cases per year—just as velocity is a rate. Acceleration is the rate of change in incidence with age: how fast incidence increases or decreases as individuals become older. Theories about the carcinogenic role of particular biochemical mechanisms must ultimately link those mechanisms to their effects on incidence and acceleration.

5.1 Background

Multistage Progression Is a Framework, Not a Hypothesis

Most mathematical models of cancer progression descend from Armitage and Doll's (1954) paper on multistage theory. The phrase "multistage theory" has led to some confusion. A multistage model simply assumes that cancer does not arise in a single step—an assumption supported by much evidence. So, "multistage theory" is not really a particular theory; it is a framework that describes the kind of dynamical processes used to model progression through multiple stages.

This framework provides tools to develop testable quantitative hypotheses that link progression dynamics to the curves of age-specific cancer incidence. Progression dynamics also provides a notion of causality: a process causes cancer to the extent that the process alters the age-specific incidence curve.

The Importance of Comparative Hypotheses

A mathematical analysis for the age of cancer onset depends on several parameters. Those parameters might include the number of stages in progression, the somatic mutation rate that moves a tissue from one stage to the next, the number of cells in the tissue, and the precancerous rate of cell division. Given values for those parameters, the mathematical model generates an age-specific incidence curve.

A mathematical model may be used in two different ways: fit or comparison.

A fit chooses values for all parameters that minimize the distance between the predicted and observed age-specific incidence curves. A good fit provides a close match between prediction and observation. A good fit also uses realistic values for parameters such as rates of mutation and cell division.

A comparison sets an explicit hypothesis: as a parameter changes, the model predicts a particular direction of change for the age-specific incidence curve. For example, an inherited mutation may reduce by one the number of stages that must be passed during progression. Mathematical models predict that fewer stages cause the incidence curve to have a lower slope and to shift to earlier ages (higher intercept). I will show data that support this comparative prediction.
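The direction of this prediction can be checked in a few lines of Python. The sketch below assumes the approximate multistage incidence $I(t) \approx u^n t^{n-1}/(n-1)!$ from Section 5.3, with every step sharing one hypothetical rate u; the values u = 0.01 per year and n = 6 are illustrative choices, not estimates from data. On log-log scales that approximation is a straight line, so removing one step lowers the slope by one and raises the intercept by $\log((n-1)/u)$, which is positive whenever u < n − 1.

```python
import math


def log_log_line(n, u):
    """Return (slope, intercept) of log I versus log t for the approximation
    I(t) = u**n * t**(n - 1) / (n - 1)!, a straight line on log-log scales."""
    slope = n - 1
    intercept = n * math.log(u) - math.log(math.factorial(n - 1))
    return slope, intercept


u = 0.01  # hypothetical per-step transition rate per year (same for every step)
n = 6     # hypothetical number of rate-limiting steps in sporadic cases

sporadic_slope, sporadic_intercept = log_log_line(n, u)
# Inherited cases: one step already passed at conception, so n - 1 steps remain.
inherited_slope, inherited_intercept = log_log_line(n - 1, u)

print(sporadic_slope, inherited_slope)           # 5 4
print(inherited_intercept > sporadic_intercept)  # True
```

A lower slope with a higher intercept is exactly the pattern described in the text: the inherited curve starts higher and rises less steeply with age.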


One can fit theory to observation, but the match usually arises because a model with several parameters creates a flexible manifold that conforms to the data. Even when one constrains parameter estimates to realistic values, an incorrect model with several parameters often has great flexibility to conform to the shape of the data. A fit is achieved so easily that such a model, fitting widely and well, actually explains very little. As Dyson (2004) tells it:

In desperation I asked Fermi whether he was not impressed by the agreement between our calculated numbers and his measured numbers. He replied, "How many arbitrary parameters did you use for your calculations?" I thought for a moment about our cut-off procedures and said, "Four." He said, "I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk." With that, the conversation was over.

Several mathematical methods test the quality of a fit. But technical fixes do not overcome the main difficulty: mathematical models fail to capture the full complexity of multidimensional problems such as cancer. If a model does become sufficiently complex, one has so many parameters that fitting almost anything is accomplished too easily.

Although a good fit means little, a lack of fit also provides little insight: lack of fit means only that one does not have exactly the right model. However, one rarely has exactly the right model. So, by lack of fit, one may end up rejecting a theory that in fact captures much of the essential nature of a process but misses one aspect.

Finally, another common approach considers the realism of parameter estimates obtained from the data. For example, when fitting a model, how close do the estimated mutation rates match values thought to be realistic? However, parameter estimates can only be compared to realistic values when one has a complete model. In incomplete models, the parameter estimates change to make up for processes not included in the model. So the realism of parameter estimates provides a test only when fitting a complete model that captures the full complexity of a process. But for cancer and for most interesting biological phenomena, we do not have complete models and probably never will have complete models.

Models do have great value in spite of the difficulties of drawing conclusions by fitting to the data. The key is to develop and test theories in a comparative way.


A comparison is simple to formulate, understand, and test. Consider the following prediction: as the number of steps in progression declines, the slope of the incidence curve decreases. To test this, one has to measure a relative change in the number of steps and a relative change in the slope of the incidence curve. This test can be accomplished by comparing the incidence curves between genotypes, where one genotype has a mutation that abrogates a suspected rate-limiting step in progression.

A comparative prediction allows tests of causal hypotheses. If I understand what causes cancer, then I can predict how incidence curves change as I change the underlying parameters of cancer dynamics.

The limited role of mathematics and quantitative studies in much of biology follows from a fatal attraction to fitting complex models. Simple comparative models are often rejected a priori because they do not contain all known processes. The reasoning seems to be: how can a model be useful if a known process is left out? All known processes are added in; fits are obtained; little is learned; quantitative analysis is abandoned.

A model is not a synthesis of all known observations; a model is a tool to test one's ability to predict the behavior of a system. If one cannot say how the system changes when perturbed, then one does not understand the system. To study perturbations most effectively, formulate and test the simplest comparative theories.

5.2 Observations to Be Explained

In this section, I briefly list a few puzzles—just enough to set the context. Chapter 2 provided a more complete review of the observations on age-specific incidence.

The difference in incidence curves between inherited and sporadic cancers provides the most striking observation (Knudson 1971, 2001). In the simplest case, the inherited form of a cancer arises in those who carry a defect in a single allele. For example, a carrier with a mutant APC allele typically develops numerous independent colon tumors in midlife. By contrast, sporadic (noninherited) cases mostly occur later in life.

The comparison between inherited and sporadic incidence curves presents an opportunity to test how particular mutations affect the rate of cancer progression. Figure 2.6 compares incidence data between sporadic cancers and inherited cancers in carriers of a mutation to a single allele. Comparison of incidence curves between experimentally controlled genotypes of rodents provides an exceptional opportunity to test hypotheses. Figure 2.7 illustrates the sort of data that can be obtained. Later, I will provide methods to analyze those data with regard to quantitative models of progression dynamics.

Six additional patterns in the incidence data suggest the kinds of puzzles that dynamical theories of progression must explain.

First, incidence accelerates slowly with age for some cancers, such as melanoma, thyroid, and cervical cancers. By contrast, other cancers accelerate more rapidly with age, such as colorectal, bladder, and pancreatic cancers (Figure 2.3).

Second, the acceleration of cancer incidence with age declines at later ages for the common epithelial cancers—breast, prostate, lung, and colorectal (Figure 2.3). Several other cancers also show a steady and sometimes rather sharp decline in acceleration at later ages. In some cases, the patterns of acceleration differ between countries (see Appendix). On the whole, declines in acceleration later in life appear to be typical for many cancers.

Third, several cancers show very high early or midlife accelerations, sometimes with accelerations at early ages rising to a midlife peak (Figure 2.3). For example, prostate cancer has an exceptionally high midlife peak (Figure A.2); leukemia (Figure A.6) and in some cases colon cancer (Figure A.4) show rises in early life.

Fourth, smokers who quit by age 50 have a lower acceleration in lung cancer risk later in life than do those who never smoked or who continue to smoke (Figure 2.8).

Fifth, exposure to a carcinogen often causes the median number of years to tumor formation to decline linearly with dosage when measured on log-log scales (Figures 2.10, 2.11).

Sixth, given a set of individuals who have suffered breast cancer at a particular age, the close relatives of those individuals have high and nearly constant annual risk (zero acceleration) for breast cancer after the age at which the affected individuals were diagnosed. By contrast, individuals whose relatives have not suffered breast cancer have lower risk per year, but their risk accelerates with age (Peto and Mack 2000).

These observations provide a sample of interesting puzzles, most of which have yet to be explained in a convincing way. Dynamical models of cancer progression provide the only source of plausible hypotheses to explain the range of observed patterns.

5.3 Progression Dynamics through Multiple Stages

Models of progression dynamics analyze transitions through stages. The simplest type of model follows progression through a linear sequence. This linear model arose over 50 years ago, when people first observed clear patterns in the age-specific incidence of cancer.

Figure 5.1 illustrates the type of pattern that was apparent to early observers: the incidence of colorectal cancer increases in a roughly linear way with age when plotted on log-log scales. In an earlier chapter, Figure 2.2 showed that log-log plots of incidence are approximately linear for many cancers.

Figure 5.1. Age-specific incidence for colorectal cancer. Data for all males from the SEER database, using the nine SEER registries, year of diagnosis 1992–2000.

The line in Figure 5.1 fits a model in which

$$I = c\,t^{n-1},$$

where I is cancer incidence at age t, the exponent n − 1 determines the rate of increase in cancer incidence with age, and c is a constant. Taking the logarithm of both sides of this equation gives the log-log scaling shown in the figure,

$$\log(I) = \log(c) + (n-1)\log(t);$$

in particular, the figure plots log(I) versus log(t). The line in Figure 5.1 has a slope of n − 1 ≈ 5.
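To make the log-log relation concrete, the following sketch recovers the slope n − 1 and the constant c by an ordinary least-squares fit of log(I) on log(t). The incidence values are hypothetical and noise-free, generated from $I = c\,t^{5}$ with an assumed c = 1e-12; they are not the SEER data plotted in Figure 5.1.

```python
import math


def fit_log_log(ages, incidence):
    """Ordinary least-squares fit of log(I) = log(c) + (n - 1) log(t).
    Returns (slope, c), where slope estimates n - 1."""
    xs = [math.log(t) for t in ages]
    ys = [math.log(i) for i in incidence]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)


# Hypothetical, noise-free incidence generated from I = c * t**5 (not real data).
c_true = 1e-12
ages = list(range(30, 85, 5))
incidence = [c_true * t ** 5 for t in ages]

slope, c_hat = fit_log_log(ages, incidence)
print(round(slope, 6))  # 5.0
```

With real registry data the points scatter around the line, and, as Section 5.1 argues, a good fit of this kind says little by itself about mechanism.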

The linear rise on log-log scales means that incidence increases as a power of age, in proportion to $t^{n-1}$. In the early 1950s, several authors wondered what might explain this steep rise in incidence with age (Frank 2004c; Moolgavkar 2004).

Fisher and Hollomon (1951) recognized that cancer incidence would increase as $t^{n-1}$ if transformation required n independent steps. The argument is roughly as follows. Suppose each step happens at a rate of u per year, where u is small. The probability that a given step has happened after t years is $1 - e^{-ut} \approx ut$. At age t, the probability that n − 1 of the steps have occurred is approximately $(ut)^{n-1}$, and the rate at which the final step happens is u, so the approximate rate (incidence) of occurrence at time t is proportional to $u^n t^{n-1}$.

Nordling (1953) and Armitage and Doll (1954) emphasized that the different steps may happen sequentially. There are $(n-1)!$ different orders in which the first n − 1 steps may occur. If we assume that they must occur in a particular order, then we divide the incidence calculated in the previous paragraph, $u^n t^{n-1}$, by $(n-1)!$ to obtain the approximate incidence of passing all n steps at age t as

$$I \approx \frac{u^n t^{n-1}}{(n-1)!}.$$
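This approximation drops a factor from the exact answer. As a sketch under the simplifying assumption that all n steps share one rate u, the time to pass the final step has a gamma (Erlang) distribution with density $u^n t^{n-1} e^{-ut}/(n-1)!$; the approximation above omits $e^{-ut}$ and so overstates the rate by the factor $e^{ut}$. The values n = 6 and u = 0.01 below are hypothetical.

```python
import math


def approx_rate(t, n, u):
    """Multistage approximation to the rate of completing step n at age t."""
    return u ** n * t ** (n - 1) / math.factorial(n - 1)


def exact_rate(t, n, u):
    """Erlang (gamma) density for the time to pass n sequential steps,
    assuming every step occurs at the same rate u."""
    return u ** n * t ** (n - 1) * math.exp(-u * t) / math.factorial(n - 1)


n, u = 6, 0.01  # hypothetical values, not estimates
for t in (20, 50, 80):
    # The overstatement equals exp(u * t): about 1.22, 1.65, and 2.23 here.
    print(t, round(approx_rate(t, n, u) / exact_rate(t, n, u), 3))
```

The approximation is close only while ut is small, which is the regime the argument in the text assumes.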
Armitage and Doll (1954) developed this theory of sequential stages for the dynamics of progression—the multistage theory of carcinogenesis as illustrated in Figure 5.2.

Figure 5.2. Multistage model of cancer progression. Individuals are born in stage 0. They progress from stage 0 through the first transition to stage 1 at a rate $u_0$, then to stage 2 at a rate $u_1$, and so on. Severe cancer only arises after transition to the final stage …

This basic model provides a comparative prediction for the relative incidence of sporadic and inherited cancers (Frank 2005). Suppose that normal individuals develop sporadic cancer in a particular tissue after n steps. Individuals carrying a mutation develop inherited cancer after n − 1 steps, having passed one step at conception by the mutation that they carry. Using Eq. (5.1) for n steps versus n − 1 steps, the incidence ratio of sporadic to inherited cancers at any age t is

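$$\frac{I_s(t)}{I_i(t)} \approx \frac{u^n t^{n-1}/(n-1)!}{u^{n-1} t^{n-2}/(n-2)!} = \frac{ut}{n-1},$$

where $I_s$ and $I_i$ denote sporadic and inherited incidence, respectively, and every step occurs at the same rate u. The ratio therefore increases in proportion to age.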
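A quick numerical check of this ratio, again assuming a single hypothetical per-step rate u and the approximate incidence $u^n t^{n-1}/(n-1)!$: the ratio grows in proportion to age, so a log-log plot of sporadic over inherited incidence against age should have slope 1.

```python
import math


def approx_incidence(t, steps, u):
    """Multistage approximation u**steps * t**(steps - 1) / (steps - 1)!."""
    return u ** steps * t ** (steps - 1) / math.factorial(steps - 1)


def sporadic_to_inherited(t, n, u):
    """Ratio of incidence with n steps (sporadic) to n - 1 steps (inherited)."""
    return approx_incidence(t, n, u) / approx_incidence(t, n - 1, u)


n, u = 6, 0.01  # hypothetical values for illustration
for t in (20, 40, 80):
    print(t, round(sporadic_to_inherited(t, n, u), 4))  # equals u * t / (n - 1)

# Slope of log(ratio) against log(age): 1 under this approximation.
log_slope = (math.log(sporadic_to_inherited(80, n, u))
             - math.log(sporadic_to_inherited(20, n, u))) / (math.log(80) - math.log(20))
print(round(log_slope, 6))  # 1.0
```

Note that the slope of 1 does not depend on u or n, which is what makes this a comparative prediction rather than a fit.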
In Chapter 8, I will develop this comparative prediction and apply it to data from retinoblastoma and colon cancer. That application will show how a simple comparative theory can link the genetics of cancer progression to the age of cancer incidence.

5.4 Why Study Quantitative Theories?

An ordered, linear sequence leaves out many of the complexities of carcinogenesis. However, it pays to begin with this simple model, to understand all of its logical consequences, and to study how well that model can predict changes in incidence. Following on the simple model, we can begin to explore alternatives, such as parallel lines of progression in different cellular lineages or incidence aggregated over different pathways.

After I have analyzed the basic model, I will explore a range of more complex assumptions, because we need to understand the possible alternative explanations for observed patterns. Without broad conceptual understanding, there is a tendency to latch onto the first available explanation that fits the data without full consideration of reasonable alternatives. The theory I develop will run ahead of empirical understanding, but if used properly, this is exactly what theory must do.

Another issue concerns the definition of stages and rate-limiting steps. To address this issue, we must consider what we wish to accomplish with mathematical models. The models are tools, so we need only be concerned with defining stages and rate-limiting steps in ways that help us achieve particular goals for particular problems.

Sometimes we may formulate a model in a very abstract, nonbiological way, for example, to study how variation in rates of transition between stages influences age-onset patterns. In this case, stages remain abstract notions that we manipulate in a mathematical model in order to understand the logical consequences of various assumptions. In other cases, we may try to match the definition of stages and rates to the biological details of a particular cancer. A stage may, for example, be an adenoma of a particular size, histology, and genetic makeup. A transition between stages may occur at the rate of a somatic mutation to a particular gene.

5.5 The Basic Model

Assume that cancer progression requires passage through n rate-limiting steps, each step moving the tumor to the next stage in the sequence of progression. A step could, for example, be a mutation to APC or p53, as in colorectal cancer progression. But for now, I simply assume that such steps must be passed.

Not all changes during tumor development limit the rate of progression. A necessary change may happen very quickly following, for example, expansion of a precancerous tumor to a large size. Such a step is necessary for progression but does not limit the rate of progress, and so does not determine the ages at which individuals carry tumors of particular stages. I develop the basic theory under the assumption that whatever determines a rate-limiting step, tumor progression requires passing n such steps to develop into cancer. This section follows the derivations given in Frank (2004a).

I gave a picture of the basic model in Figure 5.2. That picture formally describes a set of differential equations. Because the picture and the equations present the same information, one may choose to focus on either. The equations are

$$\dot{x}_0 = -u_0 x_0$$
$$\dot{x}_i = u_{i-1} x_{i-1} - u_i x_i, \qquad i = 1, \ldots, n-1$$
$$\dot{x}_n = u_{n-1} x_{n-1},$$

where $x_i(t)$ is the fraction of the initial population born at time t = 0 that is in stage i at time t, with time measured in years. Usually, I assume that when the cohort is born at t = 0, all individuals are in stage 0, that is, $x_0(0) = 1$, and the fraction of individuals in other stages is zero. As time passes, some individuals move into later stages. The rate of transition from stage i to stage i + 1 is $u_i$. The $\dot{x}_i$ are the derivatives of the $x_i$ with respect to t.
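Because the equations are linear, they are easy to integrate numerically. The sketch below uses a fixed-step fourth-order Runge-Kutta scheme written in plain Python; the function names, the step size dt, and the rates (six steps at 0.01 per year) are illustrative assumptions. The right-hand side only moves individuals from one stage to the next, so the stage fractions always sum to one.

```python
import math


def derivatives(x, rates):
    """Right-hand side of the basic model: x[i] is the fraction in stage i,
    rates[i] is u_i, the transition rate from stage i to stage i + 1."""
    n = len(rates)
    dx = [0.0] * (n + 1)
    dx[0] = -rates[0] * x[0]
    for i in range(1, n):
        dx[i] = rates[i - 1] * x[i - 1] - rates[i] * x[i]
    dx[n] = rates[n - 1] * x[n - 1]
    return dx


def simulate(rates, t_end, dt=0.01):
    """Integrate from x_0(0) = 1 with fixed-step fourth-order Runge-Kutta.
    Returns the list of stage fractions at time t_end."""
    x = [1.0] + [0.0] * len(rates)
    for _ in range(int(round(t_end / dt))):
        k1 = derivatives(x, rates)
        k2 = derivatives([xi + 0.5 * dt * k for xi, k in zip(x, k1)], rates)
        k3 = derivatives([xi + 0.5 * dt * k for xi, k in zip(x, k2)], rates)
        k4 = derivatives([xi + dt * k for xi, k in zip(x, k3)], rates)
        x = [xi + dt / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


rates = [0.01] * 6  # hypothetical: n = 6 steps, each at u = 0.01 per year
x = simulate(rates, t_end=70)
print(round(sum(x), 9))  # total fraction is conserved: 1.0
```

With unequal rates, pass any list of length n; the last entry of the returned list is $x_n$, the fraction that has reached the final (cancer) stage.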

5.6 Technical Definitions of Incidence and Acceleration

Two ways to characterize age-onset patterns play an important role in analyzing cancer data and studying theories of cancer progression. Incidence is the rate at which individuals develop cancer at particular ages. Acceleration is the rate of change in incidence with age. For example, positive acceleration means that incidence increases with age.

This section provides some technical details for the definitions of incidence and acceleration. One can get a rough idea of the main results without these details, so some readers may wish to skip this section and come back to it later.

Individuals who move into the final, nth stage develop cancer. They pass into the final stage at rate $\dot{x}_n(t)$, which is roughly the probability of developing cancer per year at age t. The age-specific incidence is the rate at which individuals who have not yet developed cancer do so at age t: the probability of developing cancer at age t divided by the fraction of individuals, S(t), who have not yet developed cancer by that age. In the basic model, $S(t) = 1 - x_n(t)$. In symbols, the age-specific incidence is $I(t) = \dot{x}_n(t)/S(t)$.
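For the special case in which every step has the same hypothetical rate u, $x_n(t)$ and $\dot{x}_n(t)$ have closed forms (the gamma, or Erlang, distribution function and density), so I(t) can be computed directly. The sketch below, with illustrative parameters, shows that dividing by S(t) barely matters when few individuals have developed cancer, but raises incidence substantially once a large fraction of the cohort has.

```python
import math


def final_stage_fraction(t, n, u):
    """x_n(t) when all n steps share the rate u: P(Poisson(u t) >= n)."""
    ut = u * t
    return 1.0 - math.exp(-ut) * sum(ut ** k / math.factorial(k) for k in range(n))


def final_stage_rate(t, n, u):
    """dx_n/dt under the same assumption: the Erlang (gamma) density."""
    return u ** n * t ** (n - 1) * math.exp(-u * t) / math.factorial(n - 1)


def incidence(t, n, u):
    """Age-specific incidence I(t) = (dx_n/dt) / S(t), with S(t) = 1 - x_n(t)."""
    return final_stage_rate(t, n, u) / (1.0 - final_stage_fraction(t, n, u))


n, t = 6, 70  # hypothetical number of steps and age in years
for u in (0.01, 0.1):  # hypothetical per-step rates: rare versus common cancer
    boost = incidence(t, n, u) / final_stage_rate(t, n, u)
    print(u, round(boost, 3))  # roughly 1.0 for u = 0.01, above 3 for u = 0.1
```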

The incidence, I(t), is the rate at which cancer cases accumulate at a particular age. I frequently refer to the acceleration of cancer, which is how fast the rate, I(t), changes at a particular age, t. The most useful measure of acceleration in multistage models scales incidence and time logarithmically (Frank 2004a, 2004b).

Use of logarithms provides a scale-free measure of change. In other words, differences on a logarithmic scale summarize percentage change in a variable independently of the value of the variable. This can be seen by examining the derivative of the logarithm for a variable x, which is

$$\frac{d\log(x)}{dt} = \frac{dx/dt}{x}.$$
The right side is the change in x divided by x, which measures the fractional change in x independently of how large or small x is.

For example, to measure the percentage increase in age-specific incidence for a given percentage increase in age, we need scale-free measures of change in both age-specific incidence and age. We obtain a scale-free measure by defining the log-log acceleration (LLA) at age t as

$$\mathrm{LLA}(t) = \frac{d\log I(t)}{d\log t} = \frac{t}{I(t)}\,\frac{dI(t)}{dt}.$$
The derivative of incidence, dI(t)/dt, is the age-specific acceleration, so LLA is just a normalized (nondimensional) measure of age-specific acceleration.
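The LLA can be computed numerically by a central difference in log t. The sketch below (plain Python; the parameters and the equal-rate assumption are illustrative) confirms that a pure power law $I = c\,t^{n-1}$ has LLA equal to n − 1 at every age, whereas incidence from the basic model with equal rates u has LLA $= (n-1) - ut + t\,I(t)$, which falls with age over this range.

```python
import math


def model_incidence(t, n, u):
    """Basic-model incidence when all n steps share the rate u:
    Erlang density divided by Erlang survival S(t) = 1 - x_n(t)."""
    ut = u * t
    density = u * ut ** (n - 1) * math.exp(-ut) / math.factorial(n - 1)
    survival = math.exp(-ut) * sum(ut ** k / math.factorial(k) for k in range(n))
    return density / survival


def lla(incidence_fn, t, h=1e-4):
    """Log-log acceleration d log I / d log t by a central difference in log t."""
    up = math.log(incidence_fn(t * math.exp(h)))
    down = math.log(incidence_fn(t * math.exp(-h)))
    return (up - down) / (2 * h)


n, u = 6, 0.01  # hypothetical number of steps and per-step rate


def power_law(t):
    """I = c * t**(n - 1) with an arbitrary constant c."""
    return 1e-12 * t ** (n - 1)


def basic_model(t):
    return model_incidence(t, n, u)


print(round(lla(power_law, 50), 6))  # 5.0 at any age
for t in (20, 50, 80):
    # Falls with age, roughly 4.8, 4.5, 4.2 for these parameters.
    print(t, round(lla(basic_model, t), 3))
```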

5.7 Summary

This chapter introduced the quantitative tools needed to build models of cancer progression. Such models make predictions about how particular genetic or physiological changes alter age-specific incidence. The ability to make such predictions successfully defines a causal understanding of cancer. The next chapter begins my mathematical analysis of the ways in which particular causes affect age-specific incidence.