NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Frank SA. Dynamics of Cancer: Incidence, Inheritance, and Evolution. Princeton (NJ): Princeton University Press; 2007.


Chapter 11 Inheritance

Cancer progresses by the accumulation of heritable changes in cell lineages. In the simplest case, all of the changes happen to the DNA of a single somatic cell lineage. Starting with the initial cell, the carcinogenic process develops through the sequential addition of genetic changes that eventually gives rise to the tumor.

Many cancer biologists rightly object to this oversimplified view. The heritable changes may often be epigenetic—genomic changes other than DNA sequence—or physiological changes that persist (inherit) for many cell generations. Changes may happen to multiple lineages, with carcinogenesis influenced by positive feedback between altered lineages. But even this richer view still comes down to heritable changes in cell lineages—almost necessarily so, because cells are the basic units, and persistent change means heritable change. Disease arises at the level of tissues, but the causes derive from changes to cells.

The first heritable carcinogenic changes may trace back to a somatic cell that descended from the zygote, in which case the changes derive purely from the somatic history of that organism. Or the origin of a particular inherited variant may trace back to a germline cell in one of the individual's ancestors, in which case the inherited variant may be shared by other descendants.

All of these descriptions turn on heritable change in lineages, that is, on evolutionary change. Cancer has long been understood in terms of somatic evolution within an individual's cellular population. More recently, the role of inherited germline variants has been studied in terms of the evolutionary genetics of populations of individuals.

We can think about any particular variant, somatic or germline, in two ways. First, the variant influences disease through its effect on progression—the role of development that traces cause from genes to phenotypes. Second, the phenotype influences whether, over time, the variant lineage expands or goes extinct—the role of natural selection in shaping the distribution of variants.

The following chapters focus on variants that originate in somatic cells: in a particular cell, variants trace their origin back to an ancestral cell that descended from the most recent zygote. Somatic variants drive progression within an individual.

This chapter focuses on germline variants that may occur in different individuals in the population: in a particular cell, germline variants trace their origin back to an ancestral cell that preceded the most recent zygote. Germline variants determine inherited predisposition to cancer.

The first section describes how inherited variants affect progression and incidence—the causal pathway from genes to phenotypes. A classical Mendelian mutation is a single variant that strongly shifts age-onset curves to earlier ages. Such mutations demonstrate the central role of inherited variation in progression and the multistage nature of carcinogenesis. Other inherited variants may only weakly shift age-onset curves; however, the combination of many such variants predisposes individuals to early-onset disease.

The second section turns around the causal pathway: the phenotype of a variant—progression and incidence—influences the rate at which that variant increases or decreases within the population. The limited data appear to match expectations: variants that cause a strong shift of incidence to earlier ages occur at low frequency; variants that cause a milder age shift occur at higher frequencies; and variants that only sometimes lead to disease occur most frequently.

The final section addresses a central question of biomedical genetics: Does inherited disease arise mostly from few variants that occur at relatively high frequency in populations or from many variants that each occur at relatively low frequency? The current data clarify the question but do not give a clear answer. Inheritance of cancer provides the best opportunity for progress on this key question.

11.1 Genetic Variants Affect Progression and Incidence

The first studies measured differences in progression and age of onset between variants at a single locus. Those first studies aggregated all variants into two classes, wild type and mutant, and compared incidences between those classes. Current studies measure differences at a finer molecular scale, distinguishing between variants at a particular nucleotide or amino acid site, or between variants that differ by single insertions or deletions. Ultimately, one would like to know how variants at multiple sites combine to affect incidence. So far, most studies have been limited to indirect analysis of multiple sites by associations between familial relationships and incidence, the classical nonmolecular approach to quantitative inheritance.

Variants at a Single Locus

This section compares progression and incidence between individuals who carry, at a single locus, either the wild-type allele or a loss of function mutation. In most cases, one compares homozygotes for the wild type and heterozygotes that carry one wild-type and one loss of function mutation. In practice, "wild type" means the class of all variant alleles that do not have a large effect on incidence, and "loss of function" means the class of all variant alleles that cause a large increase in the rate of progression.

The comparison between individuals carrying wild-type and loss of function genotypes played a key role in the history of multistage theories of carcinogenesis. The shift of the incidence curve to earlier ages in the loss of function genotypes provided the first direct evidence that mutations in cell lineages affect progression. The observed magnitude of the shift in incidence curves matched the expected shift under multistage theory. In that theory, progression follows the accumulation of multiple genetic changes, and the inherited mutation provides the first of two or more steps in carcinogenesis.

In earlier chapters, I described studies that compared age incidence between genotypes that differed at a single locus, comparing the wild-type with loss of function mutations. In this section, I copy the figures from two earlier examples. The following sections provide new examples.

Figure 11.1 compares incidence rates between inherited and sporadic cases of retinoblastoma. In the inherited cases, individuals carry one mutated allele at the retinoblastoma locus. Within the multistage framework, inheriting a key mutation means being born one stage advanced in progression. The theory predicts that an advance by one stage reduces the slope of the log-log incidence curve by one. The log-log acceleration (LLA) is the slope of incidence against age on log-log scales, so the difference in LLA between the two incidence curves measures the difference in their slopes. Figure 11.1c shows that the observed difference in slopes is close to one, matching the theory's prediction.
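The slope prediction can be checked numerically. The sketch below assumes the simple power-law approximation for multistage incidence, I(t) = k t^(n−1), the same form used later in this chapter. The function names, the value n = 3 for sporadic cases, and the constant k are illustrative assumptions, not estimates for retinoblastoma.

```python
import math

def incidence(t, n, k=1e-6):
    """Power-law approximation to multistage incidence: I(t) = k * t**(n - 1)."""
    return k * t ** (n - 1)

def log_log_slope(f, t, h=1e-4):
    """Numerical slope of ln f against ln t (the log-log acceleration, LLA)."""
    lo, hi = t * math.exp(-h), t * math.exp(h)
    return (math.log(f(hi)) - math.log(f(lo))) / (2 * h)

n_sporadic = 3                  # hypothetical number of rate-limiting steps
n_inherited = n_sporadic - 1    # born one stage advanced

for age in (2.0, 5.0, 10.0):
    lla_s = log_log_slope(lambda t: incidence(t, n_sporadic), age)
    lla_i = log_log_slope(lambda t: incidence(t, n_inherited), age)
    # Columns: age, LLA sporadic, LLA inherited, difference in LLA
    print(age, round(lla_s, 3), round(lla_i, 3), round(lla_s - lla_i, 3))
```

Under this approximation the difference in LLA equals one at every age. Observed curves can depart from that value because the power law holds only approximately.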

Figure 11.1. Age-specific incidence of bilateral and unilateral retinoblastoma. Bilateral cases are mostly inherited, and unilateral cases are mostly sporadic. (a) Bilateral (solid line) and unilateral (dashed line) incidence of retinoblastoma per $10^6$ population, ...

Figure 11.2 compares incidence rates between inherited and sporadic cases of colon cancer. In the inherited cases, individuals carry one mutated allele at the APC locus. Again, the multistage framework predicts that an inherited mutation in a key rate-limiting process advances progression by one stage and therefore reduces the log-log acceleration of incidence by one. Figure 11.2c shows a difference in LLA of about 1.5, a reasonable match to the theory's prediction given the sample sizes and complexities of progression.

Figure 11.2. Age-specific incidence of inherited familial adenomatous polyposis (FAP) and sporadic colon cancer. (a) Inherited colon cancer (FAP) caused by mutation of the APC gene (top curve) and sporadic cases (bottom curve) per $10^6$ population, shown on a $\log_{10}$ ...

Common Variants at a Single Site

The previous section described studies that aggregated variants into wild-type and mutant classes. This section presents two cases in which mutations at specific sites define the variants.


Struewing et al. (1997) screened Ashkenazi Jewish females for two specific mutations in BRCA1 and one specific mutation in BRCA2. They obtained age of breast cancer onset among the 89 carriers and 3653 noncarriers. They used a statistical procedure that accounted for relatedness between certain sample members to estimate the risk of breast cancer, measured at five-year age intervals as the fraction of women expected to develop cancer by each age.

In Figure 11.3a, the circles plot their estimates, shown as the fraction who would be expected not to have developed a breast tumor by each age. The solid curve provides a smoothed fit to the carrier class; the dashed curve provides a smoothed fit to the noncarrier class.

Figure 11.3. Breast cancer rates for females who carry a mutation in BRCA1 or BRCA2, shown as solid lines, versus those females who do not have a mutation, shown as dashed lines. The circles in (a) and (c) mark the estimated fraction of females in each class that ...

In the data from Struewing et al. (1997), the estimated fraction tumorless sometimes increases from one age to a later age. Such increases are, of course, not possible in the actual fraction tumorless curves. The increases arise because of the estimation procedure. I mention this because the rise and fall in the estimates (shown as circles) at later ages causes the curves to be particularly sensitive to the smoothing parameters. For these reasons, and the moderately small sample of carriers, these data only illustrate various ways in which to analyze such problems.

With current technological trends, we will eventually have vastly more data of this kind. At present, I focus mainly on exploratory analysis to highlight some interesting hypotheses, which will require further studies to test.

Hypothesis 1: Not all carriers have highly elevated risk.

The second row of panels in Figure 11.3 plots the standard log-log incidence curves for carriers and noncarriers. In all four panels, the noncarriers (dashed curve) show the commonly observed pattern for sporadic breast cancer: a diminishing slope of incidence with age, but little or no actual decrease in the incidence rate before age 80. By contrast, the incidence declines after midlife for the carriers (solid curves) in all of the panels except panel (h). I work through the steps that lead to panel (h). As I mentioned, I do not regard these manipulations as tests of any hypothesis, but rather as ways to generate new hypotheses.

Panel (e) shows the direct estimate of carrier incidence using the original values of Struewing et al. (1997) and the standard smoothing parameter of 0.5 for fitting the curves in panel (a). In (e), carrier incidence declines strongly and steadily after about age 55. In (f), I considered the possibility that only a fraction of carriers have highly elevated risk. The division of carriers into very high risk and moderate risk categories may arise from genetic predisposition caused by other loci. I discuss evidence for this idea in following sections; here I just look at the consequences.

The estimated fraction of carriers who develop cancer by age 80 is about 0.66. What if nearly all carriers with highly elevated risk develop cancer? Suppose, for example, that only a fraction max = 0.7 of carriers have elevated risk, and nearly all of them develop cancer. Then the fraction tumorless among the class with highly elevated risk is S = (max − (1 − f))/max, where f is the fraction tumorless among all carriers. Panels (b) and (d) show the fraction tumorless among carriers with highly elevated risk, using max = 0.7. Panel (f), derived from (b), has a carrier incidence curve that drops later in life, but less strongly than in (e).
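One way to read the rescaling, under the assumption (stronger than the text states) that essentially all tumors among carriers arise in the elevated-risk class: the affected carriers, 1 − f, are concentrated in the fraction max of carriers, so the fraction tumorless within that class is S = 1 − (1 − f)/max. The sketch below applies this reading. The function name and the intermediate values of f are hypothetical; only max = 0.7 and the value 0.66 at age 80 come from the text.

```python
def tumorless_high_risk(f, max_frac=0.7):
    """Fraction tumorless within the elevated-risk class.

    f        : fraction tumorless among all carriers
    max_frac : assumed fraction of carriers with highly elevated risk
    Assumes every tumor among carriers arises in the elevated-risk class.
    """
    affected = 1.0 - f
    if affected > max_frac:
        raise ValueError("more carriers affected than the elevated-risk class allows")
    return 1.0 - affected / max_frac

# About 0.66 of carriers develop cancer by age 80, so f = 0.34 at that age.
for f in (1.0, 0.8, 0.5, 0.34):
    print(f, round(tumorless_high_risk(f), 3))
```

With f = 0.34 the rescaled fraction tumorless falls to roughly 0.06, which is what "nearly all of them develop cancer" would require.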

Panel (h), derived from (d), has what I consider to be the right shape for the carrier incidence curve. The difference between (h) and (f) comes only from the smoothing parameter used to fit the curves in the top row. Whenever a key match to expectations arises only from a moderate change in the smoothing parameter, one clearly does not have enough data to draw any conclusions. Normally, after seeing such a pattern, I would suggest not presenting such an analysis. I present it here to warn about the importance of sample size and sensitivity to smoothing procedures, and because I think the alternative biological interpretations are sufficiently interesting to stimulate further work.

In summary, I suggest that the estimated incidence curve in (h), based on the stiffer smoothing method, comes closer to the actual incidence pattern. More importantly, I propose that, among carriers, only a fraction have highly elevated risk. I will discuss below two ways in which background genotype may elevate risk in some BRCA mutant carriers.

Hypothesis 2: BRCA mutations abrogate a rate-limiting step.

An inherited mutation may increase incidence in at least two different ways.

First, an inherited mutation may raise the rate of somatic mutations, including epigenetic and chromosomal changes. In this case, the inherited mutation may not abrogate a rate-limiting step, but instead increase the transition rates between the normal rate-limiting steps that characterize carcinogenesis in the absence of the mutation. If so, then the theory predicts a rise with age in the difference between the log-log slopes of incidence (LLA) for sporadic versus inherited cases. (See Eq. (7.6) and Figures 7.5 and 7.6.)

Second, an inherited mutation may directly or indirectly abrogate a single rate-limiting step. In this case, the theory in Eq. (7.5) predicts that ΔLLA ≈ 1 and does not change much with age.

The bottom row of Figure 11.3 shows a range of patterns for ΔLLA. In panel (i), the value rises strongly with age; in panel (l), the value remains mostly flat and near one. The two middle panels follow intermediate trends. We do not know enough yet to assign significantly higher likelihood to one pattern over the others because of: the limited sample size for inherited cases; the fluctuations in the fraction tumorless caused by the estimation procedure in the original paper; and the uncertainty with regard to the fraction of carriers who have elevated risk.

I favor the right column of panels in Figure 11.3, because the incidence pattern for carriers has the common shape for breast cancer, in which incidence plateaus later in life but does not decline significantly before age 80. The right column matches the prediction for a BRCA mutation to knock out one rate-limiting step. To test that hypothesis, we need more data on incidence in carriers and on the fraction of carriers who have highly elevated risk.


p53 is the most commonly mutated gene in tumors. In some tumors, mutations arise in those genes that regulate p53 rather than in p53 itself.

To search for new inherited variants that affect the p53 system and cancer, Bond et al. (2004) focused on MDM2, a direct negative regulator of p53. They found a single nucleotide polymorphism in the MDM2 promoter that enhanced MDM2 expression and attenuated the p53 pathway. In particular, the variant had a T → G change at the 309th nucleotide of the first intron (SNP309). This SNP occurred at high frequency in a sample of 50 healthy individuals: heterozygote T/G at 40% and homozygote G/G at 12%.
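The reported genotype frequencies imply a frequency for the G allele itself. The short calculation below converts the percentages for the 50 healthy individuals into counts and then into an allele frequency. The variable names are illustrative, and rounding the percentages to whole individuals is an assumption.

```python
n_individuals = 50
freq_het, freq_hom = 0.40, 0.12            # T/G and G/G, as reported

n_het = round(freq_het * n_individuals)    # T/G individuals
n_hom = round(freq_hom * n_individuals)    # G/G individuals
n_wt = n_individuals - n_het - n_hom       # T/T individuals

g_alleles = n_het + 2 * n_hom              # each homozygote carries two G alleles
g_freq = g_alleles / (2 * n_individuals)   # frequency of the G allele
carrier_freq = (n_het + n_hom) / n_individuals

print("T/T, T/G, G/G counts:", n_wt, n_het, n_hom)
print("G allele frequency:", g_freq)
print("fraction carrying at least one G:", carrier_freq)
```

So about one-third of the sampled alleles carry the variant, and about half of the sampled individuals carry at least one copy.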

A variant affects cancer to the extent that it shifts the age-onset curve to earlier ages. To measure the variant's effect, Bond et al. (2004) studied a group that suffered soft tissue sarcoma (STS) and had no known p53 or other predisposing inherited mutations.

The data collected by Bond et al. (2004) show that the variant allele shifts age of onset to earlier ages, supporting the hypothesis that the variant's increased expression of MDM2 enhances tumor progression. However, Bond et al.'s (2004) particular quantitative analyses misuse the data and the theory of multistage progression. I demonstrate proper analysis, because this study provides just the sort of combined genetic, functional, and population level insight that will be required to move the field ahead.

Figure 11.4a,b presents copies of Figure 7C,E from Bond et al. (2004). Panel (a) compares age of onset for all soft tissue sarcomas between the wild type (T/T) and the homozygote variant (G/G). The wild type progresses at a median age of 59 compared with a median of 38 for the homozygote variant, showing the earlier onset for the variant.

Figure 11.4. Onset of soft tissue sarcoma for individuals classified by genotype at a single nucleotide polymorphism in the promoter region of MDM2. At the polymorphic site, individuals are wild type (T/T), heterozygote for the variant allele (T/G), or homozygote ...

In the sample collected by Bond et al. (2004), liposarcomas form the largest subset of soft tissue sarcomas. Figure 11.4b shows how Bond et al. (2004) fit curves to the onset data for liposarcoma in order to estimate the number of rate-limiting steps in progression for each genotype. They assumed that the y axis measured incidence, and fit $I(t) = kt^{n-1}$ (they used r instead of n for the number of rate-limiting steps). From their fitting procedure, they estimated n as 4.8 for the wild type (T/T, solid curve), 3.5 for the heterozygote (T/G, dashed curve), and 2.5 for the homozygote variant (G/G, dot-dash curve). These estimates differ by about one, so the authors concluded that the variant abrogates one rate-limiting step in progression. I do not know whether the biological conclusion is correct, but the analysis of the data is inappropriate.

The y axis of Figure 11.4b measures the percentage of individuals of a particular genotype who have suffered cancer by a particular age. That measure differs from incidence. I have shown previously that such data can be transformed into incidence. Let y be the percentage of individuals with cancer by age t, as on the y axis of Figure 11.4b. Then the fraction tumorless is S = 1 − y/100, where the 100 arises because y is given as a percentage. Incidence is I(t) = −d ln(S)/dt.
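The transformation from cumulative percentage to incidence can be sketched with finite differences. The function name and the cumulative percentages below are hypothetical, not values read from Bond et al. (2004); the sketch also assumes y stays below 100 so that ln(S) is defined.

```python
import math

def incidence_from_cumulative(ages, percent_affected):
    """Estimate I(t) = -d ln(S)/dt between successive ages by finite differences.

    S = 1 - y/100 is the fraction tumorless.
    Returns a list of (midpoint age, incidence) pairs.
    """
    log_s = [math.log(1.0 - y / 100.0) for y in percent_affected]
    out = []
    for i in range(1, len(ages)):
        dt = ages[i] - ages[i - 1]
        rate = -(log_s[i] - log_s[i - 1]) / dt
        out.append(((ages[i] + ages[i - 1]) / 2.0, rate))
    return out

# Hypothetical cumulative percentages with cancer by each age.
ages = [20, 30, 40, 50, 60]
percent = [2.0, 8.0, 20.0, 40.0, 65.0]
for mid_age, rate in incidence_from_cumulative(ages, percent):
    print(mid_age, round(rate, 4))
```

Each rate is an average over one age interval, so with few cases per interval the estimates will be noisy.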

To study the curves in relation to the number of rate-limiting steps, n, we can use the form applied by Knudson (1971), $\ln(S) = -k_1 t^n$, where $k_1$ is a constant, or, differentiating $\ln(S)$ with respect to t, we can use incidence, $I(t) = k_2 t^{n-1}$, where $k_2 = nk_1$. I discussed in earlier chapters the theory behind these equations.
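Because ln(−ln S) = ln(k1) + n ln(t) under the Knudson form, n can be estimated as the slope of a straight-line fit on transformed axes. The sketch below uses ordinary unweighted least squares, which is a choice made here for illustration and not necessarily the procedure used in any cited study. The function name and the synthetic values are hypothetical, and the fit requires 0 < S < 1 at every age.

```python
import math

def fit_stages(ages, fraction_tumorless):
    """Least-squares fit of ln(-ln S) = ln(k1) + n ln(t); returns (n, k1)."""
    xs = [math.log(t) for t in ages]
    ys = [math.log(-math.log(s)) for s in fraction_tumorless]
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    n = sxy / sxx                        # slope estimates the number of steps
    k1 = math.exp(y_bar - n * x_bar)     # intercept gives ln(k1)
    return n, k1

# Synthetic check: generate S from known n and k1, then recover them.
true_n, true_k1 = 4.0, 1e-8
ages = [20, 30, 40, 50, 60, 70]
S = [math.exp(-true_k1 * t ** true_n) for t in ages]
n_hat, k1_hat = fit_stages(ages, S)
print(round(n_hat, 3), k1_hat)
```

The synthetic data contain no noise, so the fit returns the input values. With a handful of observed cases the slope would carry wide uncertainty.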

If I were to analyze the data in Figure 11.4b, I would highlight two issues before starting. First, there are only four individuals in the variant homozygote (G/G) sample. One will not get a reliable estimate of a rate (incidence) from four observations. Second, the median age of onset is nearly identical for the wild type (T/T) and the heterozygote (T/G). Median age of onset often provides a good measure for the rate of progression as, for example, in the classical Druckrey analysis of chemical carcinogens (see Section 2.5). With nearly identical medians for those two genotypes, I would not be inclined to put much weight on any estimated differences in the slopes of the incidence curves, unless I had reason to believe that one genotype had both more rate-limiting steps and a faster transition rate between steps than the other genotype. In this study, those assumptions would over-interpret the data.

Given these issues with regard to the data analysis of Figure 11.4b, I would be content to note that the direction of shift in the homozygote variant (G/G) is consistent with enhanced progression.

I have emphasized data interpretation because the work of Bond et al. (2004) is just the sort of study that will become increasingly common and important as genomic technology improves. I agree with the authors that the analysis of inherited variants comes down to understanding how those variants affect age of onset. Further, the quantitative aspects of rates could, in principle, provide insight into the mechanisms by which variants influence the complex process of progression. With the inevitably larger samples that will soon be available, it should be possible to accomplish such analyses with much greater ease and power.

Interaction between Variants at Different Sites

Variants at different nucleotide sites may interact to influence progression. Studies to date have generally not had sufficient resolution and sample sizes to demonstrate the joint effects of different variants on age-incidence patterns in human populations. The work of Bond et al. (2004) discussed in the previous section provides a glimpse of the sort of study that will become common in the future.

In the previous section, I described how MDM2 acts as a negative regulator of p53. Bond et al. (2004) showed that a nucleotide variant in the promoter of MDM2 enhances expression of the MDM2 protein and thus negatively influences the p53 regulatory system. In individuals with a normal p53 locus, the MDM2 promoter variant enhances progression of soft tissue sarcomas, the same type of cancer often found in individuals who inherit p53 defects.

Bond et al. (2004) extended their study to samples that included individuals who carry both the MDM2 promoter variant and a mutation in p53. Those double mutant individuals suffered faster progression than individuals who inherited only one of the two mutations. If we use + and − superscripts to label the wild type and variant, then the ordering of the median age of onset was MDM2⁻/p53⁻ < MDM2⁺/p53⁻ < MDM2⁻/p53⁺ < MDM2⁺/p53⁺, with values for the medians of 2 < 14 < 38 < 57.

The MDM2 variant alone shifts the median from 57 in the wild type to 38; the p53 variant alone shifts the median from 57 in the wild type to 14. In this case, either variant by itself causes significantly enhanced progression. In other cases, a variant by itself may have little effect in the absence of a synergistic variant at another site.

Comparison between Rare Variants at Single Sites

Technical advances in DNA sequencing efficiency provide an opportunity to study individual nucleotide variants. Ideally, one would like to associate nucleotide variants with their consequences for cancer, measured by the age of cancer onset. However, each particular variant often occurs only rarely in natural populations, so it may be difficult to compare the age of onset between those individuals with and without the variant. In addition, many amino acid substitutions may have a weak effect on biochemical function, whereas a few substitutions may have a strong effect. Some a priori way of weighting the expected effects of particular substitutions would greatly enhance the association between DNA sequence variants and their consequences for cancer onset.

The association between the nucleotide sequence of DNA mismatch repair genes and colorectal cancer has been the focus of many recent studies. In those studies, each observed human subject provides an age of cancer onset and information about variant nucleotide sites or amino acid substitutions in the mismatch repair genes. The two problems mentioned above arise when analyzing the data from those studies: each particular variant occurs rarely, and some method must be used to weight the expected consequences of a substitution.

To solve these problems, various computational methods predict the expected functional consequences of amino acid substitutions. One method examines the evolutionary history of a gene, and weights more heavily those substitutions that occur rarely across different species (Ng and Henikoff 2003). The idea is that relatively rare changes must often be more constrained by functional consequences of substitutions, whereas relatively common changes must often have relatively few deleterious consequences. Another method, polymorphism phenotyping (PolyPhen), combines evolutionary conservation with various measures of biochemical structure and function (Ramensky et al. 2002).

I obtained two unpublished collections of PolyPhen scores for mismatch repair gene variants and the associated ages of colorectal cancer onset. Figure 11.5 presents a preliminary analysis of those data. I particularly wish to emphasize the importance of using the full age of onset data. Many analyses simply classify age of onset as early or late, throwing out the most valuable quantitative aspect of outcome. I have emphasized throughout this book that age of onset provides the summary measure of outcome when studying how various causal factors influence cancer progression.

Figure 11.5. Association between cancer onset and the predicted functional consequences of amino acid substitution in DNA repair genes measured by the PolyPhen score. (a) A data set of 78 individuals culled from the literature, in which each paper reported the age ...

Figure 11.5a shows the association between single amino acid substitutions and age of onset. These data came from a survey of the literature, in which each publication usually reported a single amino acid variant believed to influence mismatch repair function and age of cancer onset. These confirmed variants form a generally accepted set of DNA repair variants with functional consequences on which we could test the efficacy of the PolyPhen scoring method.

The raw data for Figure 11.5a scatter widely, because so many factors influence the age of cancer onset for each individual case. I used a sliding window analysis to illustrate the strong trend in the data (see figure legend). The result shows a clear tendency for increased PolyPhen score to predict the association between a substitution and the rate of cancer progression measured by age of onset.
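A sliding window analysis of this general kind can be sketched as follows. The details used for Figure 11.5 are given in its legend and are not reproduced here, so the window size, the function name, and the data values below are all hypothetical.

```python
def sliding_window_means(scores, ages, window=5):
    """Sort cases by score, then average score and age of onset over each
    run of `window` consecutive cases. Returns (mean score, mean age) pairs."""
    pairs = sorted(zip(scores, ages))
    out = []
    for i in range(len(pairs) - window + 1):
        chunk = pairs[i:i + window]
        mean_score = sum(s for s, _ in chunk) / window
        mean_age = sum(a for _, a in chunk) / window
        out.append((mean_score, mean_age))
    return out

# Made-up illustrative values, not the data behind Figure 11.5.
scores = [0.1, 0.4, 0.9, 1.2, 1.5, 1.8, 2.0, 2.3]
ages = [62, 48, 55, 41, 50, 38, 44, 35]
for s, a in sliding_window_means(scores, ages, window=4):
    print(round(s, 2), round(a, 1))
```

Averaging over overlapping windows smooths the scatter among individual cases, at the cost of making neighboring points statistically dependent.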

The confirmed variants in Figure 11.5a generally had some independent evidence that suggested functional consequence for DNA repair and cancer. If PolyPhen does indeed provide a computational method for predicting consequence, then the method should also work on nucleotide sequences obtained without any a priori information about the functional consequence of variant sites.

Figure 11.5b shows unpublished data collected from individuals for whom early-onset colorectal cancer runs in their family. For each individual, I received the age of colorectal cancer onset and the average PolyPhen score over all 34 variant amino acid sites in the data set. I excluded 26 individuals who did not have any variants and so did not have a predictive PolyPhen score. The remaining 62 individuals each had one or a few variant sites. The sliding window analysis in Figure 11.5b demonstrates the predictive power of the PolyPhen scoring for age of onset. In this case, the variants were collected blindly with regard to prior knowledge about the functional consequences of particular amino acid substitutions.

Many factors influence age of onset, so the PolyPhen scoring on single variants will provide only a small amount of information about predicted risk and age of onset. The value of the analysis may come from hypotheses about which amino acid sites and which kinds of biochemical function affect DNA repair efficacy, and how those changes in efficacy influence cancer progression. Such hypotheses could be tested in laboratory animals, in which one could construct genotypes with particular amino acid substitutions.

Combined Effect of Variants at Multiple Sites

Cancer often aggregates in families, suggesting a strong inherited component that predisposes individuals to disease. In two well-studied cancers, breast and colon, only about 10–20% of the inherited component can be explained by known variants (Anglian Breast Cancer Study Group 2000; de la Chapelle 2004). Those known variants include BRCA1 and BRCA2 for breast cancer and APC and the mismatch repair genes for colon cancer. Each of those variants causes a large change in the incidence curve. The large effect of such variants makes them relatively easy to study: compare the incidence curves between genotypes with and without the variant. A small sample provides sufficient power to observe the large effect.

Many other variants, each with small effect on incidence, may also occur. However, finding such variants is difficult. One must first identify a candidate variant, and then compare incidence between genotypes with and without the variant in large samples. Such studies remain beyond what can easily be accomplished, even with advancing technology.


In the absence of direct knowledge about many genes that predispose to cancer, statistical studies have analyzed how environmental and genetic variation contribute to differences in cancer risk. For example, reflecting environmental effects, immigrants take on the risk of colon cancer that is specific for their new home (Haenszel and Kurihara 1968). The risk of developing colon cancer for an individual in a specific geographical region is strongly associated with levels of meat consumption (Armstrong and Doll 1975), so changes in diet might explain the altered risk of immigrants. Smoking (Doll 1998; Vineis et al. 2004) and long-term exposure to certain carcinogens (Vineis and Pirastu 1997) also cause significant environmental risk.

To determine the genetic component of risk, statistical studies compare the frequencies of cancer occurrence between monozygotic twins, dizygotic twins, other family members, and unrelated individuals (Lichtenstein et al. 2000). In principle, such studies could separate the contributions of shared genes, shared environment in the family, and differences in environment between unrelated individuals. However, the statistical power of such studies tends to be low, with wide confidence intervals for the relative roles of genes and environment. This problem is particularly severe for the rarer cancers because of low sample sizes in such studies.

A large study from the Swedish Family-Cancer database provided narrower confidence intervals for the proportions of cancer variance that are explained by genes and environment (Czene et al. 2002). The estimates for genetic contribution ranged from 1% to 53%, depending on the type of cancer. These values may be lower limits, because certain types of genetic variation could not be separated from the effects of a shared environment. Confounding components include similar genotypes between parents, which would be classed as a shared environmental effect rather than a genetic effect. In this study, Mendelian loci explain only part of the total genetic contribution to cancer risk, indicating a significant role for polygenic variation.

An interesting analysis of the Anglian Breast Cancer Study Group study took a different approach to genetic predisposition (Pharoah et al. 2002). The authors first removed the two known Mendelian loci associated with breast cancer—BRCA1 and BRCA2—from the analysis, and then fitted the remaining risk distribution to a polygenic model in which the small risks per variant allele are multiplied across loci. According to the fitted model, the 20% of the population that has the highest level of genetic predisposition has a 40-fold greater risk than the 20% of the population with the lowest level of predisposition. The model also predicted that more than 50% of breast cancers occur in the 12% of the population with the greatest predisposition. The known Mendelian loci account for only a small proportion of the total genetic risk, with the remainder being explained by polygenic variation.
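To see how a multiplicative polygenic model concentrates cases in a small part of the population, note that multiplying many small risks across loci makes log risk a sum of many small terms, which is roughly normal. The sketch below takes log relative risk as exactly normal with standard deviation sigma. That distributional form, the function names, and the sigma values are assumptions of this sketch; it does not attempt to reproduce the fitted values reported by Pharoah et al. (2002).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def share_of_cases_in_top(fraction, sigma):
    """Share of all cases arising in the highest-risk `fraction` of the
    population, when log relative risk is normal with std. dev. sigma."""
    z = STD_NORMAL.inv_cdf(1.0 - fraction)
    return 1.0 - STD_NORMAL.cdf(z - sigma)

def mean_risk_ratio(fraction, sigma):
    """Mean risk in the top `fraction` divided by mean risk in the bottom `fraction`."""
    z = STD_NORMAL.inv_cdf(1.0 - fraction)
    top = 1.0 - STD_NORMAL.cdf(z - sigma)
    bottom = STD_NORMAL.cdf(-z - sigma)
    return top / bottom

for sigma in (0.5, 1.0, 1.5):     # hypothetical spreads of log risk
    # Columns: sigma, share of cases in top 12%, top-20% to bottom-20% risk ratio
    print(sigma,
          round(share_of_cases_in_top(0.12, sigma), 3),
          round(mean_risk_ratio(0.20, sigma), 1))
```

Both quantities grow quickly with sigma, which illustrates why a modest spread in log risk can imply a large contrast between the most and least predisposed groups.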

It is difficult to tell how reliable those conclusions are about polygenic inheritance. Other models could be fit to the same data, with different contributions of Mendelian loci, polygenic loci, and environment. I favor the strong emphasis on polygenic inheritance, because most complex quantitative traits in nature show extensive polygenic variation (Barton and Keightley 2002; Houle 1992; Mousseau and Roff 1987). However, statistical models are hard to test directly, because it is difficult to obtain evidence that strongly supports one model and rules out other plausible models. One is often left with conclusions that are based as much on prior belief as on data.


Ideally, one would like to know how particular genetic variants affect the biochemistry of cells, and how those biochemical effects influence progression to cancer. Although we are still a long way from this ideal, recent studies of DNA repair genes provide hints about what could be learned (Mohrenweiser et al. 2003).

Individuals vary in the ability of their cells to repair DNA damage (Berwick and Vineis 2000). A relatively low repair efficiency is associated with a higher risk of cancer. Presumably, the association arises because higher rates of unrepaired somatic mutations and chromosomal aberrations contribute to faster progression to cancer. Repair genes also play a role in sensing genetic damage and initiating apoptosis.

Most studies of repair capacity measure the effects of mutagens on DNA damage in lymphocytes. For example, a mutagen can be applied to cultures of lymphocytes; after a period of time, damage can be measured by the numbers of unrepaired single-strand or double-strand breaks, or by incorporation of a radioisotope. To study the role of DNA repair in cancer, measurements compare individuals with and without cancer. Berwick and Vineis (2000) summarized 64 different studies that used a variety of methods to quantify repair. In those studies, a relatively low repair capacity was consistently associated with an approximately 2–10-fold increase in cancer risk.

Roughly speaking, repair efficiency has an inheritance pattern that is typical of a quantitative trait. A few rare Mendelian disorders cause severe deficiencies in repair capacity. Apart from those rare cases, repair capacity shows a continuous pattern of variation and has a significant heritable component (Grossman et al. 1999; Cloos et al. 1999; Roberts et al. 1999). Measures of variability and heritability are statistical descriptions of the genetics of repair. Recent studies have made the first steps toward understanding the mechanistic relations between genetic variants and altered phenotypes.

Many genes in the five key repair pathways for different types of DNA damage are known (Bernstein et al. 2002; Thompson and Schild 2002; Mohrenweiser et al. 2003), so genetic variants can be identified by sequencing the loci involved. Specific variants can also be constructed, and their physiological consequences tested in cell-based assay systems. Mohrenweiser et al. (2003) list 22 genes in the core pathway of the MMR system. This system primarily corrects mismatches and short insertion or deletion loops that arise during replication or recombination (Hsieh 2001). The MMR system increases the accuracy of replication by a factor of 100–1,000.

Eighty-five different variants have been found in seventeen different MMR genes that were screened in at least fifty unrelated individuals (Mohrenweiser et al. 2003). Of those variants, 38% occurred at a frequency of 2% or more; 21% occurred at a frequency of 5% or more; and 12% occurred at a frequency of 20% or more. The other DNA repair pathways provided similar results, as summarized by Mohrenweiser et al. (2003). In 74 repair genes from various pathways, the average frequency of the wild-type allele is approximately 80%, with the remaining 20% comprised of different allelic variants. Among the 148 alleles per person at the 74 repair loci, the average number of allelic variants is expected to be approximately 30. Presumably, each individual carries a very rare or unique genotype.
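
The arithmetic behind the estimate of about 30 variant alleles per person can be checked with a short sketch. The function names are my own, and the second function assumes that alleles are independent across loci and chromosomes, a simplification that the text does not state.

```python
def expected_variant_alleles(n_loci: int, variant_freq: float, ploidy: int = 2) -> float:
    """Expected number of non-wild-type alleles carried by one person."""
    return ploidy * n_loci * variant_freq


def prob_no_variant_alleles(n_loci: int, variant_freq: float, ploidy: int = 2) -> float:
    """Probability of carrying only wild-type alleles, assuming independence
    across loci and chromosomes (an illustrative simplification)."""
    return (1.0 - variant_freq) ** (ploidy * n_loci)


# 74 repair loci, average wild-type frequency about 80% (values from the text).
print(expected_variant_alleles(74, 0.20))  # about 29.6, that is, roughly 30
print(prob_no_variant_alleles(74, 0.20))   # vanishingly small
```

Under these assumptions essentially no one carries a fully wild-type set of repair genes, which is consistent with the suggestion that each individual carries a very rare or unique genotype.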

In summary, small variations in DNA repair are highly heritable, DNA repair efficiency is correlated with cancer risk, and there are widespread amino acid polymorphisms in the known repair genes. The next step will be to link those polymorphisms to variations in the biochemistry of repair, providing a mechanistic understanding of how genetic variation influences an important aspect of cancer predisposition (de Boer 2002).


The polymorphisms that occur in DNA repair genes hint at variations in cellular physiology that may be very common. The connection between DNA repair efficiency and cancer seems plausible, because somatic mutations and chromosomal aberrations probably have a key role in cancer progression. However, at present, we cannot make a simple mechanistic connection between repair efficacy and the rate of progression to cancer.

Currently, the most interesting studies of multisite variants and age-specific incidence link aggregation of cases in families to age of onset. Presumably, familial cases that rule out known major single-site variants arise from multisite variants shared by relatives.

Peto and Mack (2000) noted that women who are at high risk of developing breast cancer show an approximately constant incidence of cancer per year after a certain age, whereas in most individuals incidence rises significantly with age (Figure 11.6). This pattern appears in three different classes of susceptible individuals after the age at which a particular patient develops cancer. I refer to the individual who first has cancer as the patient or the index case, and the age of this first diagnosis as the index age.

Figure 11.6. Schematic summary of breast cancer incidence in individuals with varying levels of relatedness to an index case. Redrawn from Peto and Mack (2000).

In the first class, an index case with monolateral breast cancer has an annual risk of developing cancer in the other (contralateral) breast of approximately 0.7% per year after the index age. A different study found a similar result, with risk in the contralateral breast of about 0.5% per year after the initial cancer (Figure 11.7).

Figure 11.7. Incidence of cancer in the contralateral breast after the first primary breast cancer, excluding cases in which the contralateral cancer was diagnosed within three months of the first cancer. Incidence per year shown on a linear scale per 100,000 population.

In the second class, a monozygotic twin of an index case has an approximate risk of 1.3% per year after the index age, which is again approximately 0.7% per breast per year.

In the third class, mothers and sisters of an index case have a risk of approximately 0.3–0.4% per year after they have passed the index age.

Single locus mutations of large effect, such as BRCA1 or BRCA2, explain less than one-fifth of familial aggregation (Anglian Breast Cancer Study Group 2000). Thus, the patterns of high and nearly constant incidence most likely arise from familial inheritance of variants at multiple sites—polygenic inheritance.

The tendency for risk after the index age to remain nearly constant for the remainder of life raises an interesting puzzle: what causes that early plateau of incidence in highly susceptible individuals?


Peto and Mack (2000) concluded: "A … model that may account for these peculiar temporal patterns is that many, and perhaps most, breast cancers arise in a susceptible minority whose incidence, at least on average, has increased to a high constant level at a predetermined age that varies between families."

But why should predisposed individuals have constant annual risks after a certain age? Individuals who are not predisposed to breast cancer show an increasing risk with age, and the same is true for the other most common types of epithelial cancer when risk is measured in the absence of information about genetic predisposition.

Frank (2004d) proposed the following explanation for Peto and Mack's (2000) observations. Suppose, at birth, that each of L different cell lineages in the breast has n rate-limiting steps remaining before cancer. I have discussed previously that, as individuals age, their cell lineages may progress independently. Over time, the various lineages form a distribution of stages: some still have n stages remaining before cancer, others have progressed part way and have, for example, n − a stages remaining.

If some cell lineages in an individual have passed through all but the final stage in cancer progression, with only one stage remaining, then that individual's annual risk is constant—the risk is just the constant probability of passing to the final stage. Families that have an increased predisposition may progress through the first n−1 stages quickly; subsequently, their annual risk is the constant probability of passing the final stage. Families with low genetic risk move through the early stages slowly: in middle or late life, members of those families typically have more than one stage to pass and so continue to have an increasing rate of risk with advancing age.

If the early stages in cancer progression involve somatic mutations or chromosomal aberrations, impaired DNA repair efficiency could explain why families with increased predisposition move quickly through the early stages. When they have progressed through the early stages, individuals from those families have a high constant risk later in life while awaiting the final transition. By contrast, better repair efficiency slows the transition through the early stages. Slow transitions early in life mean more stages to pass through later in life. With more stages remaining, individuals at low risk continue to show an increase in incidence with age (Frank 2004d).
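
The contrast between constant and rising risk can be illustrated with a minimal sketch. It follows a single lineage with k stages remaining and assumes that every stage is passed at the same constant rate u per year, so that the waiting time follows an Erlang distribution. The equal rates, the parameter values, and the function name are illustrative assumptions, and the sketch ignores the L parallel lineages of the full model in Frank (2004d).

```python
import math


def lineage_hazard(t: float, k: int, u: float) -> float:
    """Rate of passing the final stage at age t for one lineage that starts
    with k stages remaining, each passed at constant rate u per year.
    This is the hazard (density / survival) of an Erlang(k, u) waiting time."""
    x = u * t
    # The common factor exp(-x) in the density and the survival function cancels.
    density = u * x ** (k - 1) / math.factorial(k - 1)
    survival = sum(x ** i / math.factorial(i) for i in range(k))
    return density / survival


# One stage remaining: the hazard is the constant u at every age.
print([lineage_hazard(t, k=1, u=0.01) for t in (30, 50, 70)])
# Three stages remaining: the hazard keeps rising with age.
print([lineage_hazard(t, k=3, u=0.01) for t in (30, 50, 70)])
```

In this simplified picture, a predisposed family corresponds to lineages that have already reached k = 1 by the index age, whereas a low-risk family still has k > 1 in middle or late life.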

11.2 Progression and Incidence Affect Genetic Variation

The previous section described how genetic variants affect progression and incidence: the pathway from genes through development to phenotype. In this section, I analyze how progression and incidence affect the frequency of variants in populations: the pathway from phenotype through natural selection to gene frequency.

Evolutionary Forces

Many forces potentially influence gene frequency. The wide range of alternatives makes it easy to fit some model to the observed distribution of frequencies, but hard to determine if the fit has any meaning.

Only natural selection provides a simple comparative prediction: the stronger the deleterious effect of a cancer-predisposing variant on survival and reproduction, the lower the expected frequency of that variant. A comparative prediction forecasts the overall tendency or trend, not the relative frequency of any particular variant.

In this section, I summarize the major evolutionary forces. The following section evaluates the comparative prediction that the deleterious effects of a variant influence its frequency.


Drift encompasses various chance events. Each copy of a genetic variant lives in an individual and descends, on average, to λ babies. Most populations neither grow nor shrink continually, and so the total number of gene copies remains about the same with λ ≈ 1. If the population shrank in one generation to 10% of its current size, then λ = 0.1.

A few simple calculations illustrate the key role of drift for rare variants. Consider a population of size N with a particular variant at frequency p. In one generation, how much does p typically change if random drift is the only evolutionary force acting?

The number of copies of a particular variant is α = 2Np, where N is the size of the population, and 2N is the total number of gene copies; the factor of 2 arises because each diploid individual carries two copies of each gene.

In the next generation, the number of variant gene copies follows a Poisson distribution with an average of αλ in a progeny gene pool of size 2Nλ. As long as αλ is not too small, we can use the normal approximation for the Poisson distribution, which tells us that the number of variant gene copies in the next generation approximately follows a normal distribution with mean αλ and standard deviation $\sqrt{\alpha\lambda}$. In terms of variant gene frequency p in the next generation, the 95% confidence interval is

$$p\left(1 \pm \frac{1.96}{\sqrt{\alpha\lambda}}\right).$$

How much does drift change gene frequency in one generation in a stable population, λ = 1? Suppose the gene frequency starts at p = 10⁻⁵ in a gene pool of size 2N = 10⁷, so there are originally α = 2Np = 100 variant gene copies. In the next generation, the frequency of the variant gene has a 95% confidence interval of p(1 ± 0.2), which shows that 5% of the time the gene frequency will change by more than 20% in one generation. Over relatively short time periods, significant changes in the frequency of rare variants may occur.
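
The drift calculation can be reproduced with a short sketch. It takes the standard deviation of the Poisson count to be the square root of its mean and applies the normal approximation described above. The function and argument names are my own.

```python
import math


def drift_interval(p: float, gene_pool: float, lam: float = 1.0, z: float = 1.96):
    """Approximate 95% interval for the variant frequency after one generation
    of drift alone. `gene_pool` is 2N, the current number of gene copies, and
    `lam` is the average number of descendant copies per gene copy."""
    alpha = p * gene_pool                 # current number of variant copies
    rel = z / math.sqrt(alpha * lam)      # relative half-width of the interval
    return p * (1.0 - rel), p * (1.0 + rel)


low, high = drift_interval(p=1e-5, gene_pool=1e7)
print(low, high)  # roughly p * (1 - 0.2) and p * (1 + 0.2)
```

Because the relative half-width scales as one over the square root of the number of copies, the same function shows that a variant present in only 25 copies has an interval twice as wide in relative terms.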


Consider a new variant that exists as a single copy in the population at frequency p = 1/2N. Suppose that focal variant resides on a chromosome near another site that has a rare, favorable variant. Let the only force acting on the focal variant be the benefit derived from residing near a favorable variant at a nearby site.

Suppose the neighboring site causes an average increase in reproduction of 1 + s compared with the normal value of one. Further, suppose the focal site and beneficial neighbor recombine at a rate of r per generation. Then the frequency of the focal site tends to increase if s > r, that is, if the selective benefit, s, of being linked to an advantageous allele is greater than the rate, r, at which that linkage is broken down by recombination. If the selective benefit happens to be fairly strong, then the beneficial site will significantly increase the frequency of all of the closely linked variants.


Many variants affect more than one phenotype or more than one component of survival and reproduction. Suppose, for example, that a variant enhanced the rate of wound healing. On the one hand, rapid healing would probably provide some benefit, perhaps against infection. On the other hand, wound healing can be carcinogenic, probably because of the enhanced rate of symmetric mitoses, and more rapid wound healing may be more carcinogenic. So a variant that increased the rate of wound healing might rise to high frequency even though it shifts cancer incidence to earlier ages.

In general, when a variant shifts cancer to earlier ages and occurs at unexpectedly high frequency, pleiotropy is a reasonable hypothesis. However, it is often difficult to figure out the multiple effects of a variant and the respective consequences for survival and reproduction.


Overdominance occurs when, at a locus with two alternative alleles, the heterozygote is more fit than either homozygote. Sickle cell anemia provides the classic example. An individual with one sickle cell variant allele enjoys protection against malaria, but an individual with two copies of the variant suffers severe disease from aberrations in red blood cells. Those opposing benefits and costs influence the frequency of the sickle cell variant.

Overdominance probably occurs rarely for variants that directly cause significant shifts of cancer to earlier ages. Most carcinogenic variants act in a physiologically recessive way, such that a cell with one normal copy and one variant copy has a normal phenotype. Deleterious effects at the cellular level arise only when both allelic copies suffer loss of function. However, an individual needs to carry only one mutated copy to be at risk; the cancerous phenotype arises after somatic mutation knocks out the second copy in a small fraction of cells. So, although most cancer-predisposing mutations are physiologically recessive, they are inherited as dominant alleles (Marsh and Zori 2002). So far, only three genes (RET, MET, and CDK4) have been found with inherited variants that act dominantly within cells (as oncogenes) among 31 cancer genes with single locus predisposing variants (Marsh and Zori 2002).

Pleiotropic overdominance may occur, in which a heterozygous locus that predisposes to cancer has beneficial effects on some other phenotype. Probably some cases of pleiotropic overdominance will eventually be discovered, but no evidence presently suggests this process as a major force maintaining genetic variability in predisposition.

Epistasis arises when the effect of a variant depends on the presence or absence of variants at other loci. Epistasis is much like overdominance: both processes cause changes in the phenotypic consequences of a variant in relation to the genetic background in which the variant lives. One can think of copies of the variant as living in genetically variable environments, favored in some environments and disfavored in others.


External environments also vary. For example, a variant may be disfavored in certain carcinogenic environments and favored in the absence of those environments. The variable selection can maintain variants that predispose to cancer at frequencies higher than expected through the deleterious effects of increased cancer incidence.


When thinking about cancer, we can often take a simple point of view: mutation creates deleterious variants that predispose to cancer, and selection removes those deleterious variants from the population. The other evolutionary forces listed above may or may not act in any particular case, but deleterious mutation and the purging of those mutations by natural selection occur continually. The balance between mutation and selection sets the default against which we should compare observed frequencies.

Mutation-Selection Balance: A Comparative Prediction

It is often difficult to measure precisely the rate of mutation and the rate at which natural selection purges deleterious mutations. In addition, other forces such as drift and pleiotropy often affect the frequency of deleterious, predisposing variants. So any attempt to predict precisely the frequency of a deleterious variant or to fit some model with estimated parameters of mutation and selection would mislead: one can calculate precise predictions or estimate parameters, but those calculations or estimations would only provide a false sense of precision.

We can estimate the relative strengths of mutation and selection within an order of magnitude or so. Those rough estimates provide guidelines to the expected frequencies of deleterious variants. We can also make two simple comparative predictions. First, as selection against variants increases, the observed frequency of the variants declines. Second, as mutation rate at a particular locus increases, the observed frequency of deleterious variants at that locus increases.

These rough guidelines and comparative predictions set a baseline for expectations of variant allele frequency. When observations deviate significantly from expectations, then we may turn to forces other than a balance between deleterious mutation and purging by natural selection.


Suppose a mutation is expressed in all carriers, and those carriers die before they have reproduced. In this situation, each case must arise from a new mutation, and the frequency of mutated alleles, q, is roughly equivalent to the mutation rate per generation, u, that is, q = u.

Inherited cases of retinoblastoma, Wilms' tumor, and skin cancer in xeroderma pigmentosum transmit as dominant mutations. Most individuals who carry a highly penetrant mutation develop the disease during childhood or early life. Without treatment, carriers do not usually reproduce. These diseases all occur at frequencies, q, of approximately 10⁻⁵–10⁻⁴ (Vogelstein and Kinzler 2002).

The commonly quoted values for mutation rate, u, tend to be in the range of 10⁻⁶–10⁻⁵ per gene per generation (Drake et al. 1998), an order of magnitude lower than the frequency of cases. For this type of approximate calculation, a match within an order of magnitude suggests that we have roughly the right idea about the factors that influence allele frequencies.

Certainly, other estimates of frequency for these diseases or other early-onset cancers will not match so closely to the usual estimate of the mutation rate. A mismatch implicates some force beyond the standard baseline mutation rate and immediate removal of all mutations by natural selection. For example, the penetrance may be less than perfect, some carriers may reproduce, or the gene may be unusually mutable.


Some inherited mutations have low penetrance or cause later-onset disease. Natural selection removes a mutation from the population in proportion both to the probability that it causes disease and to the reduction in reproductive success of those individuals who express the disease (Rose 1991; Nunney 1999, 2003; Frank 2004e). Reduction in reproductive success depends on the age of onset: later onset has less effect on transmission of alleles to the next generation. Figure 11.8 shows the technical details. The following paragraphs describe the main points.

Figure 11.8. The force of selection at different ages. Loss in fitness caused by cancer is the force of selection averaged over the probabilities of death at different ages. This loss is …, where pr, the fractional loss in fitness, is the averaged product of the age-specific …

Suppose the probability of expression in a carrier (the penetrance) is p, and the reduction in reproductive success is r. If q is the frequency of the mutant allele in the population, then qp is the frequency of cases, and the rate at which mutations are removed in each generation is qpr, the frequency of cases multiplied by the reduction in reproductive success in each case. Equilibrium occurs when mutant alleles purged by selection match the influx of new mutations at rate u, so at equilibrium, qpr = u.
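
The balance between mutation and selection can be written as a small sketch that solves for the equilibrium allele frequency. The function names are my own, and the second example uses hypothetical values of p and r chosen only to show the direction of the effect.

```python
def removal_rate(q: float, p: float, r: float) -> float:
    """Rate at which selection removes mutant alleles per generation:
    allele frequency q, penetrance p, reproductive loss r."""
    return q * p * r


def equilibrium_frequency(u: float, p: float, r: float) -> float:
    """Allele frequency at which removal by selection equals the mutation
    rate u per generation, that is, the solution of q * p * r = u."""
    return u / (p * r)


# Fully penetrant mutation whose carriers never reproduce: q equals u.
print(equilibrium_frequency(u=1e-5, p=1.0, r=1.0))
# Hypothetical weaker selection: lower penetrance and later onset raise q.
print(equilibrium_frequency(u=1e-5, p=0.5, r=0.1))
```

The second call returns a frequency twenty times higher than the first, which illustrates the comparative prediction that weaker selection against a variant allows a higher equilibrium frequency.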

Familial adenomatous polyposis.—

Inherited mutations of the APC gene act in a dominant manner and cause the colon cancer syndrome familial adenomatous polyposis (FAP) (Kinzler and Vogelstein 2002). Nearly all carriers develop cancer, with a median age of onset of about 40 years. The frequency of cases, qp, is of the order of 10⁻⁴. We do not have historical data on the reduction in reproductive success that occurs in the absence of treatment. A reasonable value is r ≈ 10⁻¹, which takes into account the fact that the age of reproduction in the past was probably somewhat lower than in modern societies. In this case, qpr ≈ 10⁻⁵, which is again fairly close to the standard estimate for the mutation rate.

Hereditary nonpolyposis colon cancer.—

Mutations in the DNA mismatch repair (MMR) system lead to hereditary nonpolyposis colon cancer (HNPCC) (Boland 2002). Mutations in several MMR genes cause an increase in the somatic mutation rate, and more frequent somatic mutations lead to a high probability of early-onset cancer. The median age of diagnosis for HNPCC is about 42 years (Lynch et al. 1995). The frequency of cases is at least of the order of 10⁻³, but may be more because HNPCC can be difficult to distinguish from colon cancers that arise in the absence of MMR defects.

Setting the level of reproductive loss at r = 10⁻¹, the rate of removal of MMR mutations, qpr, is 10⁻⁴ or higher. This value would indicate a high mutation rate if there were only one MMR locus. However, mutations that increase the risk of developing HNPCC have been identified in five MMR loci so far (Boland 2002), and mutations that influence HNPCC may also occur in other MMR genes. There are 22 genes in the core MMR pathway (Mohrenweiser et al. 2003). The effective mutation rate is nu, where n is the number of MMR loci and u is the mutation rate per locus. Using a range for n of approximately 3–10, we obtain a range for the mutation rate per locus of approximately 1–3 × 10⁻⁵.
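
The per-locus estimate follows from dividing the total removal rate by the number of contributing loci, as in this brief sketch. The function name is my own, and the removal rate is the lower-bound order of magnitude quoted in the text.

```python
def per_locus_mutation_rate(removal_rate: float, n_loci: int) -> float:
    """Solve n * u = q * p * r for the per-locus mutation rate u."""
    return removal_rate / n_loci


QPR_HNPCC = 1e-4  # lower-bound removal rate for MMR mutations, from the text

for n in (3, 10):
    # n = 3 gives about 3e-5 and n = 10 gives 1e-5 per locus per generation.
    print(n, per_locus_mutation_rate(QPR_HNPCC, n))
```

If more of the 22 core MMR genes turn out to contribute to HNPCC, the implied per-locus rate falls further toward the standard range.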

Neurofibromatosis type 1.—

Inherited mutations in the neurofibromatosis 1 (NF1) gene cause a variety of symptoms with variable penetrance (Gutmann and Collins 2002). Carriers may express various nonlethal deformities: numerous flat, pigmented skin spots; freckling; pigmented nodules of the iris; and soft, fleshy peripheral tumors that arise from nerves (neurofibromas). Several other complications develop, including seizures, learning disabilities, and scoliosis.

NF1 is among the most common dominantly inherited diseases of humans. Gutmann and Collins (2002) estimated prevalence of about 3 × 10⁻⁴, based on several earlier studies (Crowe et al. 1956; Huson et al. 1989; Sergeyev 1975; Samuelsson and Axelsson 1981). Carriers almost always express some of the symptoms, giving a penetrance, p, of nearly one. The disease rarely reduces potential fertility, but actual reproductive success of carriers has been estimated to be about one-half of normal individuals, r ≈ 0.5 (Huson et al. 1989). Thus, qpr ≈ 10⁻⁴, which implies a high germline mutation rate.

Few families transmit a mutation through several generations, and most cases arise from new mutations (Gutmann and Collins 2002). A wide variety of DNA lesions occur in the gene, including translocations, large chromosomal deletions, smaller deletions within the gene, small rearrangements within the gene, and point mutations. No particular mutational hotspots have been detected. This large gene spans almost 9 kb of coding DNA over at least 57 exons and, including the intron regions, approximately 300 kb of total DNA. Perhaps the large size contributes to the high rate at which loss of function mutations arise. It will be interesting to learn if other special attributes of this gene cause the apparently elevated mutation rate.

Hereditary breast cancer.—

Mutations in BRCA1, which has an important function in the repair of double-strand DNA breaks, confer a high probability of developing breast or ovarian cancer (Couch and Weber 2002). Current estimates for the penetrance of breast cancer in carriers of BRCA1 mutations range from 56% to 86% (Couch and Weber 2002). Lack of functional BRCA1 leads to chromosomal abnormalities (Welcsh and King 2001), a common feature of cancer cells. The median age of onset is approximately 50 years (Ford et al. 1998), which is later than for most of the other cancers that follow dominant Mendelian inheritance. The frequency of BRCA1 mutant alleles and associated cases varies in different populations over the range 10⁻³–10⁻² (Tonin et al. 1995; Couch and Weber 1996; Struewing et al. 1997; Couch and Weber 2002). No data measure the decrease in reproduction in carriers of BRCA1 mutations: a reasonable guess would be in the range 10⁻²–10⁻¹. These values give an estimate for qpr of 10⁻⁵–10⁻³, which is somewhat higher than the standard assumption of 10⁻⁶–10⁻⁵ for the mutation rate.

Welcsh and King (2001) suggested that BRCA1 may have an elevated somatic mutation rate because of the high density of repetitive DNA elements in the gene. Those repeats may also cause a higher germline mutation rate, which would explain the higher than expected frequency of variants in populations.

Alternatively, Harpending and Cochran (2006) argued that natural selection of BRCA1 variants may be more strongly affected by that gene's role in early brain growth and development than by its role in DNA repair. Such pleiotropy could explain the elevated frequency of BRCA1 if the variants had beneficial effects on brain development. In particular, Harpending and Cochran (2006) argued that heterozygotes for BRCA1 variants can in some environments have beneficial neural effects, but the variant homozygotes would be at a disadvantage. A mild heterozygote advantage balanced against strongly deleterious effects in the variant homozygotes could explain the observed frequency of BRCA1 variants. The age of variant BRCA1 alleles may provide clues about the forces that affect allele frequencies.

The Age of Alleles: A Comparative Prediction

Variants that cause greater reproductive loss will disappear from the population faster than variants that cause relatively lower reproductive loss.

In the simplest case, each new variant causes early death before reproduction, and each variant only lives for a generation. Lower penetrance or later onset imposes a weaker selective sieve against variants, allowing the variants a longer time before extinction.

Soon, we will have enough data on the DNA sequences of variants to allow reconstruction of their history and the time back to their common ancestor—the age of the allele. If the age of alleles is primarily determined by a balance between the origin of novel variants by mutation and clearance from the population by selection, then those ages should follow the simple prediction that more deleterious alleles tend to last a shorter period of time. Alternatively, forces other than mutation-selection balance may determine the age of alleles.

Consider, for example, the two alternative hypotheses for BRCA1 variant frequency. If the elevated frequency of BRCA1 variants arises from a higher germline mutation rate for that gene balanced against continual loss of variants by selection, then most variants at this locus should be relatively young (recent in origin). By contrast, if the elevated frequency arises from pleiotropic beneficial effects on neural development balanced against deleterious effects on cancer progression, then most variants at this locus should be relatively old.

11.3 Few Common or Many Rare Variants?

I have discussed a small number of mutations in which carriers suffer significantly earlier onset of disease. In those cases, a single mutation greatly increases incidence. Such mutations often appear to occur in key genes that directly affect progression of the particular type of cancer.

The search for single mutations of large effect has intensified over the past few years. However, few new mutations have been discovered. Most of the inherited predisposition to cancer remains unexplained. The widespread heritability of cancer appears to be caused by several variants each of relatively small effect—what is often called polygenic inheritance.

Within this large, polygenic component of heritability, do genetic variants that cause disease tend to be common or rare? Are there relatively few common, older variants or many rare, newer variants?

Much recent debate in biomedical genetics has turned on these questions, because methods for estimating genetic risk in particular individuals depend on the frequency of variant alleles (Weiss and Terwilliger 2000; Lee 2002). If most genetic risk comes from a few relatively common alleles that are relatively old, then those alleles will be associated with other polymorphisms in the genome that can be used as markers of risk. Those associations arise because the original mutations will, by chance, occur in regions in which other single nucleotide polymorphisms (SNPs) are located nearby.

By contrast, most genetic risk might come from many rare, young alleles. If so, then there will be no consistent association between known SNPs and genetic predisposition. Each particular mutation will have its own profile of linked marker polymorphisms, often specific for a particular population. Those linkage profiles will differ for each mutation. Because there may be many mutations, with each making only a small contribution to genetic risk, no overall association will occur between known marker polymorphisms and total genetic risk.

The available data do not definitively distinguish between a few common, older variants and many rare, younger variants. Wright et al. (2003) argued eloquently in favor of many rare variants; I agree with their logic. However, the issue here does not turn on point of view, but rather on the actual distribution of variants and their effects. I discuss two examples that provide the first clues.

Multiple Colon Adenomas

Fearnhead et al. (2004) collated data on 124 individuals with multiple adenomatous polyps. They screened those individuals for germline DNA variants in five genes known to influence colon cancer progression, and found 13 different variants. They compared the frequency of those 13 variants in the 124 cases with the frequency in 483 random control individuals.

Table 11.1 shows the frequencies of the 13 variants in cases and controls. These results suggest that many rare variants, each of small effect, contribute significantly to the heritability of cancer. In this study, almost all of the variants were single amino acid substitutions. Each such small change in protein shape and charge may contribute a small amount to disease. Many such changes, each rare, may in the aggregate explain much of the genetic basis of disease.

Table 11.1. Variants in cases with multiple polyps and in controls.

Fearnhead et al. (2004) support their argument that single amino acid substitutions in proteins contribute to disease by evaluating the functional changes for many of the mutations listed in Table 11.1. Almost all of the variants occur in regions of their proteins known to have important functional roles in pathways that are often disrupted in tumors. I briefly summarize two examples from Fearnhead et al.'s (2004) discussion.

The APC variant E1317Q alters charge in the region that binds to β-catenin. Mutation of the APC regulatory pathway appears to be a common first step in adenoma formation (Kinzler and Vogelstein 2002). APC represses β-catenin, which may have two different consequences for cellular growth. First, β-catenin may enhance expression of c-Myc and other proteins that promote cellular division. Second, β-catenin may play a role in cell adhesion processes, effectively increasing the stickiness of surface epithelial cells. In either case, repression of β-catenin reduces the tendency for abnormal tissue expansion. In tumors, somatic mutations in APC usually include domains involved in binding β-catenin, releasing β-catenin from the suppressive effects of APC (Kinzler and Vogelstein 2002).

The hMLH1 variant K618A alters the charge of a highly conserved region of this DNA mismatch repair protein. Several deleterious mutations have been reported in this region (Wijnen et al. 1996; Peltomaki and Vasen 1997; Mitchell et al. 2002), and studies in yeast demonstrated that substitutions at position 618 cause functional changes (Shimodaira et al. 1998). hMLH1 works in various heteromeric complexes, including interaction with hPMS2 (Buermeyer et al. 1999; Fishel 2001); the hMLH1 K618A mutation causes more than 85% loss of interaction between hMLH1 and hPMS2 (Guerrette et al. 1999).

DNA Repair Variants

Earlier in this chapter, I mentioned that DNA repair efficiency varies considerably in populations and has a large heritable component (Grossman et al. 1999; Cloos et al. 1999; Roberts et al. 1999). In addition, poor repair efficiency consistently associates with an approximately 2–10-fold increase in cancer risk (Berwick and Vineis 2000).

The previous section showed that rare variants at DNA mismatch repair loci can predispose to colon cancer. The fact that rare variants can predispose does not resolve whether the high heritability of repair efficiency and cancer predisposition arises mainly from relatively rare or common alleles. The existing data do not settle the issue. Two lines of evidence provide clues.


Mohrenweiser et al. (2003) summarized genetic variation across 74 DNA repair loci. Figure 11.9 shows that the rare, intermediate, and common alleles contribute equally to the variance in allele frequency. To understand what this means, consider how to calculate the genetic variance in allele frequencies.

Figure 11.9. The relative variance in allele frequencies for rare and common alleles of 74 DNA repair genes. The total number of variants in each frequency category is shown above the bars. Each rare variant contributes a small fraction of the total variance, but ...

The contribution of a variant allele with frequency $p_i$ to the variance at its locus is $v_i = p_i(1 - p_i)$. A rare allele at frequency $p_i = 0.01$ contributes $v_i \approx 0.01$ to the frequency variance. A common allele at frequency $p_i = 0.11$ contributes $v_i \approx 0.1$ to the frequency variance, or about an order of magnitude more than the rare variant. If there were ten times as many rare variants as common variants, then the rare and common variants would contribute equally to the total variance.
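The arithmetic in this paragraph can be checked with a short sketch. The function name `frequency_variance` is illustrative only; the two frequencies are the ones used in the text.

```python
def frequency_variance(p):
    """Return p * (1 - p), the contribution of an allele at frequency p."""
    return p * (1.0 - p)

rare = frequency_variance(0.01)     # rare allele from the text
common = frequency_variance(0.11)   # common allele from the text
ratio = common / rare               # how much more one common allele contributes
ten_rare = 10 * rare                # combined contribution of ten rare alleles

print(f"rare: {rare:.4f}  common: {common:.4f}  ratio: {ratio:.1f}")
print(f"ten rare alleles together: {ten_rare:.4f}")
```

The rare allele contributes about 0.0099 and the common allele about 0.0979, so ten rare alleles at frequency 0.01 together roughly match one common allele at frequency 0.11.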

Figure 11.9 shows that there are more rare variants than common variants. The excess of rare variants explains why the total contribution to the variance in allele frequency is about the same for rare, intermediate, and common alleles.

These calculations provide information about the frequency of variant alleles. However, these data do not connect the different variants to their consequences for disease. Inevitably, some of the variants will have little or no effect, whereas others may significantly increase risk. The common types are unlikely to be severely deleterious, but beyond that, no strong conclusions can be made about the effects of the variant alleles. The data on colon cancer in the previous section show that rare variants can influence predisposition. The next section shows that combinations of common variants may also significantly affect predisposition.


A pathway such as a particular type of DNA repair forms a quantitative trait that protects against cancer progression. Certain individual polymorphisms may each reduce the efficacy of the pathway by a small amount, and consequently cause a small and perhaps undetectable increase in cancer risk. In combination, multiple polymorphisms may significantly reduce efficacy and consequently cause a significant rise in cancer risk. Particularly high risk may occur when those polymorphisms concentrate in one or more key pathways and compromise essential protective mechanisms (Han et al. 2004; Popanda et al. 2004; Cheng et al. 2005; Gu et al. 2005; Wu et al. 2006).

Wu et al. (2006) measured the frequency of 44 polymorphisms in variant DNA repair and cell-cycle control genes. They compared frequencies in 696 patients with bladder cancer versus 629 unaffected controls. The study focused on the increase in relative risk with a rise in the number of variant alleles. The hypothesis was that many cases would arise in individuals who carry a greater than average number of predisposing polymorphisms in key pathways.

To analyze the role of multiple variants in a sample of modest size, one must study relatively common variants. If the variants were rare, very few individuals would carry several variants. Thus, the design of Wu et al.'s (2006) study focuses attention on the role of multiple common variants, without addressing how multiple rare variants may contribute to disease. In spite of this limitation, the study is important because much of polygenic predisposition may arise from the combined effect of many variants. Given the widespread distribution of variant alleles in populations (Figure 11.9), each individual carries a unique combination of numerous variants across key pathways in carcinogenesis.
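A rough binomial sketch illustrates why a modest sample forces attention onto common variants. The assumptions here are not from the study: each individual is taken to carry a variant at each of 13 sites independently and with the same probability, and the carrier probabilities 0.01 and 0.30 are hypothetical values chosen only to contrast rare with common variants. The sample size is the sum of the 696 cases and 629 controls.

```python
from math import comb

def prob_at_least(k, n_sites, carrier_prob):
    """Binomial probability of carrying variants at k or more of n_sites,
    assuming independent sites with equal carrier probability."""
    return sum(
        comb(n_sites, j) * carrier_prob**j * (1 - carrier_prob) ** (n_sites - j)
        for j in range(k, n_sites + 1)
    )

n_sites = 13                # hypothetical number of sites considered together
sample_size = 696 + 629     # cases plus controls

# Hypothetical carrier probabilities for rare and common variants.
p_rare = prob_at_least(3, n_sites, 0.01)
p_common = prob_at_least(3, n_sites, 0.30)

expected_rare = sample_size * p_rare
expected_common = sample_size * p_common

print(f"expected carriers of 3+ rare variants:   {expected_rare:.2f}")
print(f"expected carriers of 3+ common variants: {expected_common:.0f}")
```

Under these assumptions, fewer than one individual in the sample would be expected to carry three or more rare variants, whereas most individuals would carry three or more common variants.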

Wu et al.'s (2006) most interesting result concerns the interaction between smoking and polymorphisms in the DNA repair pathway that functions in nucleotide-excision repair (NER). The NER pathway removes bulky DNA adducts frequently caused by the polycyclic aromatic hydrocarbons in tobacco smoke. Smoking significantly increases bladder cancer risk. A few studies have shown that certain single polymorphisms within the NER pathway associate weakly with greater susceptibility to bladder cancer (reviewed by Garcia-Closas et al. 2006). Such weak effects are often difficult to reproduce in subsequent studies.

Wu et al. (2006) included 13 NER variants across nine loci. Among those who have smoked, individuals with seven or more NER variants had a relative risk of cancer 3.37 times greater than those with fewer than four variants, with a 95% confidence interval for relative risk of 2.08–5.48. Among nonsmokers, individuals with seven or more variants had a relative risk of cancer 1.40 times greater than those with fewer than four variants, with a 95% confidence interval for relative risk of 0.72–2.73.
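The two intervals can be compared on a logarithmic scale, where ratio estimates are usually analyzed. The sketch below assumes that the intervals were constructed symmetrically around the log of the estimate with a normal multiplier of 1.96; the source does not state how they were computed, so the implied standard errors are illustrative only.

```python
from math import exp, log

def log_scale_summary(lower, upper, z=1.96):
    """Return the geometric midpoint of an interval and the standard error
    on the log scale implied by a symmetric interval of half-width z * SE."""
    midpoint = exp((log(lower) + log(upper)) / 2.0)
    std_err = (log(upper) - log(lower)) / (2.0 * z)
    return midpoint, std_err

# Interval bounds reported in the text.
smokers_mid, smokers_se = log_scale_summary(2.08, 5.48)
nonsmokers_mid, nonsmokers_se = log_scale_summary(0.72, 2.73)

# An interval that contains 1.0 is compatible with no change in risk.
smokers_excludes_one = not (2.08 <= 1.0 <= 5.48)
nonsmokers_excludes_one = not (0.72 <= 1.0 <= 2.73)

print(f"smokers:    midpoint {smokers_mid:.2f}, log-scale SE {smokers_se:.2f}")
print(f"nonsmokers: midpoint {nonsmokers_mid:.2f}, log-scale SE {nonsmokers_se:.2f}")
```

The geometric midpoints fall close to the reported estimates of 3.37 and 1.40. The interval for smokers excludes 1, whereas the interval for nonsmokers includes 1, so the data for nonsmokers do not by themselves demonstrate an increase in risk.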

Wu et al. (2006) further analyzed all 44 polymorphisms across 33 DNA repair and cell-cycle control loci. Among the 851 individuals who had smoked, 74% of the subjects had bladder cancer. The most powerful genetic effect concentrated in the NER loci: among the 124 smokers who carried three particular NER variants, 97% had bladder cancer, whereas only 53% of those smokers who did not carry all three variants had bladder cancer.

The results in Wu et al.'s (2006) study suggest that multiple NER variants significantly raise cancer risk in smokers. Such studies are often difficult to replicate for at least three reasons.

First, the strong effect of smoking demonstrates that certain polymorphisms may only have strong effects in the presence of particular environmental challenges. Unmeasured environmental or genetic effects may often determine whether the particular genotypes under study play an important role in progression.

Second, the variants under study may not directly affect progression, but instead be linked to variants at other sites that influence carcinogenesis. In other populations, with different genetic linkage relations, those same variants will associate differently with cancer rates.

Third, such studies suffer from problems common to exploratory statistical analyses: the number of variables (polymorphisms) and their combinations greatly exceeds the number of individuals sampled. With so many different combinations, by chance certain combinations will associate with strong differences in outcome. Although statistical methods attempt to deal with such problems, conclusions from such studies often do not hold up in future attempts to repeat the work.
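The scale of the problem can be illustrated by counting how many sets of sites could be examined together among the 44 polymorphisms in Wu et al. (2006), compared with the 696 cases and 629 controls. This is a minimal sketch that counts only which sites are grouped together; it ignores the different genotypes possible at each site, so it understates the number of combinations.

```python
from math import comb

n_polymorphisms = 44           # polymorphisms analyzed in the study
n_individuals = 696 + 629      # cases plus controls

# Number of distinct sets of k sites that could be tested for joint effects.
combos = {k: comb(n_polymorphisms, k) for k in (1, 2, 3, 4)}

for k, count in combos.items():
    relation = "exceeds" if count > n_individuals else "is below"
    print(f"sets of {k} sites: {count:>7}  ({relation} the {n_individuals} individuals)")
```

Sets of three sites already outnumber the individuals sampled by about tenfold, which is why some combinations will associate with outcome by chance alone.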

With those caveats in mind, I now compare Wu et al.'s (2006) results with a similar study. Garcia-Closas et al. (2006) analyzed 22 polymorphisms in seven NER genes among 1,150 bladder cancer cases and 1,149 controls. In agreement with Wu et al. (2006), Garcia-Closas et al. (2006) found weak effects for each variant when analyzed in isolation, but found stronger, significant effects when analyzing the interaction between smoking and multiple NER variant sites. Garcia-Closas et al. (2006) limited their analysis to pairs of variant NER sites, and found that certain pairs of variants significantly increased risk in smokers.

The two studies had six NER polymorphisms in common. Four of those polymorphisms were not particularly important in either study. At the locus RAD23, one particular variant played a key role in Wu et al. (2006) but, although present in Garcia-Closas et al. (2006), did not play a key role in that study. Instead, Garcia-Closas et al. (2006) found that a different variant site in RAD23 had significant explanatory power when evaluating interactions between pairs of variant sites. The two studies also shared a variant at the ERCC6 locus: that variant was important in multisite interactions in Wu et al. (2006) but not in Garcia-Closas et al. (2006).


Preliminary evidence suggests that risk depends on the combination of effects at multiple variant sites. Practical sampling issues limit studies to combinations of common variants. In small samples, combinations of rare variants occur too infrequently to allow study. In the population, more rare variants occur than common variants (Figure 11.9), so the net contribution of multiple rare variants may be at least as great as the combinations of common variants.

The effect per variant of rare versus common variants remains unknown. Rare alleles will likely have greater effects than the common alleles if variant frequency depends on mutation, drift, and selection against deleterious effects. By contrast, common alleles may have larger effects if variants have variable consequences that depend on environment or genetic background, or if variants have beneficial pleiotropic effects that offset the deleterious traits that increase cancer incidence.

It will not be easy to work out the relative contribution of different variants and how variants combine to determine disease. But much attention will continue to focus on this problem. Through cancer studies, we will gain insight into the genetic basis of variability in key functional components, such as DNA repair and tissue regulation via control of the cell cycle. By study of functional components and their genetic basis of variation in efficiency, and how the components interact to determine disease, we will begin to understand how evolution has shaped the age-specific curves of failure. Through those curves of failure, we can analyze the evolutionary design of reliability that sets the nature of disease and aging.

11.4 Summary

The first part of this chapter described how inherited genetic variants affect the age of cancer onset. In the future, new genomic technologies will measure genetic variation with far greater resolution. To interpret those high-resolution measurements of genetic variation, we will have to connect the observed genetic variation to the causes of cancer. Such connections can only be made by studying how genetic variants shift the age-specific incidence. In the second part of the chapter, I analyzed the population frequency of predisposing genetic variants in light of various evolutionary forces. I suggested that studies of cancer predisposition may lead the way in understanding the structure of inherited genetic variation for age-specific diseases.

The next chapter turns to the somatic evolution of cancer within individuals. Most human cancers arise in tissues that renew throughout life. Those tissues often derive from stem cells. I review the biology of stem cells and how the shape of stem cell lineages in renewing tissues affects the progression of cancer.

Copyright © 2007, Steven A Frank.

This book, except where otherwise noted, is licensed under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Bookshelf ID: NBK1549