NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Committee on the Analysis of Cancer Risks in Populations near Nuclear Facilities-Phase I; Nuclear and Radiation Studies Board; Division on Earth and Life Studies; National Research Council. Analysis of Cancer Risks in Populations Near Nuclear Facilities: Phase I. Washington (DC): National Academies Press (US); 2012 Mar 29.

Analysis of Cancer Risks in Populations Near Nuclear Facilities: Phase I.

4. Epidemiologic Studies

This chapter addresses the second charge in the statement of task for this study (see Sidebar 1.1 in Chapter 1) on methodological approaches for assessing cancer risks in populations near U.S. Nuclear Regulatory Commission (USNRC)-licensed nuclear facilities. It is specifically intended to address the following issues:

  • Different epidemiological study designs and statistical assessment methods.
  • Geographic areas to use in the study.
  • Cancer types and health outcomes of morbidity and mortality.
  • Characteristics of the study populations.
  • Availability, completeness, and quality of cancer incidence and mortality data.
  • Approaches for overcoming potential methodological limitations arising from low statistical power, random clustering, changes in population characteristics over time, and other confounding factors.
  • Approaches for characterizing and communicating uncertainties.


Epidemiology is the study of the distribution of diseases and other health-related conditions in populations, and the application of this study to control health problems. The purpose of epidemiology is to understand what risk factors are associated with a specific disease and how disease can be prevented in groups of individuals; because epidemiology is observational, it cannot determine what caused a disease in a specific individual. Epidemiologic studies are used for many purposes, most commonly to estimate the frequency of a disease and to find associations that suggest potential causes. To achieve these goals, measures of disease (incidence) or death (mortality) are made within population groups. Epidemiology is fundamentally multidisciplinary and draws on knowledge from biology, sociology, statistics, and other fields.

The four types of epidemiologic studies commonly used in radiation research are cluster, ecologic, case-control, and cohort studies. An additional approach for estimating risk in radiation research, although strictly not an epidemiologic study, is the risk-projection model. These models predict excess cancer risks by combining population dose estimates with existing risk coefficients, transferring risks across populations with different baseline rates. This type of modeling is not new; one of the earliest examples was a report of the U.S. Federal Radiation Council, which estimated that 0 to 2000 leukemia deaths in the United States could be attributed to exposures to fallout from above-ground nuclear testing up to 1961 (Federal Radiation Council, 1962). As discussed in a comprehensive review (Berrington de González et al., 2011), applications of risk-projection modeling have increased recently, partly because of the publication of user-friendly risk estimates for U.S. populations in the BEIR VII report (NRC, 2005) and partly because of the increasing acceptance of the limitations of epidemiologic studies of low-dose radiation exposures, which stem mainly from their limited statistical power.
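The arithmetic behind a risk-projection model can be illustrated with a minimal sketch. It assumes a linear dose-response and shows two common ways of transferring a risk coefficient to a new population: an additive (absolute-risk) transfer, which ignores the baseline rate, and a multiplicative (relative-risk) transfer, which scales with it. The function names and every numeric input are hypothetical placeholders chosen for illustration; they are not values from BEIR VII or from this report.

```python
def excess_cases_additive(population, dose_sv, ear_per_sv):
    """Absolute-risk transfer: excess cases do not depend on the baseline risk."""
    return population * dose_sv * ear_per_sv


def excess_cases_multiplicative(population, dose_sv, err_per_sv, baseline_risk):
    """Relative-risk transfer: excess cases scale with the baseline risk."""
    return population * baseline_risk * err_per_sv * dose_sv


# Hypothetical inputs for illustration only.
population = 1_000_000   # persons in the exposed population
dose_sv = 0.001          # mean dose per person in Sv (1 mSv)
ear_per_sv = 0.1         # placeholder excess absolute lifetime risk per Sv
err_per_sv = 0.5         # placeholder excess relative risk per Sv
baseline_risk = 0.2      # placeholder baseline lifetime risk of the cancer

additive = excess_cases_additive(population, dose_sv, ear_per_sv)
multiplicative = excess_cases_multiplicative(
    population, dose_sv, err_per_sv, baseline_risk
)
print(f"Additive transfer:       {additive:.0f} projected excess cases")
print(f"Multiplicative transfer: {multiplicative:.0f} projected excess cases")
```

Under the multiplicative transfer, a population with twice the baseline risk is projected to have twice the excess cases for the same dose, which is why the choice of transfer model matters when baseline rates differ between populations.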

The study designs described in this chapter can provide clues for potential associations between cancer and living near a nuclear facility. The first thing that the epidemiologist questions is whether any observed association is real, or if it is due to bias, confounding, or simply due to chance. “Bias”1 is a general term related to error in the measurement of a factor and can arise from a variety of sources such as the method of selection of cases and controls, or exposed and unexposed (selection bias), or due to the inaccurate information regarding either the disease or exposure status of the study participants (information bias). On the other hand, confounding refers specifically to the existence of some third variable, the “confounder,” that alters the degree of association between the exposure and the disease of interest. Confounding is a potential issue with all epidemiologic studies discussed here.

4.1.1. Cluster Studies

A cancer cluster is an aggregation of an unexpectedly high number of cases. Clustering can be “spatial,” when the disease in question has a higher incidence rate in some places than in others, or “temporal,” when the incidence rate is higher at a specific time compared to other times. A disease cluster can also be “spatiotemporal.” Testing involves comparing the observed number of cases with the number expected, based on the size and age composition of the population.
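The observed-versus-expected comparison can be sketched as follows. This is a minimal illustration, assuming indirect age standardization (expected cases are the sum of age-specific reference rates times local population counts) and a Poisson model for the observed count. The age bands, population counts, reference rates, and observed count are all hypothetical.

```python
import math


def expected_cases(population_by_age, reference_rate_by_age):
    """Expected cases: sum over age bands of population x reference rate."""
    return sum(
        population_by_age[age] * reference_rate_by_age[age]
        for age in population_by_age
    )


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Hypothetical data for illustration only.
population_by_age = {"0-14": 20_000, "15-64": 60_000, "65+": 20_000}
reference_rate_by_age = {"0-14": 5e-5, "15-64": 1e-4, "65+": 5e-4}  # per person-year
observed = 25

expected = expected_cases(population_by_age, reference_rate_by_age)
ratio = observed / expected  # standardized incidence ratio
p_value = poisson_upper_tail(observed, expected)
print(f"Expected: {expected:.1f}, observed/expected: {ratio:.2f}, "
      f"one-sided p: {p_value:.3f}")
```

A p-value computed this way is only meaningful if the area, time window, and cancer type were fixed before the data were examined.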

The scientific reason to examine disease clusters is to learn about the causes of the cluster and, by extension, gain insight into the causes of disease. Epidemiologists and public health workers recognize the value of historic cancer cluster investigations that contributed to the recognition of human carcinogens. In those situations, exposure was typically high, prolonged, and well defined. In contrast, most cluster reports involve exposures that are low and poorly defined, and the cases involved are a mix of unrelated, relatively common cancers. For these reasons there is skepticism regarding the scientific value of the investigation of reported clusters (Neutra, 1990; Rothman, 1990).

In a rather provocative summary of the reasons why, with a few exceptions, there is little scientific or public health purpose in investigating individual disease clusters, Rothman (1990) explains that the boundaries of the space and time that encompass the cluster should be clearly defined before the cluster is examined and should not be defined after the fact to capture a population that has experienced the high disease rate. Drawing boundaries after the fact has been described as the “Texas sharpshooter’s” procedure, in which the shooter first fires his shots randomly at the side of the barn and then draws a bull’s eye around each of the bullet holes. This kind of process tends to produce clusters of causally unrelated cases of no etiologic interest. As noted by Rothman (1990), assigning statistical significance to a reported cluster requires clear definitions of the populations, regions, and/or time periods under consideration, which is often a challenging undertaking.

4.1.2. Ecologic Studies

An ecologic study (sometimes referred to as a geographic study or correlation study) evaluates the relationship between an exposure and a disease in some aggregate group of individuals, but not specific individuals, such as those living in a country, a county, a community, or a neighborhood. This is in contrast to case-control and cohort studies where the unit of analysis is the individual. In an ecologic study, average measures of exposure and disease frequency are obtained for each aggregate, and the analyses focus on determining whether or not the aggregates with high levels of exposure also display high disease rates. For example, in a study that uses counties as the unit of analysis, the data of interest are average values of exposure and aggregate counts of disease by county. However, the individuals who actually develop cancer in a county may be more or less exposed than the county average, so the association across county populations may not accurately reflect the association for the individuals who develop cancer. This issue is referred to as ecologic fallacy or ecologic bias and is the main limitation associated with ecologic studies. The magnitude of the ecologic bias is not measurable; therefore, conclusions need to be stated carefully and results interpreted with caution.

One of the causes of ecologic fallacy is that average levels of potential confounding variables across the geographic units may be subject to considerable measurement error, so trying to adjust for the geographically estimated confounding variables fails to control for confounding. This was illustrated in a study of the association of average county radon levels with lung cancer rates, with an attempt to characterize smoking levels by county (Cohen, 1995, 1997). The radon–lung cancer ecologic correlations were in the negative direction, whereas a series of studies using estimated individuals’ radon exposure have shown positive associations (Darby et al., 2005). This poor control for confounding is important mainly for potential variables that have strong association with the target disease (e.g., smoking and lung cancer) and is of lesser concern for weak confounding variables. However, when expected effects of exposure are themselves quite weak, then good control for confounding variables becomes especially important.

4.1.3. Case-Control Studies

The aim of a case-control study is to determine whether the frequency of exposure to several possible risk factors is higher in the group of people with the disease of interest (cases) than in the group without the disease (controls). The proportion of cases with and without an exposure suspected to be linked with the disease is compared to the proportion of controls with and without the relevant exposure. If a certain exposure is associated with or causes a disease, then a higher proportion of past exposure among cases is expected compared to the proportion of past exposure among the controls. If the difference cannot be explained by chance, an association between the disease and the characteristic may be inferred.
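The comparison of exposure proportions between cases and controls is usually summarized as an odds ratio. The sketch below computes the odds ratio from a 2x2 table together with an approximate 95% confidence interval using the log-based (Woolf) method, which is one common choice and is assumed here for illustration. The counts are hypothetical.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls,
                  unexposed_controls, z=1.96):
    """Approximate confidence interval on the log scale (Woolf method)."""
    log_or = math.log(
        odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    )
    se = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for illustration only.
estimate = odds_ratio(40, 60, 30, 70)
lower, upper = odds_ratio_ci(40, 60, 30, 70)
print(f"Odds ratio: {estimate:.2f} (approx. 95% CI {lower:.2f} to {upper:.2f})")
```

If the interval includes 1, the difference in exposure between cases and controls could plausibly be explained by chance.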

Cases can be selected from hospitals, registries, or other relevant sources. However, cases based on hospitals may be a biased sample; for example, those cases seen at referral hospitals may represent more serious or unusual cases. Therefore, population-based case ascertainment is the preferred study design. This may be possible through a cancer registry if the registry can provide complete information on diagnoses of cases. Control selection requires equal thought and consideration, because the controls must come from the same population base as the cases; subtle differences in the way cases and controls are selected may lead to selection bias. The major point is that the controls have to reflect the population from which the cases arose. For general-population case-control studies, various methods are used to identify controls for study as discussed in Section 4.3.4.

4.1.4. Cohort Studies

In a cohort study, the investigator typically selects a group of exposed and a group of unexposed individuals and follows both groups over time to determine disease occurrence in relation to the exposure. In the radiation epidemiology field, when individual exposures or doses are available, cohort studies typically examine gradients of exposure rather than just unexposed and exposed groups. The data necessary for assessing disease diagnosis can be obtained either directly by periodic examinations of individuals or by obtaining data from disease registrations, hospital records, and death certificates. For rare diseases or those that take a long time to become evident, such as cancer, the investigator needs to start with a large number of exposed and unexposed individuals and follow them for a long period of time. Study participants may be lost to follow-up in a cohort study because they do not wish to take part in the study, because they cannot be located, or because they have died. Minimizing these losses is crucial because they reduce the number of participants being followed. Also, participants who are lost to follow-up may differ in characteristics from those who remain enrolled in the study. When reporting the study design, it is important to note the percentage of subjects who are lost and any available demographic information on them.

A cohort study is considered to be a more scientifically rigorous study design compared to case-control, ecologic, or cluster studies. This is because cohort studies measure potential exposures before the disease has occurred and therefore can demonstrate that they may have caused the disease. Because cohort studies most often look forward to the future, they are also referred to as prospective studies. However, a cohort study can also be retrospective if both exposures and outcomes have already occurred and accurate historical data are available when the study begins. Studies on radiation effects are often jointly retrospective and prospective; exposures occurred mainly in the past and disease ascertainment includes both past and prospective follow-up.
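A cohort comparison is commonly summarized as a ratio of incidence rates per person-year of follow-up, reported alongside the percentage of participants lost to follow-up. The sketch below shows both calculations; the function names and all counts are hypothetical and serve only to illustrate the arithmetic.

```python
def incidence_rate(cases, person_years):
    """New cases per person-year of follow-up."""
    return cases / person_years


def rate_ratio(cases_exposed, py_exposed, cases_unexposed, py_unexposed):
    """Incidence rate in the exposed group divided by that in the unexposed group."""
    return (incidence_rate(cases_exposed, py_exposed)
            / incidence_rate(cases_unexposed, py_unexposed))


def percent_lost(enrolled, lost):
    """Percentage of enrolled participants lost to follow-up."""
    return 100.0 * lost / enrolled


# Hypothetical cohort for illustration only.
ratio = rate_ratio(cases_exposed=30, py_exposed=50_000,
                   cases_unexposed=40, py_unexposed=100_000)
lost = percent_lost(enrolled=2_000, lost=150)
print(f"Rate ratio: {ratio:.2f}; lost to follow-up: {lost:.1f}%")
```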


Choosing from among different possible study designs to assess cancer risks in populations near nuclear facilities, or even deciding against making a proposal for a particular study design, is based on answers to several difficult questions. Most of these questions are scientific, dosimetric, epidemiologic, and statistical, and require technical knowledge and expertise. However, some are less technical and involve public concerns and perceptions that may be difficult to quantify. The primary focus of this chapter is on technical issues, partly because they serve as a foundation for judgments that may involve additional public and stakeholder considerations.

The committee considered the following general approaches to an epidemiologic study of cancers that might be undertaken by the USNRC:


1. Risk-projection models.

2. An ecologic study based on estimates of exposure levels at the census-tract level.

3. Cohort studies tracking estimates of individual exposure levels and recording case incidence within the cohort. Variations considered include:

  • A prospective cohort study.
  • A retrospective cohort study.

4. Case-control studies comparing estimates of individual exposure levels between cancer cases and controls. Variations considered include:

  • A record-linkage-based case-control study with no direct contact with cases and controls or their proxies.
  • A de novo case-control study with direct contact with cases and controls or their proxies.
  • Building on existing studies and their associated data.

The discussions of these possible studies in the following sections are based primarily on the study characteristics summarized in Table 4.1. Section 4.2.1 of this chapter considers matters that affect most or all of these study designs; Section 4.2.2 describes each approach in some detail. These descriptions define the strengths and weaknesses of the recommended studies, summarized in Section 4.2.3. Section 4.3 provides a summary of data sources for population counts, health outcomes, and other information required for the execution of the studies considered and recommended.

TABLE 4.1. Summary of the Characteristics of the Studies Considered.



4.2.1. Issues Affecting Several Epidemiologic Study Designs

In any of the studies considered, population sizes, estimated doses, and resulting risk estimates may be too low to demonstrate statistically significant increased cancer risks near nuclear facilities. As noted in Chapter 3, the dose received from living near a nuclear plant is estimated to be less than 0.01 mSv/yr (USEPA, 2007). This dose is much lower than the combined dose from natural background radiation and medical diagnostic procedures, which is estimated to be 6.2 mSv/yr for the average2 person in the United States (NCRP, 2009). Consequently, the risk attributable to radiation from a nuclear facility, if any, would be a small increase above the baseline lifetime risk of cancer occurrence in the general population in the United States, which is considered to be 42 percent (NRC, 2005).
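The scale of the comparison can be made concrete with a short calculation using the two dose figures quoted above. The 70-year residence period is an assumption introduced here only to illustrate a cumulative dose; it does not come from the report.

```python
# Dose figures quoted in the text.
facility_dose_msv_per_yr = 0.01    # upper estimate near a nuclear plant (USEPA, 2007)
background_medical_msv_per_yr = 6.2  # average U.S. background plus medical (NCRP, 2009)

# Assumption for illustration: 70 years of residence near the facility.
assumed_years = 70

fraction = facility_dose_msv_per_yr / background_medical_msv_per_yr
cumulative_facility_msv = facility_dose_msv_per_yr * assumed_years
cumulative_background_msv = background_medical_msv_per_yr * assumed_years

print(f"Facility dose as a share of background plus medical: {fraction:.2%}")
print(f"Cumulative over {assumed_years} years: "
      f"{cumulative_facility_msv:.1f} mSv vs {cumulative_background_msv:.0f} mSv")
```

On these figures the facility contribution is well under 1 percent of the dose an average person receives from other sources.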

Statistical power calculations based on exposure estimates indicate that extremely large sample sizes are required except under the following scenarios:


A. Routine releases from the operating facilities have been far greater than those reported to the USNRC, or

B. Sensitivity to radiation as characterized in most or all generally accepted risk models is either inappropriately low or simply irrelevant to the populations living near nuclear facilities in the United States.

Regarding scenario B, underestimation of risks associated with radiation could perhaps result from inaccurate models for interpolation to low doses. Translation of risk estimates from World War II atomic bombing survivors to the population in the United States may also prove inaccurate, though there is reasonably good concordance of estimated risks for Japanese and Western populations (UNSCEAR, 2006, Annex A). Exceptions are a few cancer sites with disparate background rates, such as stomach and liver cancer. (These cancers are more common among the Japanese than among Western populations because of differences in risk factors such as diet and rate of infections.)
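Why small relative risks demand extremely large samples can be seen from a rough sample-size calculation. The sketch below is a simplification, not the committee's method: it assumes a two-group comparison of lifetime cancer risk with equal group sizes and uses the standard normal-approximation formula for two proportions. The baseline risk of 0.42 is the figure quoted in this chapter; the relative risks, significance level, and power are hypothetical inputs.

```python
from statistics import NormalDist


def n_per_group(baseline_risk, relative_risk, alpha=0.05, power=0.80):
    """Approximate persons needed per group to detect the given relative risk."""
    p0 = baseline_risk
    p1 = baseline_risk * relative_risk
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p0) ** 2


baseline = 0.42  # baseline lifetime cancer risk quoted in the text
for rr in (1.10, 1.01, 1.001):  # hypothetical relative risks
    print(f"RR = {rr}: about {n_per_group(baseline, rr):,.0f} persons per group")
```

Because the required size grows with the inverse square of the risk difference, each tenfold reduction in the excess risk raises the required sample roughly a hundredfold.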

Even if one or both of these scenarios are considered possible, the reliability of any proposed study still hinges on the technical issues of accurately characterizing doses received by the populations under study over the time of facility operations. Accurate estimation of those doses requires reasonably accurate measures of releases, modeling of exposure levels at various geographic locations, and biologic uptake and biokinetics for radionuclide exposures (see Chapters 2 and 3).

Questions Addressed by the Studies

Epidemiologic studies provide the most direct and relevant evidence for an association between a suspected risk factor and disease. Each of the study approaches considered in this chapter might produce useful new information regarding the association between living near a nuclear facility and potential cancer risks. However, they are unlikely to contribute substantial scientific knowledge regarding low-dose radiation effects because exposure levels are uncertain and probably low, which produces risk estimates with large relative uncertainties. Moreover, each of the possible study approaches is subject to limitations in the types of questions that may be answered. The committee has framed three questions of primary interest based on its statement of task (see Sidebar 1.1 in Chapter 1):


1. Are any detectable cancer-related health effects, such as mortality and morbidity from any type of cancer, associated with living near a nuclear facility at present or in the past?

2. If so, what are the characteristics of the affected persons (such as age, gender, race/ethnicity)?

3. What are the factors that could (and should) be examined to help detect and adjust for possible confounding (such as smoking and exposure to medical diagnostic procedures)?

These questions are closely related, and cannot be fully investigated as if they were independent of each other. The second and third questions are of little interest if there is no health effect to be studied. Furthermore, the difficulties in deriving an unambiguous answer are so great that it seems unlikely that the other questions, as important as they are, can ever be answered with precision by epidemiologic studies of populations living near nuclear facilities. As a result, the committee focused most of its effort on evaluating approaches to address aspects of this first question. If an association between living near a nuclear facility and cancer risk is observed, a balanced “weight-of-evidence” approach needs to be applied to determine whether the association is real, and whether that association can be explained by the radioactive releases from nuclear facilities.

A plausible cause-effect relationship between radioactive releases from nuclear facilities and cancer cannot be established solely by examining risks in populations living near nuclear facilities through any of the study designs considered. Direct epidemiologic investigation of the exposures in populations near nuclear facilities is limited by small numbers, the presence of unmeasured risk factors and potential confounders, and/or uncertainty in the exposure estimation. For these reasons, understanding the carcinogenic effects of low-level radiation exposure requires a diverse body of evidence in addition to any epidemiologic findings. Such evidence includes the effects of radiation on cell culture systems and animal models where all conditions including dose and dose rate are easily controlled and measured and therefore causal associations with disease outcome can be established. This is the focus of the Department of Energy’s Low Dose Radiation Program.3

Study Endpoints: Cancer Incidence and Mortality

Fundamental to the assessment of cancer risks are the concepts of mortality and incidence rates, that is, numbers of cancer deaths or new cancer occurrences observed or expected per year in a population of a specified size (often presented per 100,000 persons in a population or per 100,000 persons of each gender in a population).
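The rate convention described above amounts to a simple scaling. The sketch below computes a crude annual rate per 100,000 persons; the function name and the counts are hypothetical, and no age adjustment is applied.

```python
def rate_per_100k(events, population):
    """Crude annual rate: events in one year per 100,000 persons."""
    return events / population * 100_000


# Hypothetical one-year counts for illustration only.
new_cases = 45
deaths = 12
population = 250_000

print(f"Incidence: {rate_per_100k(new_cases, population):.1f} per 100,000")
print(f"Mortality: {rate_per_100k(deaths, population):.1f} per 100,000")
```

The same function applies to sex-specific rates if the events and the population are both restricted to one gender.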

Incidence is a measure of disease burden, as it describes the occurrence of new cancer cases. Mortality can index a more severe form of disease burden provided that survival is the same in the groups being compared, as mortality reflects both incidence and survival probability. However, for cancers that are not commonly fatal, such as thyroid cancer, the most useful end point of disease burden is incidence, since mortality in any given year reflects deaths among both newly diagnosed and previously diagnosed cases. A mortality study of thyroid cancer would have restricted statistical power to test for increases in risk at a given time and would be difficult to interpret, because most of the incident cases in a year would not be captured in the mortality statistics for that year, and many of the deaths in the mortality data for a given year would be among patients diagnosed many years earlier.

In contrast, for highly fatal cancers such as lung and pancreatic cancers, mortality data would reflect cancer incidence quite accurately. For diseases that have a greater susceptibility to surveillance bias such as prostate cancer, mortality data may be useful because they are minimally affected by that bias.

In an ideal study, one would identify each newly diagnosed case of some cancer type in the population under study at or near the time it was diagnosed. This may be possible in states where cancer registries have been in place for the study period of interest and the data are complete and of good quality (see Section 4.3.2). However, many cancer registries were established after nuclear facilities began operations. The time-limited availability of some registry data would affect mortality studies that use aggregated data at small geographic units such as a census tract; however, it would not affect mortality studies that use aggregated data by county. County-level mortality data have been commonly used in the United States because of the ease of identifying cases nationwide over a long time period through the National Center for Health Statistics (NCHS) (see Section 4.3.3).

Misdiagnosis of cancer is currently less of a concern than it used to be for both incidence and mortality studies; however, misclassification4 of cancer types may occur. Moreover, incidence studies may lead to biased results when there are changes over time in the likelihood that a cancer was diagnosed, that it was diagnosed but not reported, or that the diagnostic criteria changed. The likelihood that a life-threatening cancer will not be diagnosed is small, but the prevalence of asymptomatic, undiagnosed cancers, especially in older persons, can be large. Changes in the intensity with which people are screened and cancers are reported and registered (for example, prostate cancer) can produce an appreciable artifactual trend in recorded incidence. Also, the reported site of a cancer may be incorrect, especially in earlier years. An example is the earlier misdiagnosis of metastatic cancers as primary in the brain, whereas newer imaging technologies continue to improve the classification of cancer to the correct primary site.

The detection of small, more indolent cancers and the appreciable variation within and between populations in the use of diagnostic tools can affect incidence data but may have little effect on mortality data. Variations in degree of cancer surveillance can be a concern for some cancers; uneven degrees of surveillance in populations in various geographic locales can artificially simulate or mask exposure-response relationships. The primary site of a cancer is more likely to be recorded accurately by a cancer registry than a death certificate (German et al., 2011). Also, trends in registration rates should not be biased by improvements of cancer therapy on patient survival. This problem is avoided by using data on deaths from registries with active follow-up of patients such as that implemented by the Surveillance, Epidemiology, and End Results (SEER) registries (see Section 4.3.2), although such studies would be limited to the states or regions covered by these registries and would not cover all areas near nuclear facilities.

For the reasons mentioned above, incidence and mortality studies provide complementary data, and both could provide potentially useful information. When the quality of the incidence and mortality data is high, the mortality-to-incidence ratio is related to case survival; when the quality of one or the other is not adequate, the ratio will deviate from the survival ratio. The value of either incidence or mortality registries increases when data from different times and locations can be compared because they are compiled according to agreed national or international standards. All cancer registries in the United States use classification schemes that are largely compatible with each other and with the classification for causes of death on death certificates.

Both risk of developing cancer and risk of dying of cancer are substantial public concerns. In an analysis of cancer risks near nuclear facilities, incidence and/or mortality data are linked with residence at the time of cancer diagnosis or death from cancer, which is retrieved from medical records or death certificates, respectively. Because cancers manifest themselves years or decades after the exposure (see discussion on latency period in a later paragraph of this section), incidence data are somewhat preferable to mortality data for such inferences: residence at time of diagnosis is a better indicator of where the person may have lived at time of exposure than residence at time of death. Persons who lived in a particular area at time of death may not have been long-term residents of that area, so their residence at death may not reflect the address at which the relevant exposure occurred, possibly many years earlier.

Selection of Cancers to Study

Radiation can cause cancer in almost any tissue in the body but some sites are more susceptible to radiogenic effects than others (UNSCEAR, 2006, Annex A). In general, it has been found that cell radiosensitivity is roughly proportional to the rate of cell division, so cells that actively divide are more radiosensitive (although there are exceptions to this).

Radiation-induced cancers, similar to cancers induced from other risk factors, manifest themselves years or decades after the exposure. The lag time between exposure to a disease-causing agent such as ionizing radiation and the clinical recognition of the disease is known as the latency period. The mean latency period per cancer type due to radiation has not been comprehensively summarized, partly because it varies by age at exposure to radiation (Preston et al., 2002; Ron et al., 1995), type of cancer, and especially duration of follow-up of the cohort. However, studies of the atomic bomb survivors in Japan have demonstrated that for most major cancers the latencies of individual cancer cases begin at some minimum period and extend for the rest of the lifetime. Epidemiologic studies that aim to link exposure to radiation and cancer often use a 2-year minimum latency period for leukemia and a 10-year minimum latency period for solid5 cancers (Boice et al., 2011). For this reason, past exposures are more relevant than current exposures as potential causes of cancer.
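One common way to apply a minimum latency period in an analysis is to lag the exposure, counting only the dose received at least that many years before diagnosis. The sketch below illustrates this with the 2-year and 10-year minimum latencies mentioned above. The function name, the exposure history, and the diagnosis year are hypothetical.

```python
def lagged_cumulative_dose(annual_dose_by_year, diagnosis_year, min_latency_years):
    """Sum the doses received at least `min_latency_years` before diagnosis."""
    cutoff_year = diagnosis_year - min_latency_years
    return sum(
        dose for year, dose in annual_dose_by_year.items() if year <= cutoff_year
    )


# Hypothetical exposure history: 0.01 mSv per year from 1990 through 2009.
history = {year: 0.01 for year in range(1990, 2010)}
diagnosis_year = 2010

leukemia_dose = lagged_cumulative_dose(history, diagnosis_year, min_latency_years=2)
solid_dose = lagged_cumulative_dose(history, diagnosis_year, min_latency_years=10)
print(f"Dose counted for leukemia (2-year lag):      {leukemia_dose:.2f} mSv")
print(f"Dose counted for solid cancer (10-year lag): {solid_dose:.2f} mSv")
```

With the longer lag, recent years of exposure are excluded, which is why past exposures carry more weight than current ones in such analyses.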

Given that different segments of the public have concerns about a variety of cancers, study of a wide range of cancers may be necessary, but particular attention needs to be given to the most radiosensitive cancer sites, including leukemia, female breast, bladder, thyroid, brain, and ovary.6 Childhood leukemia is a “sentinel” cancer for radiation exposure and may merit separate, more detailed study with individual exposure information, as will be discussed in Section 4.2.2. Examining cancers that are presumably nonradiogenic in origin such as prostate cancer could serve as useful negative controls.

Much of what we know about tissue radiosensitivity comes from studies of the Japanese atomic bombing survivors, who generally received radiation exposure to the whole body. In that population, statistically significant excess risks have been shown for leukemia, non-Hodgkin lymphoma (males only), total solid cancer, and cancers of the oral cavity, esophagus, stomach, colon, liver, lung, skin (nonmelanoma), female breast, ovary, bladder, brain, and thyroid. These results are broadly confirmed by other studies (UNSCEAR, 2006, Annex A). For most other sites data suggest possible positive associations; however, a larger number of cases is needed to reach firm conclusions. The highest relative risks (RR; shown as the estimated RR at a 1 Sv dose at age 70 after exposure at age 30) in the atomic bombing survivors study were: leukemia (RR = 5.3), urinary bladder (RR = 2.2), female breast (RR = 1.87), lung (RR = 1.81), brain and central nervous system (RR = 1.62), ovary (RR = 1.61), thyroid (RR = 1.57), and colon (RR = 1.54) (Preston et al., 2007). For comparison, the risk estimate for total solid cancers was RR = 1.47 (90% confidence interval [CI]: 1.40, 1.54).
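To see what these relative risks would imply at much lower doses, one can apply a linear excess relative risk model, RR(d) = 1 + ERR x d. The sketch below does this using the RR values at 1 Sv quoted above. Linearity at low doses is an assumption made here for illustration (the leukemia dose response in particular is not simply linear), and the 0.001 Sv dose is a hypothetical input.

```python
def relative_risk(dose_sv, err_per_sv):
    """Linear excess relative risk model: RR = 1 + ERR per Sv x dose."""
    return 1.0 + err_per_sv * dose_sv


# Relative risks at 1 Sv quoted in the text (Preston et al., 2007).
rr_at_1_sv = {
    "leukemia": 5.3,
    "urinary bladder": 2.2,
    "female breast": 1.87,
    "lung": 1.81,
    "total solid cancers": 1.47,
}

low_dose_sv = 0.001  # hypothetical dose of 1 mSv
for site, rr in rr_at_1_sv.items():
    err_per_sv = rr - 1.0  # excess relative risk per Sv under linearity
    print(f"{site}: RR at {low_dose_sv} Sv = "
          f"{relative_risk(low_dose_sv, err_per_sv):.5f}")
```

These figures also depend on age at exposure and attained age, so they should be read as orders of magnitude rather than predictions.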

Two sites were notable for the fact that relative risk after exposure in childhood was much larger than that associated with exposure at age 30, namely, thyroid cancer (RR = 2.21 for exposure at age 10 versus RR = 1.57 for exposure at age 30) and nonmelanoma skin cancer at high doses (greater than 1 Gy) (RR = 3.28 versus RR = 1.17) (Preston et al., 2007). Leukemia also showed a higher risk for those exposed in childhood, although the exact excess risk is difficult to estimate because of the complex temporal patterns of risk (Richardson et al., 2009) demonstrated in Figure 4.1. More specifically, excess risk for leukemia varies from more than 50-fold at 5-10 years after exposure to only roughly twofold by 30 years after exposure; therefore, an average estimate would not correspond to the estimate in any particular time period.

FIGURE 4.1. Predicted excess relative risk (ERR) (see Appendix A, Sidebar A.1 for definition) at 1 Gy for leukemia (all types) as a function of age at exposure and time since exposure. SOURCE: Richardson et al. (2009).

An epidemiologic investigation of cancer risks due to radiation exposure is complicated by the lack of diagnostic tests, clinical or molecular, that can determine the cause of cancer in an individual. For this reason, it is important to collect, where possible, information on other risk factors linked with the cancer type in question so that investigators can exclude other possible reasons for the disease to have occurred. For some cancers, established risk factors can explain the majority of the observed cases. This is true for lung cancer as smoking causes 90 percent of the lung cancer cases. Given the strong smoking effect, analyzing lung cancer data in relation to low-dose radiation exposure would be fraught with potential problems that would be difficult or impossible to address without accurate historical smoking data for individuals in the study population. For other cancers, however, such as those of childhood, established risk factors that include specific genetic syndromes, prenatal exposure to ionizing radiation, infections, and demographic characteristics such as race/ethnicity, gender, and high birth weight collectively can explain only a small fraction of cases.

Defining Exposure: Lessons Learned from Past Radiation Epidemiologic Studies

With the possible exception of purely spatial or purely temporal cluster studies, all environmental epidemiologic studies require some assessment of “exposure” to individuals or groups. This exposure is hypothetical and is used in a general sense (rather than specifically defined by radiation quantity) and could include simply categorizing study subjects into levels based on exposure surrogates as defined below. For studies of cancer in populations near nuclear facilities, there are many different options for exposure classification, ranging from simple proximity of residence at time of diagnosis to the facility to modeled dispersion of reported releases, but “exposure” in such studies has never included detailed personal measurement of radiation for every individual (as it does in occupational radiation monitoring). For details on the studies discussed here, see Appendix A.

Table 4.2 lists several definitions of exposure in the literature of radiation epidemiology on health risks of populations living near nuclear facilities. Using examples, the definitions are ranked from a less-defined to a better-defined characterization of exposure. The particular type of exposure used in the design and associated analysis defines the question(s) under study and provides an essential context for interpreting the results of any epidemiologic study. It is obvious that a study with well-defined, accurate exposure data can contribute the most to our understanding of the cancer-associated effects of radiation in the setting examined.

TABLE 4.2. Definition of Exposure in Selected Epidemiologic Studies.

The national study conducted by the National Cancer Institute (NCI) and published in 1990 (Jablon et al., 1990; 1991) defined exposure as living in a county in which nuclear facilities are located. This definition is loose because, as the investigators pointed out, many counties, especially in the West, are large and some are more than 80 km (50 miles) in diameter. For example, the San Onofre plant in San Diego County is located about 60 km (40 miles) from San Diego center. If there were indeed a risk associated with living near the San Onofre plant but the risk were limited to persons living in close proximity to the plant (say, within 5 km), the effect would be impossible to detect in a county-based study. This is because the normal cancer rates in the large distant population in San Diego city would dominate the summary statistics for the county and dilute any local effect that might be there (Jablon et al., 1990).
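The dilution effect described above can be illustrated with a small arithmetic sketch. It assumes, purely for illustration, that a fixed fraction of the county population lives close to the plant and carries an elevated relative risk, that everyone else is at baseline risk, and that the two groups are otherwise comparable. The function name and the example numbers are hypothetical and are not taken from Jablon et al. (1990).

```python
def county_wide_rr(local_rr, fraction_near_plant):
    """Relative risk seen for a whole county when only a fraction of its
    residents (those near the plant) carry an elevated relative risk.

    Assumes the remaining residents are at baseline risk (RR = 1) and that
    both groups share the same baseline rate (an illustrative simplification).
    """
    if not 0.0 <= fraction_near_plant <= 1.0:
        raise ValueError("fraction_near_plant must be between 0 and 1")
    return fraction_near_plant * local_rr + (1.0 - fraction_near_plant) * 1.0


# Hypothetical example: 1 percent of county residents live near the plant
# and their risk is doubled; the county-wide figure barely moves from 1.
print(county_wide_rr(local_rr=2.0, fraction_near_plant=0.01))
```

Under these assumptions, even a doubling of risk near the plant shows up as roughly a 1 percent increase at the county level, which is far too small to distinguish from normal variation in cancer rates.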

An improvement on the 1990 NCI approach is that used in a study in France, in which zones of 20-km radius centered on the nuclear facilities, further subdivided into 0-5, 5-10, 10-15, and 15-20 km zones, were used for analysis of cancer incidence in populations residing near the facilities (White-Koning et al., 2004). The German Kinderkrebs in der Umgebung von Kernkraftwerken (KiKK) study used distance of the family’s place of residence from the chimney of the nearest nuclear power plant to define exposure. The distance measurements were established with a precision of about 25 m, although the investigators primarily used and highlighted a distance of ≤5 km for analysis (Kaatsch et al., 2008). An isotropic distribution of discharges was assumed (i.e., circular rings of equal exposure around the plant); a more accurate method would model releases according to local topography, wind direction, and precipitation.

More graduated rank-order measures of closeness were employed in a British study, using the distance of centroids of census wards from nuclear power plants to define several different types of distance scores as continuous exposure variables. No associations were observed to suggest increasing risk in relation to closer proximity to the plants (Bithell et al., 2008). A recent study in Switzerland (Spycher et al., 2011) also used distance of the family’s place of residence (current or at birth of the index child) to the nearest nuclear power plant as a measure of exposure. Although no doses were actually estimated, an analysis was performed accounting for main dispersal directions of airborne emissions from the nuclear power plants. For this analysis, investigators redefined the exposure as living in a zone around a nuclear power plant that is equivalent in area to a circle with 5-km radius but extends to a distance proportional to the average duration of slow winds (<3 m/s) in a given direction (Spycher et al., 2011). Downwind concentration of radioactive particles has been found to be inversely correlated to wind speed.

Evrard et al. (2006) conducted a study using geographic zoning based on doses to the bone marrow estimated due to gaseous radioactive discharges using radionuclide discharge data, local climate data, and a mathematical model of nuclide transfers in the environment. The model was developed by the National Institute of Radiological Protection and Nuclear Safety in France (Morin and Backe, 2002). This ecologic study examined communes (small administrative divisions) located within a 40-km circle around the nuclear facilities in France. The communes were divided into five categories based on the estimated dose. The investigators noted that the categories defined by dose assessments differed from those defined by concentric circles around the facilities due to topographic and meteorological characteristics. Although the estimated doses and distances were significantly and inversely correlated (Spearman’s rank correlation coefficient r = -0.58, p = 10⁻⁴), marked variability in the estimated dose within each concentric band remained. The contrast in the mean dose between the lowest and highest dose-based categories (range: 2.11 mSv/yr; ratio: 106) was much larger than the maximum contrast between the concentric bands 0-5 and 15-20 km (range: 1.16 mSv/yr; ratio: 30) (Evrard et al., 2006). This suggests that dose precision and probably statistical power are lost by using only crude distance-based surrogates for exposure levels.

The same model for estimating bone marrow doses associated with gaseous discharges from nuclear power plants was used in a more recent investigation, which further considered the risks around nuclear power plants in France and included a case-control analysis with an ecologic element (Sermage-Faure et al., 2012): cases and controls were assigned a single exposure value estimated at the town hall of the commune of residence.

A study in Spain performed historical reconstruction of the exposure of the population in municipalities within a 30-km zone from the nuclear facilities or 50-100 km from the facilities as a result of the discharges of liquid and gaseous effluents from the facilities (Nuclear Safety Council and the Carlos III Institute of Health, 2009). Estimated effective doses to the populations of the municipalities were reported. The investigators state that upon consultation with the International Commission on Radiological Protection, use of effective dose as an indicator of exposure (created for protective purposes and not for estimation of risk) instead of absorbed doses in individual organs and tissues was deemed acceptable for the epidemiologic study, provided that the uncertainties and limitations involved were clearly stated.

As demonstrated above, studies of cancer risks near nuclear facilities use differing estimates of exposure and commonly suffer from several weaknesses by not accounting for:


  • Prevailing wind directions and speeds or terrain factors, which may appreciably alter exposures to gaseous effluents.
  • Directionality and distance of exposures resulting from liquid effluents, the pathways for which may be narrowly focused geographically.
  • Differences in historic release levels of nuclear facilities, when the pure proximity approach is used and multiple sites are examined.
  • Temporal cumulative exposures or increases in nuclear facility–associated disease risks as the cumulative exposure increases.
  • Temporal and spatial variations in natural background radiation in the vicinity of each site as well as from site to site.

In principle, the pure proximity approaches of any study can be improved by incorporating dosimetry information into the risk analyses. Comparison of the study findings regarding the risks in a population using a pure proximity approach to those from an analysis that incorporates reconstruction of the doses received by the same population can prove informative. An example is the recent study in France that showed that children living within 5 km of nuclear plants are twice as likely to develop leukemia compared to those living farther away from the plants. However, analysis of the same population of children using a dose-based geographic zoning approach, instead of distance, did not support the findings. The absence of an association with the dose-based geographic zoning approach may indicate that the observed association of distance and cancer risk may be due to factors other than the releases from the nuclear power plants (Sermage-Faure et al., 2012).

Dosimetry Models for a Geographic Unit or Individuals

Dosimetry models for a geographic unit apply to ecologic studies, where an average exposure is assigned to a population residing in an area (for example, census tract) and every individual in that area is assumed to have experienced this exposure; typically, the smaller the geographic unit, the less heterogeneity in exposure per individual and the more precise the estimated exposure of the populations within that unit. Dosimetry information that takes into account the magnitude and temporal variations of annual releases, and the factors that give those releases directionality and distance variations, provides more accurate estimates of exposure. Operationally, for each geographic unit, an areal centroid can be calculated using Geographic Information Systems (GIS), and the estimated annual organ doses to representative individuals at that centroid point can be calculated. Either the population-weighted centroid or the geographic centroid can be used, depending on whether or not investigators want to adjust for a heterogeneous distribution of people within a given census area. One could use those imputed values in dose-response analyses of health outcomes, including appropriate summations of cumulative radiation dose specific to time, lag times, and age truncation.
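The difference between a population-weighted centroid and an unweighted one can be sketched as follows. This is a minimal illustration, not a GIS workflow: it assumes planar coordinates in kilometers, treats each sub-unit (for example, a census block) as a single point, and uses hypothetical function names and made-up coordinates and populations.

```python
from math import hypot


def population_weighted_centroid(points, populations):
    """Centroid of sub-unit points weighted by the number of residents.

    points: list of (x_km, y_km) tuples, one per sub-unit (hypothetical data).
    populations: resident counts for the same sub-units, in the same order.
    """
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    x = sum(pop * pt[0] for pt, pop in zip(points, populations)) / total
    y = sum(pop * pt[1] for pt, pop in zip(points, populations)) / total
    return (x, y)


def unweighted_centroid(points):
    """Simple mean of the sub-unit points (a stand-in for a geographic centroid)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def distance_km(a, b):
    """Straight-line distance between two planar points."""
    return hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical tract with two blocks; most residents live in the block
# nearer the facility, which is placed at the origin.
facility = (0.0, 0.0)
blocks = [(4.0, 0.0), (12.0, 0.0)]
residents = [3000, 1000]

print(distance_km(facility, unweighted_centroid(blocks)))
print(distance_km(facility, population_weighted_centroid(blocks, residents)))
```

In this made-up tract the population-weighted centroid lies closer to the facility than the unweighted one, so the dose imputed to the tract would differ depending on which centroid is chosen.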

The same methodology could be used to estimate the doses received by the individuals in a record-linkage-based case-control or cohort study. This implies that each individual is assigned the calculated dose for the census tract within which he or she resides. This leads to loss of statistical power compared to a study in which individual doses are assigned since variability in true dose is underestimated.

It is preferred, when possible, to calculate individual doses based on residential address at the time when exposure is likely to be most relevant, such as residence at time of birth for the cases and controls. Calculating individual doses based on the address where the person lived at time of cancer diagnosis may also be relevant to where the person may have lived at time of exposure and likely more relevant than calculating doses based on residence at time of death. An analysis based on residence at time of death is the most likely to be affected by migration bias.

Individual dose reconstruction for members of a large case-control or cohort study could be time consuming, especially when the investigator wants to incorporate information on residential history of each individual if this is available through interviews or questionnaires. Information on the approaches for modeling dosimetry data in geographic units is described in detail in Chapter 3.

Statistical Power

Statistical power is the probability that a study of a specified size and design can detect a predetermined difference in risk in the absence of significant bias, when such a difference actually exists. While the computations can be complex, the concept is simple; higher power to detect effects is better, and if power is too low, a study is unlikely to find a difference of interest even when it actually exists, meaning the study can be shown to be uninformative before it starts and perhaps is not worth undertaking. Thus, a fundamental issue regarding the estimation of risks from low-dose studies is statistical in nature.

The sample size required to detect a significant association between dose and an effect is a function of the inverse variance of the dose distribution. In general, as the variance of the distribution of doses increases, the required sample size to detect a particular effect decreases proportionately. This implies that the required sample size (for the exposed group) varies approximately as the inverse of the square of the expected effect size (i.e., N = k / (Effect size)², where k is some constant).

To illustrate this, consider the simple case where there is an exposed group, all with approximately the same degree of exposure, and a very large unexposed group for comparison, and one wished to determine whether there was a difference between the groups in the rate of colon cancer. In this case, variation in the sample size requirements in proportion to the inverse variance of the dose distribution implies that the needed sample size to achieve adequate statistical power (80 percent power is usually taken as adequate statistical power) to see a difference between the two groups varies approximately as the inverse square of the mean dose in the exposed group if the dose-response association is linear. For a hypothetical example, suppose the association between radiation dose and colon cancer risk is linear, and observation of 500 exposed persons for a given period of time compared to a very large unexposed group is needed to have adequate statistical power to detect a radiation-associated colon cancer risk when the mean dose is 0.5 Sv. In the analogue of that scenario, 100 times as many (i.e., 50,000) exposed persons would be required to detect a risk if the mean dose were instead one-tenth as large (i.e., 0.05 Sv), and 5,000,000 exposed persons would be needed if the mean dose were 0.005 Sv. This is graphically illustrated in Figure 4.2, where dose (mGy) versus the required sample size is plotted (Brenner et al., 2003). For doses equivalent to those received by individuals that live near a nuclear power plant in the United States which are estimated to be <0.01 mSv/yr (USEPA, 2007) the numbers of exposed persons required to find a possible association would be truly enormous.

FIGURE 4.2. Size of a cohort exposed to different radiation doses, which would be required to detect a statistically significant increase in cancer mortality in that cohort, assuming lifetime follow-up. SOURCE: Brenner et al. (2003).

Having a range of doses tends to increase the dose variance, so a dose-response analysis would probably have somewhat better statistical power than the simple two-group comparison; but given the typically high correlation between the dose variance and the mean dose in the exposed group, the “inverse square of mean dose” relationship is still a rough rule of thumb that is easier to ascertain and conceptualize than the size of the dose variance.
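The “inverse square of mean dose” rule of thumb can be written out as a short sketch. It assumes a linear dose-response and uses the hypothetical reference point from the colon cancer example above (500 exposed persons at a mean dose of 0.5 Sv); the function name is illustrative only.

```python
def required_exposed(mean_dose_sv, ref_dose_sv=0.5, ref_n=500):
    """Approximate number of exposed persons needed for adequate power.

    Scales a reference sample size by the inverse square of the mean dose,
    i.e. N = ref_n * (ref_dose / mean_dose)^2. The defaults are the
    hypothetical reference values used in the text's example.
    """
    if mean_dose_sv <= 0:
        raise ValueError("mean dose must be positive")
    return ref_n * (ref_dose_sv / mean_dose_sv) ** 2


for dose in (0.5, 0.05, 0.005):
    print(f"mean dose {dose} Sv -> about {required_exposed(dose):,.0f} exposed persons")
```

Each tenfold reduction in mean dose multiplies the required number of exposed persons by 100, which is why sample sizes become impractical at the very low doses expected near nuclear facilities.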

Instead of statistical power to detect an effect, an investigator may want to set bounds on the magnitude of risk. In that case, two different purposes need to be distinguished:


  • If the interest is to establish narrow bounds (i.e., narrow confidence intervals) on the magnitude of risk per unit dose, then a principle similar to that for mean dose and statistical power would apply, namely, a much larger sample size would be required to achieve a given tightness of the bounds on risk per unit dose when the doses are smaller.
  • If the interest instead is to “rule out” a certain magnitude of risk (for example, a 20 percent increase in risk in the exposed group) without reference to their estimated dose levels, then sample size calculations associated with finding a detectable risk per unit dose do not apply. Instead, the calculations involve an estimation of likely confidence bounds given the sample size and anticipated number of cases of the disease (Satten and Kupper, 1990). The latter is usually determined using available disease rates.

This second purpose, that is, to “rule out” a certain magnitude of risk, is the basis of the committee’s power calculations. The committee’s aim was to establish the minimum sample size required so that the investigation is reasonably likely to detect an effect of a given magnitude. A 20 percent increase in risk was used as a rough figure that would raise the level of concern in statistical terms (but alternative scenarios of higher risks are also considered). Similarly, power calculations can be used to calculate the minimum magnitude of the change of risk that can be detected given a particular sample size.

To reiterate, calculations of required sample sizes based on current knowledge of the average population exposure of the people in the United States to radiation from the nuclear industry would lead to a small anticipated increase in risk that would require an enormous population size to detect with statistical precision. Even for leukemia, which is considered the most radiosensitive cancer, the expected increase in risk is small. The committee discussed that in the atomic bomb study the relative risk for leukemia was 5.3/Sv dose at age 70 after exposure at age 30. This means that the excess relative risk for leukemia is 4.3/Sv, which, scaled linearly, equates to 0.43/100 mSv, 0.043/10 mSv, or 0.0043 for 1 mSv. Therefore, the estimate of excess risk that one would be trying to detect in relation to exposures from nuclear facilities (on the order of 0.01 mSv) would be on the order of 0.000043 or smaller. Such a risk would be virtually impossible to detect for any cancer given the statistical and other variability on the baseline risk. As a result, precise computations of statistical power based on risks due to the expected doses would have little meaning; therefore, computations of statistical power are focused on the population sizes required to “rule out” larger risks. Arguably, the power calculations presented here are based on risks tied to exposures that are on the order of 0.5-1.0 Sv, which are much higher than those expected from the releases of nuclear facilities.

On the basis of demographic parameters specified by the committee (U.S. population in 2010 of approximately 300 million, about 15 percent live within 50 km [approximately 30 miles] and 0.3 percent live within 8 km [approximately 5 miles] of a nuclear facility, about 20 percent are children under 15 years of age), the committee calculated the power of several possible scenarios that apply to different study designs using distance from a site as a surrogate exposure measure. The choices of 8- and 50-km comparison zones are used solely to provide a frame of reference for the sample sizes required for adequate performance of an epidemiologic study. These reference scenarios are in general agreement with some published studies (see Table A.2), although often the “at-risk zone” in many of these studies is designed to be slightly closer to the facility (for example, 5 km). As described later in this section a gradient type of analysis rather than an analysis based on two categories is preferred.

The scenarios explored are the following: a case-control study with equal number of cases and matched controls (1:1 matching plan), a case-control study with 5 controls per case (1:5), and a case-control study with 100 controls per case (1:100). The latter could approximate the matching ratio of cases and controls of a large cohort study or an ecologic study; as is generally true for rare diseases, far more controls are available than cases in these two study designs.

For purposes of this discussion, risk estimations for the different scenarios are presented as relative risks (RR). The odds ratio (OR) calculated for case-control studies (see Sidebar A.1 in Appendix A) approximates the RR from a cohort study when rare diseases are examined. Reporting power calculations based on RR provides a more conservative assessment of power.

In these comparisons, the committee made several simplifying assumptions about the relationship between exposure and distance. The committee assumes that:


  • Distance to the nearest facility is classified into just two categories, for example, living within the 8-km zone (nearest category/exposed) versus living within the 8-50-km zone (farthest and larger category/unexposed) from the nuclear facility.
  • Two and one half percent of the population under study is in the exposed category and 97.5 percent in the unexposed category.
  • Risk in the exposed category is equal to RR × (baseline risk), where RR is relative risk due to being close to the nuclear facility and baseline risk is the risk in the unexposed category.
  • National rates provide the rates of cancer for the unexposed population in the regions under study.
  • Distribution of risk factors other than the exposure of interest is nondifferential between the two categories.

These assumptions need to be refined if a study is in fact undertaken.

Figure 4.3 plots detectable RR as a function of total number, n, of cases for each of the three matching scenarios (1:1, 1:5, 1:100). Detectable RR is defined to be the ratio of risk in the exposed category compared to the unexposed category, for which a study with a given number of cases, n, will have 80 percent power (usually taken as adequate statistical power) to detect the increase at the 5 percent level of significance (one-sided test; see Sidebar A.1 in Appendix A for definition).

FIGURE 4.3. Detectable relative risk for a case-control study with 2.5 percent of subjects exposed.

The detection of RRs that are equal to 1.2 (a 20 percent increase in risk in the 2.5 percent of the study population nearest a facility) with acceptable power (80 percent power) requires that 7,000 to 14,000 cases be recruited (depending on the matching scenario). A 40 percent risk increase can be detected with about 3,800 cases for a 1:1 case-control study and about 1,800 cases for a case-control, cohort, or ecologic study design approximating 1:100 matching. Doubling of risk (RR = 2) can be detected with approximately 765 cases for a 1:1 matched case-control study and with about 345 cases for a case-control, cohort, or ecologic study design approximating 1:100 matching (see Table 4.3 for summary).
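Case counts of this order can be approximated with a standard normal-approximation formula for comparing exposure proportions between cases and unmatched controls. The sketch below is an assumption about how such numbers might be obtained, not a statement of the committee’s actual method; it ignores matching, treats the odds ratio as equal to the RR, and uses illustrative function and parameter names.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(rr, p_exposed=0.025, controls_per_case=1,
                 alpha=0.05, power=0.80):
    """Approximate number of cases for an unmatched case-control comparison.

    rr: relative risk to detect (used here as the odds ratio).
    p_exposed: proportion of controls in the exposed category.
    controls_per_case: control-to-case ratio (1, 5, 100, ...).
    alpha: one-sided significance level; power: desired power.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    m = controls_per_case
    p0 = p_exposed
    # Exposure proportion among cases implied by the odds ratio.
    p1 = rr * p0 / (1 + p0 * (rr - 1))
    p_bar = (p1 + m * p0) / (1 + m)
    null_term = z_alpha * sqrt((1 + 1 / m) * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0) / m)
    return ceil((null_term + alt_term) ** 2 / (p1 - p0) ** 2)


for rr in (1.2, 1.4, 2.0):
    for ratio in (1, 5, 100):
        print(f"RR={rr}, 1:{ratio} controls -> {cases_needed(rr, controls_per_case=ratio):,} cases")
```

Under these assumptions the sketch gives case counts of the same order as those quoted above, which suggests the quoted figures are consistent with a conventional two-proportion power calculation; a real design would need to account for matching and confounders.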

TABLE 4.3. Approximate Number of Total Cases and Years of Study Follow-Up.

For rare cancers such as childhood leukemia where the observed number of exposed cases will be relatively small, multiple controls (for example, 5 per case) would help to increase the power of the study. However, the improvements diminish rapidly as the number of controls per case increases, so that using 100 rather than 5 controls per case does not substantially increase the power to detect an increase in risk (see Figure 4.3).

Another consideration for the design of the study is the number of years of study needed to accrue enough exposed cases so that the study achieves 80 percent power to detect a 20 percent increase in risk of childhood leukemia among the “exposed.” From Figure 4.3, a 1:1 matched case-control study would require about 14,000 cases within the overall study zone in order to have power to detect a 20 percent increase in risk. There are approximately 3,000 childhood acute lymphoblastic leukemia cases diagnosed per year in the entire United States (http://www.cancer.gov/cancertopics/pdq/treatment/childALL/HealthProfessional), 15 percent of which (450) would be in the study zone (50 km from a nuclear facility). Therefore, it would require 31 years of accrual before a study would reach acceptable power. Increasing the number of controls from 1:1 to 1:100 (as in a cohort or an ecologic study) would reduce the required accrual time to roughly 18 years. Of course, more extreme risks are detectable with much less study accrual time. For example, a doubling of risk could be detected with 350-765 cases, or less than 1 year to about 1.7 years of accrual, for the 1:100 to 1:1 matched studies. A 40 percent increase in risk could be detected with 4 to 8 years of accrual for the 1:100 to 1:1 matched studies (see Table 4.3 for summary).
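The accrual arithmetic above can be reproduced in a few lines. The sketch uses only figures stated in the text (about 3,000 childhood acute lymphoblastic leukemia cases per year nationally, 15 percent of them within 50 km of a facility); the function and constant names are illustrative.

```python
ANNUAL_US_CASES = 3000     # approximate childhood ALL cases per year (from the text)
FRACTION_IN_ZONE = 0.15    # share of the population within 50 km of a facility


def years_of_accrual(cases_required,
                     annual_cases=ANNUAL_US_CASES,
                     fraction_in_zone=FRACTION_IN_ZONE):
    """Years needed to accrue the required number of cases in the study zone."""
    cases_per_year_in_zone = annual_cases * fraction_in_zone
    return cases_required / cases_per_year_in_zone


# Case counts quoted in the text for different detectable risks.
scenarios = {
    "RR 1.2, 1:1 matching": 14000,
    "RR 1.4, 1:1 matching": 3800,
    "RR 1.4, 1:100 matching": 1800,
    "RR 2.0, 1:1 matching": 765,
    "RR 2.0, 1:100 matching": 350,
}
for label, cases in scenarios.items():
    print(f"{label}: {years_of_accrual(cases):.1f} years")
```

The same function can be applied to other cancers by substituting the relevant annual case count, as in the breast cancer example that follows.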

For most adult cancers the period of accrual required to detect relative-risk increases of these magnitudes is much shorter because of the higher prevalence of disease and the larger population numbers. For example, for breast cancer in women under 50 years of age the national rates are approximately 43/100,000 person-years or about 40,000 women diagnosed per year. Since approximately 15 percent of these women (6,000) are expected to live within 50 km of nuclear facilities this means that it would take around 1-2 years of follow-up to detect an excess risk of 20 percent for this cancer, under the same assumptions as above.

The total number of cases and years of follow-up required for the different matching scenarios to detect a range of increases in risk following the assumptions stated above are summarized in Table 4.3.

The sample size computations provided here are the bare minimum of data to test the hypotheses at the specified level; thus, a sample size estimate is generally a lower bound on what will be needed, and actual requirements could be much larger. This is because the power calculations presented here are based on simplified models that ignore the effect of other risk factors that are largely unknown at the design stage. Internal pilot data are often used to better inform the power calculations and more reliably estimate the required sample size. Pilot data can account for the patterns of risk factors and potential confounders (if information is available) and the nature of confounding—whether it is positively or negatively associated with the exposure. Power calculations that have not accounted for the effects of risk factors may under- or overestimate the required sample size.

Modest improvements in the statistical power can be achieved by examining dose-response gradients, especially when the population under study is exposed to a range of doses (Shore et al., 1992). However, since the mean doses received by the populations near nuclear facilities are expected to be low and the associated risks, if any, are expected to be small, very large numbers of cases and controls would still be required in order for the study to be informative and useful. If the study intends to examine dose-effect relationships, improving the quality of the dosimetry can also afford gains in statistical power. Imprecise estimation of doses can be a source of error that increases the uncertainty in the estimated association, which tends to flatten the dose response and decrease the likelihood of finding a statistically significant association.

One way to improve statistical power is to increase the effective sample size. As the time since onset of exposures increases, the accumulated follow-up of the exposed populations increases and the exposed population becomes older. Both of those serve to increase the statistical power to observe potentially elevated risks, the latter because much of a population’s cancer risk is expressed at older ages as the disease rates increase. An additional method to increase sample size is to pool data across numerous studies or study sites. Bias, on the other hand, is not reduced by simply increasing sample size in the absence of other improvements; if larger samples mean that less attention can be given to quality of the individual observations, bias may even increase with sample size.

Another way to achieve a more statistically powerful study is to focus on radiation-sensitive end points, that is, those that have shown the largest association with radiation. Leukemia (except for chronic lymphocytic leukemia) has shown the highest radiation relative risks per unit dose of any malignancy in a number of studies, so it is a natural target for study. Other endpoints that show relatively high radiation relative risks are breast cancer in younger women, thyroid cancer in children, and bladder cancer. In mounting a study with an exposed group of a certain size, however, there may be a trade-off between the size of the relative risk and the baseline frequency of the disease in question. If a disease is very rare, even with a high relative risk there may not be enough disease cases to demonstrate an association. On the other hand, with a common disease a relatively low elevation in relative risk may be sufficient for statistical significance.

Another strategy to increase statistical power is to concentrate on a “sensitive” subgroup of the population, that is, a subgroup for whom any radiation-associated relative risk may be appreciably higher than for the population as a whole. Efforts are ongoing to try to identify genetically susceptible subgroups of the population and, not surprisingly, research indicates that the DNA repair and cell cycle control pathways may play an important role. To date, however, either the genetic variants are too rare to be studied separately (e.g., in the BRCA1 and BRCA2 genes; women carriers of mutations in these genes are at high risk of developing breast cancer) or to have much impact in general-population studies (Bernstein et al., 2010), or the susceptibility variants show only small elevations in risk and frequently are not replicable. A recent study that examined a set of genetic variants (haplotype approach), as opposed to each variant separately, showed that the risk of acute lymphoblastic leukemia associated with diagnostic irradiation is modified by variants in DNA repair genes (Chokkalingam et al., 2011). The WECARE study is examining the interaction between radiation exposure and genetic susceptibility in the etiology of second breast cancer in women with radiation treatment for an initial breast cancer. For genetic sensitivity variables, thus far mostly only rather rare mutations have shown an appreciably heightened radiation effect, which means the number with such mutations among cancer cases near nuclear sites would be very small and not promising for a study (Bernstein et al., 2010; Malone et al., 2010).

One sensitive subgroup clearly needs to be considered. A substantial amount of data supports the concept of greater radiation cancer risks after exposure in childhood than after exposure in adulthood. For example, the Japanese atomic bombing survivors data suggest this age differential for cancer mortality or incidence for total solid cancer, leukemia, and cancers of the stomach, breast, colon, bladder, thyroid, skin (nonmelanoma), and a combined miscellany of other sites (Preston et al., 2003, 2007; Richardson et al., 2009). For total solid cancer and a number of the individual sites, the radiation relative risks are roughly 1.5 to 2 times greater for childhood exposures than adult exposures. For leukemia, thyroid cancer, and breast cancer the ratios of relative risks by age at exposure are even larger. In contrast to an investigation that focuses on exposure of genetically susceptible individuals, a study on childhood exposure would affect a significant proportion of the potential study population and therefore has good potential for a study (or for a focus within a broader study).

Since the risk of leukemia after radiation exposure at young ages is so pronounced for the first 15-20 years after exposure (Figure 4.1) (Richardson et al., 2009), a study focusing on those with potential exposure who develop leukemia at an early age (e.g., before age 15) might be a relatively powerful study if the doses are high enough. The 0-14 age group has been the target age group for many international studies (see Table A.2, Appendix A).

The Multiple Comparison Problem

The design of an epidemiologic study of cancer risks around nuclear facilities may include one or a few a priori hypotheses to be tested. For example, an epidemiologic hypothesis may be that cancer (all types together or a specific type) occurs more often in populations that live near nuclear facilities than in populations that live farther away. Stating the hypothesis precisely, with the method that will be used to test it, is important not only for the collection of the appropriate information, but also because standard statistical techniques require that each tested hypothesis be prespecified; otherwise statistical measures such as p values and confidence intervals lose much of their scientific meaning and become hard to interpret. Statistical issues aside, asking “Does this study yield any associations?” is a poor research strategy (Savitz and Olshan, 1995).

If a study has low statistical power and only a small number of disease outcomes is examined (i.e., only a small number of a priori statistical tests is performed), then null (negative) results would be the most likely outcome of those statistical tests. However, when a considerable number of different disease outcomes will be examined, the potential for one or more false-positive results (purely by chance) can become large. If two sets of statistically independent observations are available, each is testing a true null hypothesis, and each is tested at the usual 5 percent level, the probability that the first will be found significant is 5 percent and the same for the second. The probability that at least one will be significant by chance is (1 – 0.95 × 0.95) × 100 = 9.75 percent, almost twice the probability for either test alone. The probability increases further if there are more than two hypotheses. For instance, for independent disease outcomes the probabilities of at least one false-positive result when 10, 20, or 30 outcomes are examined are about 40, 64, and 79 percent, respectively, while the respective probabilities of at least two false-positive results are 9, 26, and 45 percent.
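The percentages quoted above follow from the binomial distribution. As a minimal sketch, assuming fully independent tests of true null hypotheses at the 5 percent level (the function name `prob_at_least` is ours, introduced only for illustration), they can be recomputed as follows:

```python
from math import comb

def prob_at_least(k, n_tests, alpha=0.05):
    """Probability of at least k false positives among n_tests
    independent tests of true null hypotheses, each at level alpha."""
    # Binomial probability of observing fewer than k false positives.
    fewer = sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(k)
    )
    return 1.0 - fewer

for n in (2, 10, 20, 30):
    at_least_one = 100 * prob_at_least(1, n)
    at_least_two = 100 * prob_at_least(2, n)
    print(f"{n:2d} tests: >=1 false positive {at_least_one:5.2f}%, "
          f">=2 false positives {at_least_two:5.2f}%")
```

For two tests this gives the 9.75 percent stated in the text, and for 10, 20, and 30 tests the values round to the quoted 40, 64, and 79 percent (at least one) and 9, 26, and 45 percent (at least two). Real disease outcomes are rarely fully independent, so these figures are best read as a guide rather than exact probabilities.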

In other words, the probability of one of many prior hypotheses yielding false-positive results increases with the number of hypotheses tested. Furthermore, when investigators also examine risks in various subsets of the data (e.g., dose, time, or age subgroups), this also will tend to increase the probability of false-positive findings, especially if particular subsets are chosen because of preliminary inspection of the data to identify “suspected differences.”

With a substantially underpowered study, any “positive” finding usually has two characteristics. First, it is likely to be a false-positive finding. Second, it is likely that the risk estimate associated with that positive finding is a large overestimate of the “true” degree of risk (Land, 1980). This can be understood intuitively with a hypothetical, but possible, example. Suppose that, given the mean dose in some underpowered low-dose study, the expected true RRs for a series of health outcomes were about 1.1. However, because of the sample size, the RR would have to be about 2.0 to be likely to be detected as statistically significant. Due to sampling variability, by chance one out of the number of health outcomes might show a “statistically significant” RR of 2.0. The excess for the RR of 2.0 is on the order of 10 times larger than the true excess (that is by chance, an excess of 100 percent when the “true” excess is about 10 percent). In short, “statistically significant” results in low-dose studies where the true risk is small tend to provide falsely exaggerated estimates of risk. Accompanying that is often the common human tendency to focus on the “statistically significant” risks, which means that the false-positive results with large imputed risks get undue attention.
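The exaggeration of "significant" estimates can be illustrated with an exact calculation. The following is a minimal sketch under assumptions that are ours, not the report's: a single outcome with 20 cases expected in the absence of any effect, a true RR of 1.1, Poisson-distributed case counts, and a one-sided test at the 5 percent level. It computes the power of the test and the average RR estimate among only those results that reach significance.

```python
from math import exp, lgamma, log

def poisson_pmf(k, mu):
    """Poisson probability of exactly k cases when mu are expected."""
    return exp(-mu + k * log(mu) - lgamma(k + 1))

def upper_tail(c, mu, k_max=500):
    """P(count >= c) for a Poisson mean mu (sum truncated at k_max)."""
    return sum(poisson_pmf(k, mu) for k in range(c, k_max + 1))

EXPECTED_NULL = 20.0  # hypothetical expected cases if there is no effect
TRUE_RR = 1.1         # hypothetical true relative risk
ALPHA = 0.05          # one-sided significance level

# Smallest observed count that would be declared significant.
crit = 0
while upper_tail(crit, EXPECTED_NULL) > ALPHA:
    crit += 1

mu_true = TRUE_RR * EXPECTED_NULL
power = upper_tail(crit, mu_true)

# Average estimated RR (observed / expected), restricted to significant results.
mean_rr_if_significant = sum(
    (k / EXPECTED_NULL) * poisson_pmf(k, mu_true) for k in range(crit, 501)
) / power

print(f"critical count: {crit}")
print(f"power: {power:.3f}")
print(f"mean estimated RR when significant: {mean_rr_if_significant:.2f}")
print(f"true RR: {TRUE_RR}")
```

Because a result can be significant only if the observed count reaches the critical value, every significant estimate is at least `crit / EXPECTED_NULL`, which is well above the true RR in an underpowered setting. This is the mechanism behind the overestimation described above.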

The multiple comparison issue would be particularly limiting in the interpretation of the results of an ecologic study in which multiple cancers are examined for individual facilities as well as combinations of facilities, different time periods, and different age groups. Positive associations found by chance are likely to be misinterpreted. In the 1990 NCI study, for example (Jablon et al., 1990, 1991), 3,090 comparisons were made for leukemia after startup of a nuclear facility for different areas and age groups. Nineteen were expected to have a probability below 0.05 by chance alone; the actual number observed was 18.

Statistical scientists have various ways of dealing with the multiple comparison problem. One strategy that is sometimes employed to guard against excessive false-positive (i.e., “chance”) outcomes is to use a more stringent level for declaring that some difference is statistically significant. Two such commonly used procedures are the Bonferroni multiple comparison correction and the Benjamini and Hochberg method. However, increasing the stringency for declaring a statistical test as positive has the downside of decreasing the statistical power to detect a real effect. Another way is to examine the number of significant results and look for patterns in them (such as increases in cancer only around a certain type of facility, or in one type of cancer around a number of facilities). A third way is to reexamine the results of the significant tests, perhaps in light of additional data, to see whether there is reason to suspect a real effect. For example, was there a radionuclide released that tends to be carcinogenic to a certain organ, as in the case of radiostrontium and bone cancer? Is the association consistent with other studies of radiation effects and biological plausibility? For example, is an association for female breast cancer more plausible than one for male prostate cancer? None of these, applied in a mechanical fashion, provides a sure procedure to distinguish real effects from chance (false-positive) associations, and in the end scientific judgment has to be applied based on such considerations as strength of the study methodology, ability to rule out biases and confounding, and biological plausibility.

Confounding

Confounding refers to an apparent change in the magnitude of the association between the exposure (e.g., radiation) and some outcome (e.g., lung cancer) that comes about because of associations with a third, “confounding” variable. Confounding variables might be exposures to toxic or preventive agents, lifestyle or dietary variables, or other disease risk factors. An important statistical concept regarding confounding is that the degree of confounding of the exposure-outcome association depends on the degrees of association of the potential confounder variable with both the exposure and the outcome, as well as the strength of the exposure-outcome relationship.

The term “confounding” is frequently used without careful consideration of the true definition to describe the differential distribution of characteristics of the groups under study (for example, between cases and controls, exposed and unexposed). So, for example, if there is an empirical association between the potential confounder and the outcome, but no association between the potential confounder and the exposure, there will be no confounding. Likewise, an association of the potential confounder with the exposure but not with the outcome will mean there is no confounding. (In actual studies it is typically not an all-or-none situation, but a matter of degree, depending on the magnitude of correlations of the confounder variable with the exposure and outcome variables.)
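The two-sided requirement described above can be checked numerically. The sketch below uses entirely hypothetical cohort counts, stratified by smoking status, and compares the crude risk ratio with a Mantel-Haenszel summary risk ratio (a standard stratified estimator, used here only for illustration). In all three scenarios the exposure has no effect within either stratum.

```python
def crude_rr(strata):
    """Risk ratio ignoring the stratifying variable.
    Each stratum is (exposed_cases, exposed_total, unexposed_cases, unexposed_total)."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)

def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio across strata."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

# Hypothetical data; strata are (smokers, nonsmokers).
scenarios = {
    # Smoking raises risk (10% vs 1%) AND is more common among the exposed.
    "linked to exposure and outcome": [(80, 800, 20, 200), (2, 200, 8, 800)],
    # Smoking raises risk but is equally common in both exposure groups.
    "linked to outcome only": [(50, 500, 50, 500), (5, 500, 5, 500)],
    # Smoking is more common among the exposed but does not change risk (5%).
    "linked to exposure only": [(40, 800, 10, 200), (10, 200, 40, 800)],
}

for name, strata in scenarios.items():
    print(f"{name}: crude RR = {crude_rr(strata):.2f}, "
          f"adjusted RR = {mantel_haenszel_rr(strata):.2f}")
```

Only the first scenario, in which the third variable is associated with both the exposure and the outcome, produces a crude risk ratio that differs from the adjusted one.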

Issues of confounding are important in all epidemiologic studies without exception, and they are particularly important in low-dose radiation studies that examine rare diseases, as even a small degree of confounding can distort the study results substantially and produce incorrect results. An observed small relative risk such as 1.2 (a 20 percent increase in risk) is more likely to be a result of methodological flaws than a relative risk of 5 (fivefold increase in risk). Confounding can create erroneous risk estimates that either exaggerate or nullify the true degree of association. Studies of health effects associated with high levels of radiation exposure usually are not affected by major confounders, because confounding by other exposures or risk factors tends to be considerably smaller than the radiation effects in question. However, with low-dose studies in which the size of the radiation effect is expected to be small, the magnitude of potential confounding effects may be as large as, or larger than, the size of the radiation effect. In that circumstance, there is a potential for a substantial degree of confounding of the exposure effect. Insofar as studies do not have information with which to evaluate particular variables that might be confounders, potential confounding is a source of uncertainty that can make low-dose study effects difficult to interpret. When information on the potential confounders is available, adjustment8 for them can be made in the statistical analysis to help remove their effects.

Smoking is an example of a serious possible confounder for lung cancer because of the very strong causal relationship between smoking and lung cancer. (Smoking can also be a confounder for other cancers such as bladder cancer.) Small differences in smoking habits can have a greater influence on lung cancer risks than do differences in exposure to low levels of radiation; the relative risk of lung cancer associated with cigarette smoking for moderate to heavy smokers generally exceeds 10, while the RR associated with exposure to high doses of radiation rarely exceeds 2 (Pierce et al., 2005). Therefore, collecting detailed information on the individuals’ smoking history (number of cigarettes smoked per day, age of smoking initiation, years of smoking) is crucial as even slight variations in smoking patterns can bias the results. If the information is not available, it is almost impossible to determine that radiation exposure increases one’s risk of developing lung cancer even if data suggest that.

An ecologic study that uses aggregate health survey data on smoking is not expected to provide adequate adjustment for potential confounding by smoking because it is unable to capture specific smoking patterns or the complicated interactions between smoking and socioeconomic factors. This inability of ecologic studies to properly adjust for confounding often leads to hesitation of the scientific community to embrace results and outcomes of these studies. An example already discussed is the large county-based ecologic study in which a decrease in lung cancer mortality was observed in association with increased radon exposure in sharp contrast to the increase expected from current knowledge (Cohen, 1995, 1997). Subsequent investigators who reviewed the data were skeptical as to whether confounding by smoking was properly adjusted for (Heath et al., 2004; Pawel et al., 2005). Indeed, a series of studies using estimated individuals’ radon exposure have shown positive associations (Darby et al., 2005).

If the likely confounders have been measured in the study, one way to control for confounding in the design stage is to match9 on one or more factors that the investigator is concerned would distort or confound the relationship between exposure and disease under study. Matching has been defined as “the process of making a study group and a comparison group comparable with respect to extraneous factors” (Last, 1995). This way, there will be identical confounder distributions among cases and controls or exposed and unexposed groups. Matching is used more often in case-control than in cohort studies. It can be done at the level of the group, in which case it is called group or frequency matching, or at the level of the individual, in which case it is called individual or paired matching.

Although matching for factors may appear to be a tempting way of controlling confounding, adjusting for confounders inappropriately can result in “overmatching.” Overmatching can occur when investigators match for a variable that is correlated with the exposure of interest or is connected with the mechanism whereby that exposure affects the disease under study.

If the confounding factors have not been measured, the data may be misleading and findings need to be interpreted with caution. If a confounder is measured imperfectly because information is missing, because its classification is too broad, or because it is misclassified, confounding may still exist; this is termed residual confounding.

Uncertainties

A valuable strength of an epidemiologic investigation of cancer risks that incorporates dose reconstruction stems from the fact that the population of interest is examined directly for cancer occurrence or death from cancer; no extrapolations are required from other human populations exposed to high doses, or acute doses, or from animal or cell studies that would add various uncertainties in the risk estimations. (The risk projection model described in Section 4.2.2 is not considered to be an epidemiologic investigation.) Still, any of the study designs considered would attempt to demonstrate very small radiation effects, if any, associated with low doses, and would deal with particularly challenging problems related to uncertainty from various sources. These sources are more often discussed in the context of dose estimations (presented in Chapter 3) and include inaccuracy of measurements used to reconstruct radiation doses, lack of knowledge about true values of dosimetric parameters, and inappropriate assumptions in dosimetric models used to calculate radiation doses to the populations under study. Uncertainty related to the epidemiologic study design itself is often discussed in terms of limitations of the design, analysis, and subsequent interpretation of the findings.

Almost any conceivable epidemiologic study must base its analysis on incomplete or imperfect information regarding the population under investigation. Furthermore, some potentially incorrect assumptions, small or large, will be needed, for example, because data are not available or because clarifying the assumptions is not possible. The unknown effects of the necessary assumptions made in analysis contribute to uncertainties in the results. In this section uncertainties are discussed in terms of:


Completeness of cancer case ascertainment. Cancer risk estimates are based on disease rates obtained from cancer registries and vital statistics offices. Although well-organized means of assessing the quality of cancer registration are in place, at least for the more recent years (see Section 4.3.2), registration is not 100 percent complete or free of errors such as diagnosis misclassification. However, if the frequency of these errors is not large, and not different in exposed versus unexposed areas, the random misclassification should have little effect on the identification of any increased risk.


Population mobility. Inability to retrieve information on residential history and duration of residence at each location is a major source of uncertainty in the epidemiologic investigation of cancer risks near nuclear facilities. In most such studies investigators estimate the exposure of the individuals or the populations based on one time point: place at time of diagnosis, or at time of death (and the equivalent for controls), or at time of birth. The assumption is that the exposures relevant to the disease occurred while living at that location and that individuals remained at the location of exposure for the period of interest. The issue with this assumption is not only that it is likely not true, but also that the results of the study are sensitive to the driving forces that cause people to migrate. Social and economic factors (such as education, job opportunities, and housing) often drive migration and also affect disease outcomes. If migration patterns differ between cases and controls (or between exposed and nonexposed), then the results from the study could be biased.

Although it may be possible to quantify the uncertainty introduced by in- or out-migration, exposure from the releases of the nuclear facilities may depend less on place of residence than on place of employment for the adult working population. As an example, take a person who lives 60 km away from a nuclear facility (outside the zone of interest of 50 km that has been discussed in this report) but works 10 km from a nuclear facility or in a nuclear facility. This exposure misclassification is impossible to capture without collecting detailed information on both residential and employment history through interviews and questionnaires.

A study of young children (for example, 0-14 years of age) is likely the least affected by the issues related to migration and/or place of exposure misclassification. Young children would not only have less opportunity to migrate, but they would also tend to spend more of their time at home compared to adults whose work or other activities may be taking them elsewhere. Additionally, a study of young children where analysis is based on birthplace (rather than place of diagnosis or death and the equivalent for the controls) could capture exposures of the child’s early life and exposures of the fetus during pregnancy, two periods during which humans are particularly sensitive to the effects of ionizing radiation (Pierce et al., 1996). This said, studies of young children are not immune to the impact of mobility or exposure misclassification. A surprising number of families move during pregnancy (Fell et al., 2004) and more than 50 percent of children ages 3-6 are enrolled in center-based care (http://www.childstats.gov/americaschildren/famsoc3.asp). Arguably, a study of the cancer risks of populations near nuclear facilities (especially of the older populations) that is based on place of death is more affected by migration bias. There are, however, good reasons to perform combined analyses of mortality and incidence, as described in Section 4.2.1.


Variability in risk factors. There is inherent variability in the characteristics of the populations in an epidemiologic study, including variability in their genetic make-up, susceptibility to cancer, lifestyle factors, and personal habits. These factors are not easily measurable even if detailed interviews are conducted and/or biological samples are taken. In a low-dose epidemiologic study, the magnitude of the variation in these unmeasured factors may surpass the expected effect from radiation released by the nuclear facilities and therefore obscure any actual effect attributed to the radiation. The variability in population characteristics would not have as profound an effect in a high-radiation-dose epidemiologic study because the excess risk tends to be greater than most variation in the baseline risk.


Inability to distinguish risks from different sources of radiation. Similar to the “noise” on baseline cancer risk that arises from the variability of risk factors such as those discussed above, variability in exposure to other sources of radiation is difficult to measure with accuracy. An increasing source of radiation dose to the population in the United States is from exposure to medical diagnostic procedures, which accounts for almost half of the annual dose that the population receives (NCRP, 2009). In the current context, collecting information on frequency of high-dose procedures such as computed tomography (CT) exams or doses received from these procedures is important as these doses are much higher than those expected to be received from routine operations of the nuclear facilities.10 In the absence of a national system that tracks population utilization and exposure to medical procedures that involve radiation use, retrieving the information on medical imaging utilization is not possible unless medical charts are reviewed or personal interviews are conducted; then the potential for collection of inaccurate information or recall bias is a concern. As the methods to obtain organ dose are not fully developed yet, calculating the doses to the exposed populations per imaging modality, if possible, would introduce additional uncertainty.


Potential confounding. A risk factor such as smoking or exposure to medical diagnostic procedures has to be formally tested to assess whether it is a true confounder or not under specific circumstances. Smoking is of particular interest because as discussed in the previous sections it has the potential to be a serious confounder for lung cancer and other cancers such as bladder cancer. However, it is often not possible to collect accurate and detailed information to fully test for confounding.


Synergistic and antagonistic effects with radiation. Collecting information on lifestyle factors and exposure to agents such as toxic substances is also important for the examination of synergistic or antagonistic effects with radiation. A collaborative multicountry study in Europe aimed to determine the risk of lung cancer associated with exposure to radon at home. Results demonstrated that residential exposure to radon among smokers and recent former smokers increased the risk of lung cancer compared to individuals who did not smoke currently or in the near past (Darby et al., 2005). Similar interactions may exist between radiation and inherent characteristics of the individuals such as genetically based inability to repair damage from the exposure. A review of the literature on the interaction between genetic susceptibility and radiation on cancer risk is presented elsewhere (UNSCEAR, 2006, Appendix A).


Use of proxies. Although proxy measures in general are often accepted indicators of an exposure and can prove informative, there is uncertainty as to whether the exposure of interest has been sufficiently investigated by the use of that proxy. The uncertainty varies with the degree of “closeness” between the proxy and the real measure. For example, high socioeconomic status and educational level are often used as a proxy for a healthier lifestyle and access to health care. Birth order11 and day care use during infancy (Law, 2008) are often used to measure frequency of infection in children. These proxies have been used by a recent study of risks in populations near nuclear facilities (Spycher et al., 2011) to adjust for confounding linked with the “population mixing hypothesis” that has been applied to explain observed leukemia clusters around nuclear facilities in Europe, such as that around Sellafield in Britain (Kinlen, 2011). According to this hypothesis, childhood leukemia is a rare response to common infection, which may be introduced to a previously isolated rural community by sudden in-migration and changes in the dynamics of infectious diseases. Simply, when a population is mixed with another population that has not previously been exposed to the infectious agent (yet to be identified), individuals in the previously unexposed population may develop the disease.


Statistical uncertainty. There are inherent statistical variations in fitting dose-response models. It is important that uncertainties be incorporated properly into risk calculations and be communicated clearly. Interpretation of risk estimates is also based on uncertainties from less than perfect knowledge of the effects of low-level radiation on human health. The value of a study increases if it is performed in the context of existing investigations, and if its results are supported by other studies in the field.

4.2.2. Descriptions of the Study Designs Considered

Risk Projection Models

Evaluating the potential cancer risks associated with living near a nuclear facility directly requires very large-scale studies (Land, 1980), and even then it would be extremely difficult to estimate the health effects by studying the exposed populations alone. This is because at very low doses, the radiation-related excess risk tends to be buried under the noise created by statistical and other variation in the baseline lifetime risk of cancer, which in the population of the United States is estimated to be 42 percent (NRC, 2005). A more timely risk assessment can be obtained using risk-projection models.

Risk-projection models would involve using dose data related to the exposures of individuals living near nuclear facilities and quantifying the risk by transferring that observed in other exposed populations. Data from the Japanese atomic bombing survivors’ cohort are most often used for the purposes of assessing the risks arising from exposure to radiation. This is because this cohort has the most detailed information available for most cancer sites. The models for breast and thyroid cancer are often based on pooled analyses of the Japanese and Western populations such as those that were medically and occupationally exposed (see Appendix A for literature review). These models would calculate a theoretical excess risk of cancer for the populations near the facilities by using the most relevant risk estimates and interpolation models, as well as population characteristics like age structure and population mobility. Then one can produce estimates of changes in risk, or demonstrate that any increase is smaller than some upper limit. If the upper limit is an “acceptable” level, then the true level of risk associated with living near a nuclear facility, which by definition is lower than the upper limit, is unlikely to be unacceptable (Land, 2002).

Such a method was used to project the cancer risks associated with exposure to radiation from other sources such as the use of CT scans and to assess which age groups were associated with the highest risks (Berrington de González et al., 2009). Organ-specific doses and frequency of CT use were derived from national surveys. The investigators discuss that they used this indirect modeling approach to provide more timely risk projections; otherwise, long-term follow-up of very large populations would be required.

There are limitations associated with the use of risk-projection models to transfer risks from more heavily exposed populations such as the Japanese atomic bombing survivors to the populations in the United States that receive much lower doses estimated from reported releases from each facility to be studied.

First, the baseline cancer rates of the comparison population (i.e., Japanese atomic bombing survivors) are often different from those of the population of interest (i.e., residents around nuclear facilities in the United States), and for a few cancers such as breast and stomach cancer the relationship between radiation-induced and baseline risk may differ (UNSCEAR, 2006, Annex A). For example, the age-adjusted incidence rate for breast cancer is 34 per 100,000 per year for Japanese women and 90 per 100,000 per year for women in the United States (Parker et al., 2002). Breast cancer has occurred in excess among women survivors of the atomic bombings in Japan and among those exposed over many years to medical radiation in the United States. The excess relative risk of breast cancer incidence in the Japanese atomic bombing survivors, however, is significantly higher than that of medical radiation patients in the study in the United States (Little and Boice, 1999) and the best estimate of the ratio of the excess relative risk coefficients for the Japanese and U.S. cohorts is about 2. However, this higher excess relative risk is attributable to the lower baseline risk of breast cancer among Japanese women compared with women in the United States. The excess absolute breast cancer risks in the two populations are statistically indistinguishable (Little and Boice, 1999). Related to this difference in baseline cancer rates and the relationship between radiation-induced and baseline risk is the question of whether relative or absolute transfer of risks between populations is the most appropriate (see Sidebar A.1 in Appendix A for discussion on risk measures).

Second, additional assumptions are required in risk-projection modeling, which are major sources of uncertainty: sampling variability in parameter estimates in the risk models; the choice of adjustment factors (known as the dose and dose rate effectiveness factor) to use for interpolation from high-dose-rate exposure to much lower dose rates resulting from prolonged releases; and accounting for differences in relative biological effectiveness between different types of ionizing radiation (known as the radiation effectiveness factors).
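The first limitation, the choice between relative and absolute transfer of risk, can be made concrete with a small calculation. This is a minimal sketch: the two baseline breast cancer rates are those quoted above (Parker et al., 2002), while the excess relative risk assigned to the Japanese population is a hypothetical value chosen only for illustration and is not an estimate from any study.

```python
# Baseline age-adjusted breast cancer incidence, per 100,000 women per year
# (values quoted in the text).
BASELINE_JAPAN = 34.0
BASELINE_US = 90.0

# Hypothetical excess relative risk (ERR) at some fixed dose in the
# Japanese population; illustrative only.
ERR_JAPAN = 0.5

# Excess absolute rate in the Japanese population, per 100,000 per year.
excess_japan = BASELINE_JAPAN * ERR_JAPAN

# Relative (multiplicative) transfer: the same ERR is applied to the U.S. baseline.
relative_transfer_excess = BASELINE_US * ERR_JAPAN

# Absolute (additive) transfer: the same excess rate is carried over unchanged.
absolute_transfer_excess = excess_japan

# ERR implied for the U.S. population if the excess absolute rate is the same.
implied_err_us = absolute_transfer_excess / BASELINE_US
err_ratio = ERR_JAPAN / implied_err_us

print(f"excess per 100,000 per year, relative transfer: {relative_transfer_excess:.1f}")
print(f"excess per 100,000 per year, absolute transfer: {absolute_transfer_excess:.1f}")
print(f"implied ratio of ERRs (Japan / U.S.) under absolute transfer: {err_ratio:.2f}")
```

Under these assumptions the two transfer methods give projected excess rates that differ by the ratio of the baseline rates. Equal excess absolute risks in the two populations also imply a higher excess relative risk in the population with the lower baseline, which is the direction of the difference reported by Little and Boice (1999).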

As a standalone study, a risk-projection model would provide less information than the other study designs considered by the committee and described below. A serious problem with such a study is one of public credibility: the calculated dose distribution by necessity must be based on the reported release data—which if drastically wrong, would provide misleading results. Simply said, the accuracy of the risk-projection models is entirely dependent on the accuracy of the reporting of the releases.

Given the concerns above, the committee notes that risk-projection models could provide useful background information in conjunction with the empirical epidemiologic studies discussed in this chapter to provide guidance for dose assessment and to aid in the interpretation of such studies.

Ecologic Study

A main reason why investigators may choose to perform an ecologic study rather than an individual-based study is that the necessary data (depending on the level of aggregation) are routinely available from relevant cancer registries and census bureaus. Hence, it is easier and faster to obtain the aggregated data than it is to collect individual data, the release of which from cancer registries and other relevant offices often involves demanding approval procedures. Because of the relative ease of accessing aggregated data (which is highly dependent on the level of aggregation), multiple disease endpoints in a range of age groups can be studied at once. Despite their inherent limitations, ecologic studies based on cancer incidence or mortality data, even those that focus on large geographic areas such as counties, have proved to be of value in suggesting avenues of research. Ecologic studies are considered “hypothesis-generating” investigations, and a finding with possible public health impact will require more rigorous testing using a different study design.

As discussed in earlier sections, radiation is associated with elevated risk for a large number of different cancer types; leukemia and female breast, bladder, thyroid, brain, and ovarian cancers are considered the most radiogenic. Given that different segments of the public have concerns about different cancers, an ecologic study that examines the risks associated with a wide range of cancers may be necessary, but particular attention needs to be given to the most radiogenic types. It is important that ecologic studies be conducted using reliable methods and that their susceptibility to the ecologic fallacy be clearly described when results are reported. Recent analysis showed that this is often not the case, and the quality and clarity of some publications on ecologic studies are compromised (Dufault and Klar, 2011).

The NCI reported an ecologic study of cancer mortality across all nuclear facilities that began operations prior to 1982 and for cancer incidence for two states (Jablon et al., 1991). For the NCI study, the rates observed in the population living in a county containing a nuclear facility or an adjacent county that contained more than 20 percent of the area within a 16-km radius of a facility (exposed) were compared to the rates observed in counties not containing a nuclear facility (unexposed). For every exposed county, three unexposed counties were selected to match on certain demographic factors: percentages of persons in the population over age 25 that were white, black, American Indian, Hispanic, urban, rural, employed in manufacturing, and high school graduates; mean family income; net migration rate; infant death rate; and population size.

The analysis assumed that populations living closer to a nuclear facility would receive higher doses of radiation. However, no data regarding radiation exposures or measured releases from the facilities were used in the analysis. That is, the NCI study, similar to other studies of proximity, was not a direct study of health effects of radiation released from nuclear facilities, but rather a study of the health effects of the collection of factors differentiating populations residing near the facilities from those farther away. This includes exposure to radiation but can also include the demographics of the nuclear workforce and the population-mixing hypothesis discussed earlier (Kinlen, 2011). This context is important when considering the role of dosimetry based on reported radiation releases and monitored values from nuclear facilities, especially since the reported doses in recent years fall well below exposures that have been directly shown to cause cancer.

The primary analysis in the NCI study compared the ratios of standardized mortality ratios or standardized incidence ratios before and after the date a facility began operation, with the same measures for the matched unexposed counties. Hence, the values were not mutually standardized and are, at best, generic rate ratios. The main focus of the NCI report was on the ratio of pre- and postoperation cancer mortality ratios since appropriate incidence data were only available for two states with long-standing cancer registries (Connecticut and Iowa).
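As a minimal sketch of the comparison described above, the following Python computes a standardized mortality ratio (observed over expected deaths) for a study county and its matched control counties in each period, and then the ratio of the two before and after start-up. All counts are invented for illustration and the function names are hypothetical; none of them come from the NCI report.

```python
def smr(observed, expected):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


def relative_risk(study, control):
    """Ratio of the study-county SMR to the control-county SMR.

    Each argument is an (observed, expected) pair of death counts.
    """
    return smr(*study) / smr(*control)


# Invented (observed, expected) deaths for each period.
before = {"study": (90, 100.0), "control": (285, 300.0)}
after = {"study": (126, 120.0), "control": (360, 360.0)}

rr_before = relative_risk(before["study"], before["control"])
rr_after = relative_risk(after["study"], after["control"])

print(f"RR before start-up: {rr_before:.3f}")
print(f"RR after start-up:  {rr_after:.3f}")
print(f"after/before ratio: {rr_after / rr_before:.3f}")
```

Because the study and control ratios are each standardized to their own population structure, the quotient is only a rough rate ratio, which is the caveat the text raises.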

Several changes could be made to update and improve the 1990 NCI study design and analysis. Here we discuss five:


  1. Reduce the size of the geographic units in the analysis.
  2. Use the current nuclear facility inventory.
  3. Include years of mortality and incidence data that are relevant to the years of exposure.
  4. Incorporate estimated exposure levels for each geographic unit.
  5. Use stronger analytic methods that permit direct adjustment for possible confounding variables, and incorporate population mobility and temporal changes in the sociodemographic characteristics of the populations under study.

For the first change, making the geographic unit considerably smaller in both physical size and population (for example, by using census tracts) allows for a finer distance-based exposure characterization as well as better characterization of the populations that reside within these units, such as their age, gender, and race/ethnicity structure and socioeconomic status. As an example of the magnitude of reduction of the geographic size, the U.S. Census Bureau defined 628 census tracts in San Diego County for 2011. This change may be one of the most important of the five ways to improve on the NCI study. It would also facilitate analyses of risks at a range of distances. Using smaller geographic units in an ecologic study is also a potential strategy to reduce the impact of the ecologic fallacy. Although groups are rarely completely homogeneous, smaller geographic groups can be more homogeneous with respect to the exposure under study and possibly other risk factors and potential confounding factors. The strategy of reducing the size of the geographic unit for analysis to reduce the ecologic fallacy can also lead to another problem, greater migration between groups (Rothman and Greenland, 1998).

For the second change, the inventory of the nuclear facilities in the United States has changed substantially since the NCI analysis; therefore, estimated risks associated with facilities in that study may not be relevant to those operating today. Many nuclear facilities have started operations since 1982 (as the total number of currently operating reactors has increased from 80 to 104), but in some cases these are located at the sites of existing plants within which reactors may have been decommissioned since 1982. Some states that did not have nuclear power plants in 1982 now do (Arizona, Kansas, Louisiana, Mississippi, Missouri, New Hampshire, Texas, and Washington), and some other states that had an operating power plant pre-1982 now do not (Colorado, Maine, Oregon) (see Table 1.1, Chapter 1).

For the third change, the follow-up in the NCI study was through 1984 and included facilities that were in operation by 1982. There was very little follow-up time beyond a presumed minimum latency period of 10 years for most solid cancers. (Only with the passage of some years from the year that a facility started operation is it expected that populations living near the facility have accumulated sufficient exposure to develop cancers because of the releases from these facilities.) A current analysis of risks could add 25 or more years (1984-2009) of follow-up. However, an important limitation is the lack of mortality data at the census-tract level: Mortality data that could be readily geocoded to census tract (i.e., addresses are available electronically) do not exist for early years, although data summarized at the county level do exist (see discussion in Section 4.3.3). This recognized limitation of the census-tract-level ecologic design considered here is balanced with the possible gain in statistical power due to the more relevant geographic classification and follow-up period.

Many of the 117 plants that are examined in this study (currently operating and decommissioned; see Table 1.1, Chapter 1 for the list) began operations in the 1970s (45 percent) or early 1980s (37 percent), so if mortality data by census tract exist from the mid 1980s onward (with significant variation across states), some 25 years of follow-up would be possible (in some states follow-up would be much shorter, in some longer). Whereas a large fraction of the observation time in the NCI study predated a minimum latency period (of perhaps 10 years after exposure), most of the observation time in this study would occur after the minimum latency period has elapsed. As incidence data in only two states were examined in the NCI investigation (Connecticut and Iowa), the improvements in the incidence analysis are more clear. Moreover, as the year that mortality and incidence data in a state become available varies, the two approaches would provide complementary time coverage.

For the fourth change, the level of exposure of populations in specific locations around a nuclear facility is dependent on the magnitude of the releases from the facility, the distance of the population from the facility, the mix of wind directions and velocities, and variations in terrain (for gaseous releases), and the locations and directional flow of liquid releases. All these factors are incorporated in dosimetric models that could be used by epidemiologists to calculate cumulated exposure levels for any given geographic unit, such as a census tract within the 50-km radius from the facility, for each year and perform “dose-response”-type analyses of health endpoints. This would be a substantial improvement over most previous approaches, such as examining a 5-km radius around the facilities.
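To illustrate only the simplest ingredient mentioned above (distance from the facility), the sketch below computes great-circle distances from a facility to census-tract centroids and assigns each tract a coarse distance band. It is not a dosimetric model: releases, wind, terrain, and liquid pathways are ignored. The coordinates, tract names, and band cut points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_band(km):
    """Assign a coarse distance band; the cut points are illustrative only."""
    if km <= 5:
        return "0-5 km"
    if km <= 50:
        return "5-50 km"
    return "outside 50 km"


# Hypothetical facility and census-tract centroids (not real locations).
facility = (40.0, -90.0)
tracts = {
    "tract_A": (40.03, -90.0),
    "tract_B": (40.3, -90.0),
    "tract_C": (41.0, -90.0),
}

bands = {name: distance_band(haversine_km(*facility, *centroid))
         for name, centroid in tracts.items()}
print(bands)
```

In a dose-based analysis, the band label would be replaced by a modeled dose for each tract and year.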

For the fifth change, an overall modeling framework for the analysis of the ecologic data is to develop an extended cross-classification table, each cell of which contains a count of the incident or fatal cases of interest, an estimate of the person-years at risk, and the appropriate estimated exposure quantity and values for other covariates of interest. The cross-classification would be according to geographic unit (for example, census tract, which itself implies the particular nuclear facility under study), calendar year, age, gender, and race/ethnicity. For example, cancer registration of a 50-year-old African American woman, diagnosed with breast cancer in 2005, living in census tract X at the time of diagnosis, would contribute a case count to the cell that records the number of African American women in tract X who were 50 years old in 2005. Census data would be used to estimate the total number of African American women aged 50 years who were living in census tract X in 2005 so that rates can be computed. Other variables available for this census tract at this time would include a calculated dose estimate or dose surrogate, as well as other census data, or data integrated from other sources with census data. These may include estimates of socioeconomic conditions prevailing in census tract X in 2005 or at some other time, based on data about education, land use, and home ownership rates. Information about these and other variables may be important because they could act as confounders in the dose-response analysis. For example, breast cancer risk is influenced by factors such as age at first birth, hormone use, and other factors, all of which may depend to some degree on socioeconomic conditions.
Poisson regression techniques (described in more detail in Appendix J) would relate the dose surrogates available to the rate of cancer seen in each census tract, after stratifying on race/ethnicity, age, and calendar year, and adjusting for socioeconomic or other variables available at the census-tract level.
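The following is a minimal sketch of the kind of Poisson rate model described above, assuming a single dose covariate and no stratification or socioeconomic adjustment. Each cell supplies a case count, person-years (used as an offset), and a dose surrogate; the model log(rate) = b0 + b1 × dose is fitted by Newton-Raphson in plain Python. The cell values and the function name are invented for illustration, and a real analysis would use an established statistical package and the fuller model of Appendix J.

```python
import math


def fit_poisson_rate_model(cases, person_years, dose, n_iter=25):
    """Fit log(rate) = b0 + b1 * dose by Newton-Raphson (Poisson likelihood).

    person_years enters as an offset, so the expected number of cases in a
    cell is person_years * exp(b0 + b1 * dose).
    """
    b0 = math.log(sum(cases) / sum(person_years))  # start at the crude rate
    b1 = 0.0
    for _ in range(n_iter):
        mu = [py * math.exp(b0 + b1 * d) for py, d in zip(person_years, dose)]
        # Score vector (first derivatives of the log-likelihood).
        u0 = sum(y - m for y, m in zip(cases, mu))
        u1 = sum(d * (y - m) for y, m, d in zip(cases, mu, dose))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(d * m for m, d in zip(mu, dose))
        i11 = sum(d * d * m for m, d in zip(mu, dose))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 linear system directly.
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1


# Invented cells: one entry per (tract, year, age, sex, race/ethnicity) cell.
cases = [10, 12, 15, 18]
person_years = [10000.0, 10000.0, 10000.0, 10000.0]
dose = [0.0, 1.0, 2.0, 3.0]  # dose surrogate in arbitrary units

b0, b1 = fit_poisson_rate_model(cases, person_years, dose)
print(f"baseline rate per 100,000 person-years: {1e5 * math.exp(b0):.1f}")
print(f"rate ratio per unit dose: {math.exp(b1):.3f}")
```

Stratification on age, calendar year, and race/ethnicity would add further terms (or separate baseline rates) to the linear predictor; they are left out here to keep the sketch short.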

As population distributions change with time, an ecologic study needs to account for such changes. In the 1990 NCI study, matching of exposed and unexposed counties was based on data for the years 1979 and 1980 (the latest years included in the analysis) and did not consider county characteristics in the 1950s and 1960s, which were likely different from those in 1979. An improvement over the 1990 NCI study would be to allow for differences in cancer rate (incidence or mortality) between geographic regions (census tracts) to depend upon distance or dose as well as time, while adjusting these for the changes in various socioeconomic variables and other risk factors.

In addition, dose surrogates will change over time depending on the total cumulative dose that someone living in a given census tract would receive, so that this dose surrogate increases in time as releases accumulate, and the dose surrogate level is specific for time, nuclear facility, census tract, and age (e.g., persons at age 10 in 1990 would not have been exposed to transient plant releases in the 1970s, whereas those at age 30 would have been). The flexible manner of dose assignment to specific cells in the projected analyses could take into account these variations. In census tracts judged to be stable demographically (with few people moving in or out) this could be the most relevant dose function. In other census tracts (with higher in-migration or turnover) early doses may be regarded as less relevant than later doses, and this could be taken into account in various ways.
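The age dependence of the dose surrogate can be sketched as follows, assuming a person has lived in the same census tract since birth and that an annual dose estimate is available for the tract. The annual values are invented, the function name is hypothetical, and in utero exposure and migration are ignored.

```python
def cumulative_dose(annual_dose, birth_year, year):
    """Sum annual tract-level doses from birth_year through year (inclusive).

    annual_dose maps calendar year -> estimated dose to a resident of the
    tract in that year; years with no entry contribute nothing.
    """
    return sum(d for y, d in annual_dose.items() if birth_year <= y <= year)


# Invented annual dose estimates (arbitrary units) for one census tract,
# with higher releases in the 1970s than in later years.
annual_dose = {y: 3.0 for y in range(1970, 1980)}
annual_dose.update({y: 1.0 for y in range(1980, 1991)})

for age in (10, 30):
    birth_year = 1990 - age
    print(age, cumulative_dose(annual_dose, birth_year, 1990))
```

A person aged 10 in 1990 accumulates only the post-1980 values, whereas a person aged 30 also accumulates the 1970s releases, which is the contrast drawn in the text. Down-weighting early doses in high-turnover tracts could be added as a weight inside the sum.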

As discussed in Section 4.2.1, dealing with the multiple-comparison issue and the expected false-positive findings is especially challenging in ecologic studies, where each of the thousands of risk estimates is subject to a statistical test of whether the observed association occurred by chance. In the end, scientific judgment based on biological plausibility and current knowledge is needed to interpret the findings.
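The scale of the false-positive problem can be illustrated with simple arithmetic. The sketch below assumes independent tests with no true effects, which is an idealization; the number of tests (2,000) is a hypothetical stand-in for "thousands" and the function names are illustrative.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of nominally significant results when no effect exists."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_tests = 2000  # hypothetical number of facility-by-cancer-by-age comparisons
print(expected_false_positives(n_tests))
print(prob_at_least_one(n_tests))
print(bonferroni_threshold(n_tests))
```

A Bonferroni-type correction is shown only as one familiar option; it is conservative and does not replace the scientific judgment the text calls for.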

Investigators of the 1990 NCI study who based their analysis primarily on a pre- versus post-facility-operation comparison of risks in counties with or without a nuclear facility were able to interpret and communicate the appearance of false-positive findings rather effectively. Data were presented in support of the fact that many statistically “significant” increases in risk in relation to nuclear facilities were found for the period before facilities started operation; these risks could not possibly be attributed to releases from the facilities but are rather statistical effects (Jablon et al., 1990, 1991). The pre- versus postoperation analysis was possible using county-level data as they are available uniformly across the United States and are of good quality. However, reducing the geographic unit to be considerably smaller than a county, which is considered one of the most important ways to improve on the NCI study, comes with the trade-off that risks before the operation of the nuclear facilities can only be estimated for a small number of facilities. These are the facilities in states where long-standing cancer registration and mortality data with geocoded address information are available for many years.

Cohort Studies

In a cohort study, a defined population is followed forward in time to examine the occurrence of many possible health outcomes. Cohort studies may be either prospective, focused on health outcomes occurring after the start of the study, or retrospective, using existing data in registries to construct a cohort and follow it forward to the present and sometimes beyond. Disease incidence in individuals who are “exposed” is compared to that in those who are “unexposed.”

Prospective Cohort Study

Prospective cohort studies in which participants are recruited, data on residence locations and various potential confounder variables are collected, and then participants are followed for incident disease occurrence are generally thought to provide the most reliable information about disease risk in relation to a risk factor. The major advantage is that the study can be carefully planned in advance to include such things as individual exposure assessment (e.g., using dosimeters) and other covariate data. Since the exposure data are measured before the cancer occurs, some kinds of biases are reduced or absent, so this cohort design is generally preferred over others for making causal inferences. However, prospectively followed cohorts must generally be observed over a very long time (decades) before enough cases of most diseases are available for statistical analysis. To give one example, atomic bombing survivors, exposed in 1945, were initially interviewed around 1950 and have been followed for mortality outcomes since that time and for incident cancer since 1958. It was not until the 1960s (about 15-20 years after the atomic bomb exposure) that the first statistically significant findings emerged of an increase in solid tumor mortality in exposed survivors (Socolow et al., 1963; Wanebo et al., 1968).

A cohort study of the future cancer outcome of individuals near nuclear facilities would involve enormous logistical problems in order to follow individuals for decades into the future. The study would not be able to evaluate past exposures, and this may be a serious problem because the highest radiation exposures may have been in the early years of the nuclear facilities’ operations. Far more individuals than are typically needed for a case-control study would have to be interviewed initially and then tracked in the future for cancer incidence and mortality. Population mobility would mean that such tracking would involve large-scale regional or countrywide efforts. Additionally, to follow a population for many decades in the future as needed in a prospective cohort study relies on long-term institutional commitment that may be difficult to sustain. However, prospective monitoring of populations living around nuclear facilities would provide more accurate estimates of ongoing exposures than those reconstructed retrospectively based on modeling of reported releases from the nuclear facilities. It would also provide data regarding the cancer risks associated with exposures in the future.

Retrospective Cohort Study

Retrospective cohort studies, when feasible, are more efficient than prospective studies because the follow-up period is in the past. A retrospective cohort study identifies a group of people at a time in the past for which exposure estimates exist or can be constructed, and follow-up extends from that time to the present. Such designs are commonly used in occupational epidemiology, in which workers employed at a particular facility during specific time periods and meeting other inclusion requirements are followed forward to the present for disease incidence or mortality using existing mortality information or cancer registry information. A retrospective study requires that systematic exposure information at the beginning of and during the follow-up period be available from existing records. Exposure information that might be available from company employment records is related to disease or mortality using statistical methods appropriate for time-to-event analysis (often Cox regression). Other retrospective studies are based on the follow-up of defined birth cohorts, with record linkages used to establish both follow-up and exposure. For example, a recent retrospective cohort study of childhood cancer in Switzerland linked birth records with cancer registration data across the country and used the birth and current residential records to determine proximity to nuclear power plants as a risk factor (Spycher et al., 2011).

The feasibility of a retrospective cohort study depends upon the ability to define a cohort that will include both exposed and unexposed individuals, to estimate appropriate exposure information passively (that is, without the aid of patient or family contact) from existing records, and to link, also passively, the cohort to cancer registration or mortality records from the time that an individual entered the cohort (e.g., time of birth for a birth cohort) until the end of follow-up.

The committee carefully considered the feasibility of a retrospective cohort study of cancer incidence in and around states with nuclear facilities. For the reasons outlined below, only studies of childhood cancers were considered for such a study.

  • Children and fetuses, due to their rapidly dividing cells during development, are typically more sensitive to environmental effects than adults.
  • Pediatric cancers have been the focus of many studies, some of which found a positive association between proximity to a nuclear facility and cancer risk. Leukemia is recognized to be the “sentinel indicator” for radiation effects, occurring with a shorter time latency following exposure than for solid tumors and with a clear dose-risk relationship (experience from atomic bombing survivors).
  • The minimum latency period for leukemia is shorter in children than in adults. Associations of childhood cancer risk with radiation releases from nuclear facilities, if any, are probably less affected by co-carcinogens than those in adults, for whom smoking, occupational exposure, and other established lifestyle risk factors play an important role. Nevertheless, there may still be some risk factors and potential confounders in the development of cancer during the early years of life that are presently unknown.
  • Mobility (in- and out-migration) of young populations is less frequent; therefore, observed associations of cancer risk with residence at birth and at diagnosis (often the basis for dose estimations) are more relevant compared to those in more mobile adult populations.
  • Children typically spend more time at place of residence compared to adults, whose work may take them elsewhere.
  • Societal concerns regarding the health effects of radiation on children are the most frequently expressed.

Pediatric leukemia warrants particular attention in the analysis for the reasons summarized in the second bullet point. Similarly, brain cancer, which is the most common solid cancer in children, needs to be given particular attention. Radiation exposure is one of the few established risk factors for this disease. Although all pediatric cancer types can be examined individually, the rarity of cancers in children and the expected loss of precision in risk estimation may make it necessary to create case subgroups based on homogeneity of disease manifestation, etiology, or other categories.

The outlines of the study considered are as follows. All reports of childhood cancer in all available cancer registries over a fixed time period would be linked to birth records from states that contain nuclear facilities or are adjacent to nuclear facilities. Nearness to nuclear facilities (or doses from nuclear facilities estimated by the reported releases) at the time of birth would be established using the residential addresses recorded in the birth records. The entire birth cohort would be linked to all cancer registries, not only in the state of residence at time of birth, but also to other state registries, to capture the mobility of the population. Ideally, changes in residence (and hence changes in potential exposures) would be obtained by linkage to databases providing address histories. Dose surrogates would be constructed starting from the time of birth according to residential location. These dose surrogates and cancer incidence data would be analyzed to investigate whether residence patterns that indicated a potential for higher exposure are associated with increased rates of childhood cancers.

Although simple to describe, there are many practical difficulties with performing such a study in the United States. These include:


  1. Low coverage of cancer registration before about 1992 for most states.
  2. The size of the birth cohort required to have adequate power.
  3. Lack of information concerning residence changes following birth.
  4. Administrative difficulties accessing state birth records databases and cancer records.

For more details regarding the first difficulty, see Section 4.3.2.

Regarding the second difficulty, Figure 4.3 and Table 4.3 indicate that for a cohort study with a large fraction of unexposed subjects it would take about 1,800 cases in order to have good power to detect a 40 percent excess cancer risk (RR = 1.4) and would require approximately 4 years of incidence data. For example, if all childhood cancers among children aged 0-14 diagnosed in the 4-year time period 2006-2009 were to be targeted in the study (a time when almost all states have working cancer registries), then this would involve linking 18 years of birth records (all children born between 1992 and 2009) to some or all of the cancer registry cases. If we assume that approximately one-fifth of the 4 million births taking place each year in the United States are to women who have home residences within 50 km of a nuclear facility, then this would mean that approximately 14 million birth records would need to be accessible.
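The arithmetic behind the figures in this paragraph can be checked with a few lines of Python. The inputs are the round numbers quoted in the text (a 2006-2009 diagnosis window, ages 0-14, about 4 million births per year, and an assumed one-fifth of births within 50 km of a facility); the variable names are illustrative.

```python
first_dx_year, last_dx_year = 2006, 2009  # diagnosis window from the text
max_age = 14                              # children aged 0-14 at diagnosis

# The oldest eligible child is 14 at the start of the diagnosis window.
earliest_birth_year = first_dx_year - max_age
birth_years = range(earliest_birth_year, last_dx_year + 1)

births_per_year = 4_000_000     # approximate annual U.S. births (from the text)
fraction_near_facility = 1 / 5  # assumed share within 50 km (from the text)

records = len(birth_years) * births_per_year * fraction_near_facility
print(birth_years[0], birth_years[-1], len(birth_years), round(records))
```

The product is 14.4 million, which the text rounds to approximately 14 million birth records.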

For the third difficulty, while there are many ways to try to trace people as they change residences (see Section 4.3.5), no comprehensive databases are available, and ad hoc searching for residence changes on a cohortwide basis (for millions of birth records in numerous states with disparate sources of residential information) appears on its face to be prohibitively impractical. This means that the only consistently available dosimetry information would be for the period at time of birth. After that, residential changes would gradually degrade the applicability of individual exposure information, such as estimates of cumulative dose. If one assumed that all individuals remain in the same residence as at birth, then cumulative dose calculations are easy to perform, but developing a more realistic model for the accumulation of dose would involve population-based estimates of the probability of mobility. This may lead to some minor improvements in dose estimation, but the fundamental problem, that it is impossible to trace large numbers of individuals from residence to residence, remains. Despite the inadequacies in the use of birth place as the point of exposure over the follow-up period of interest, it is widely thought that children are the most sensitive to dose received in early childhood or in utero (Pierce et al., 1996), so birthplace may be a more relevant dose surrogate than would be residence at time of diagnosis, as discussed, for example, in the ecologic study. As birth place is defined by maternal residence at time of delivery of the index child it can be used as the point of in utero exposures as well as early life exposures. The mobility of the population during pregnancy remains an issue (Fell et al., 2004).

For the fourth difficulty, birth records and cancer registries are typically managed within each state. However, as shown in Figure 4.4d, many nuclear facilities in the United States are located near state boundaries, and populations of interest often reside in more than one state. In addition, the mobility of the population in the United States may also necessitate linkage of registry data across additional states. While not impossible, access to records will require approval from all states involved, creating a logistical barrier to implementation.

FIGURE 4.4d. Exposed population from a nuclear power plant crossing state boundaries.


Going further, although linking birth record data across states may be technically possible, there are anticipated difficulties due to the differences of state statutes governing cancer and birth registration, support to research activities, and concerns about privacy following release of information. All these could decrease the quality of the linkages, lead to failure of linking data across states, and delay completion of the study.

The retrospective birth cohort study is judged by the committee to have high scientific merit. However, there are some feasibility concerns at a nationwide scale. A modification of the retrospective cohort study that may be more efficient would be to conduct a record-linkage-based case-control study that is nested in a restricted retrospective cohort study.

Population-Based Case-Control Studies

A case-control approach may be appropriate if efforts are directed to selecting just one or two major diseases that may appear in populations around nuclear sites or are restricted to a specific age group. For example, it may be relevant to focus efforts on studying the risks associated with pediatric cancers developing in young residents close to nuclear facilities or more specifically look at risk factors involved in childhood leukemia developing in this group. The German KiKK study and some other studies have suggested a possible increase of this type of childhood cancer, though many other studies have not replicated this observation (see Section A.4.1 in Appendix A for literature review).

Case-control studies using incident (newly diagnosed) cancer cases with data from several registries must consider the years in which registry data are available; the period of inclusion of the cases and controls can be defined once the quality of cancer registration is found to be adequate. Moreover, a case-control study that requires contact with the study participants that is restricted to recent cases (e.g., those diagnosed within the past 5 years) minimizes potential selection biases due to differential disease severity or availability for interview and/or data collection for nonsurvivors.

In a case-control study, cases are generally matched to appropriate controls either individually or according to a categorization of variables (often age, gender, race/ethnicity; this is known as frequency matching). In either individual or frequency-matched studies investigators need to determine the ratio of the number of controls to the number of cases, a decision generally driven by calculations of statistical power, and the number of cases expected. For rare cancers such as childhood leukemia, the observed number of cases will be relatively small, and multiple controls (two to five per case) would help to improve the precision of results. However, the improvements diminish rapidly as the number of controls per case increases, and more than five controls per case is not likely to be helpful (see Figure 4.3). It is critical that the number and nature of matching criteria be considered carefully. Overmatching must be avoided; for example, matching closely on place of residence or distance from a nuclear facility may constitute overmatching. That is, investigators “force” the cases and controls to be too similar in the exposure under investigation; therefore, the effect of the exposure on disease cannot be investigated.
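The diminishing return from additional controls can be illustrated with a commonly used rule of thumb, which is an assumption introduced here rather than a formula from this chapter: near the null hypothesis, a design with m controls per case has roughly m / (m + 1) of the statistical efficiency of a design with an unlimited number of controls.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency relative to an unlimited number of controls.

    Uses the m / (m + 1) rule of thumb, which applies near the null hypothesis.
    """
    m = controls_per_case
    if m < 1:
        raise ValueError("need at least one control per case")
    return m / (m + 1)


previous = None
for m in range(1, 8):
    eff = relative_efficiency(m)
    gain = "" if previous is None else f"(gain {eff - previous:+.3f})"
    print(f"{m} controls per case: {eff:.3f} {gain}")
    previous = eff
```

Under this approximation the step from one to two controls adds far more efficiency than the step from five to six, consistent with the statement that more than five controls per case is unlikely to help.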

Obtaining accurate information on past exposures (predating the occurrence of the cancer, or an equivalent time point for controls) can be problematic. If information is to be obtained from existing records, it may be only partly suited to the desired study information. For example, data on smoking might be obtained from employment health records, but the smoking information may be incomplete or too cursory for the need (e.g., “Do you smoke?” rather than detailed information on duration and frequency of smoking, and information may vary across time periods and employers). Records relevant to some exposures would have been generated for administrative rather than medical purposes and therefore might be poor surrogates for the desired information.

The information for cases and controls must be collected by the same approach in order to limit bias related to quality of information or extent of detail of the data collected in different administrative files or medical records, or due to differential interviewing. Residential history, socioeconomic characteristics of the parents, infections, exposure to radiation in utero or as a child, and parental smoking are some of the factors previously associated with childhood leukemia and such information, if available, can be included. Birth order is of interest because it has been implicated as a risk factor for leukemia and may be a marker of exposure to infectious agents, with later-born children presumed to be exposed more often and at earlier ages from their older siblings. Therefore, birth order could be used as a proxy to examine the postulated population mixing hypothesis and infectious etiology for childhood leukemia (Kinlen, 1988). According to this hypothesis, childhood leukemia is a rare response to common infection, which may be introduced to a previously isolated rural community by sudden in-migration and changes in the dynamics of infectious diseases.

Record-Based Case-Control Study

As stated earlier, the retrospective birth cohort study was judged by the committee to have high scientific merit but involves logistical and administrative barriers. A record-linkage-based case-control study that uses data on cancer registration and birth records to identify cases and controls and relevant information is an alternative to the retrospective birth cohort design.

In a record-linkage-based case-control study, children diagnosed with cancer at age 0-14 years are identified from population-based cancer registries of states that have or have had a nuclear facility or are adjacent to such a facility. Cancer cases identified among children in the registry are linked to birth records within the respective state(s). Those born within the area of interest (e.g., 50 km around a nuclear facility) are eligible cases. One or more controls are randomly selected from birth records restricted to those born within the 50-km zone from the facilities, with matching to cases on year of birth at minimum, and if possible month of birth, race/ethnicity, and gender. The 50-km zone provides a wide range of potential exposures for controls but keeps controls in similar regional settings. Children diagnosed with cancers but who were born outside the study area could be excluded from the control group; however, the likelihood of them being selected randomly as controls is very small as indicated below.
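The control-selection step described above can be sketched as follows, assuming birth records are available as simple dictionaries and matching on year of birth only. The record layout, field names, and data are hypothetical; matching on month of birth, race/ethnicity, and gender would add further conditions to the filter.

```python
import random


def select_controls(case, birth_records, n_controls, rng):
    """Randomly pick controls born in the same year as the case.

    birth_records is a list of dicts with 'id', 'birth_year', and
    'within_50km' keys (a hypothetical layout). The case itself is excluded.
    """
    pool = [r for r in birth_records
            if r["within_50km"]
            and r["birth_year"] == case["birth_year"]
            and r["id"] != case["id"]]
    # Sample without replacement; return fewer if the pool is too small.
    return rng.sample(pool, min(n_controls, len(pool)))


# Invented birth records; real files would carry many more fields.
rng = random.Random(2012)  # fixed seed so the draw is reproducible
birth_records = [
    {"id": i, "birth_year": 2000 + i % 3, "within_50km": i % 4 != 0}
    for i in range(60)
]
case = birth_records[1]  # id 1, born 2001, within the 50-km zone
controls = select_controls(case, birth_records, n_controls=2, rng=rng)
print([c["id"] for c in controls])
```

Note that the sketch deliberately does not match on distance from the facility, since doing so would risk the overmatching problem discussed earlier.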

The record-linkage-based case-control study of pediatric cancers differs from the retrospective cohort study in two important respects that enhance its feasibility:

  1. Restricting the linkages to within state instead of across states. Rather than considering (for example) all of the 3,000 childhood leukemia cases per year that are expected nationwide for linkage to birth registry information for all states with or proximal to nuclear facilities, cases would be identified from the cancer registries of states with or near facilities, and linkages would occur only within the respective states as opposed to between states. This should considerably reduce the number of birth records that need to be searched for each cancer case included. Also, as a consequence of restricting the cases to those born and diagnosed in the same state, the record-linkage-based case-control study focuses on the more residentially stable children (although the children and their families may have moved within the state in which the child was born).

  2. Limiting the number of cases and controls that would be followed to update residential history, or dropping the requirement. Because only a relatively small number of controls would be selected for each study case (so that many fewer study subjects would be involved than in the retrospective cohort study), it may be more feasible to follow these subjects forward and retrieve residential information than it would be to follow an entire birth cohort forward to look at changes in residence in order to refine dose estimates. This effort could still be substantial, however, and may be worth doing only for a relatively small number of cases and controls in order to estimate overall rates of out-migration and loss to follow-up. If the requirement of following the subjects forward in time via records is dropped, the overall effort required to conduct the study is substantially reduced.

As with the retrospective cohort design, cases as well as controls are required to be born within a fixed region (e.g., 50 km from a nuclear facility). For the record-linkage-based case-control design more selective targeting schemes could be considered, such as requiring the cases selected for study to be residents of a 50-km proximity zone at the time of diagnosis. It must be kept in mind, however, that as further restrictions on case eligibility are applied, the potential for loss of study power increases if large numbers of cases are excluded from consideration. Additionally, because the design does not rely on follow-up of the controls to establish whether they also remained within the 50-km zone from birth to the time that the cases were diagnosed, the potential for selection bias increases, and false relationships between case status and distance could appear if the probability of moving versus staying within the same region is inhomogeneous with respect to distance from the nearest nuclear facility. Results from regions with high in- or out-migration of children would be less reliable than those from regions with less population mobility.

The design could be extended as far back as registries with good quality data exist and birth years of cases and controls would co-extend with good practices of registry operation. A study that includes subjects who were born before the state’s cancer registration was of acceptable quality could appreciably increase the number of eligible cases at the older targeted ages, and it also could assess exposures in earlier years when the exposure levels were likely higher. Inclusion of these subjects can be achieved as follows: For cancer cases at each age X, the birth records for up to Y years before the beginning of good quality cancer registration could be used. For instance, if the year of good quality cancer registration data is 1996, the birth records from 1990, 1991, or 1992 could be used to include cancer cases and controls of ages 6 or older, 5 or older, 4 or older, respectively. While this approach might introduce slight bias as those who developed cancer at earlier ages would not be eligible, for all practical purposes the study could be regarded as unbiased in that respect.
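The arithmetic behind the 1996 example can be made explicit. The short sketch below uses a hypothetical helper name and simple calendar-year subtraction, which ignores the exact month of birth and of diagnosis. It reproduces the ages quoted in the text.

```python
def minimum_eligible_age(birth_year, registry_start_year):
    """Youngest age at diagnosis (in whole calendar years) that a registry
    starting in `registry_start_year` can capture for a given birth year."""
    return max(0, registry_start_year - birth_year)


registry_start_year = 1996  # example year used in the text
for birth_year in (1990, 1991, 1992):
    age = minimum_eligible_age(birth_year, registry_start_year)
    print(f"born {birth_year}: cases diagnosed at age {age} or older are captured")
```

Children born in or after the registry start year are captured at all ages, which is why the function never returns a negative value.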

An advantage of either the record-linkage-based case-control approach or the retrospective cohort study is that certain relevant characteristics of the parents and infant are available on birth records and, depending on the year and state, would include: mother’s address and duration of residency at that address; parental age, race/ethnicity, and educational level; and date of birth, gender, weight, and order of birth of the index child. Additional information on the birth certificate such as substance abuse by the mother (including smoking and alcohol) does exist in certain cases but will have varying reliability and completeness depending on the state (Spector et al., 2007). The above-mentioned data elements are included on the 2003 national standard certificate of live birth; however, the certificate was not implemented systematically. As described elsewhere, 2 states implemented use of the certificate in 2003, 7 additional states in 2004, and cumulatively 15 states used it in 2005 (Kirby and Salihu, 2006). Information on abnormal conditions of the infant such as Down syndrome and other congenital anomalies of the newborn can be used to exclude cases and controls from subanalysis.

Regarding these issues, in a five-state pooled analysis of parental age (available from birth records) and risk of childhood cancer (Johnson et al., 2009), which used the methodology described here, diagnoses went back to 1980 in Washington State, 1985 in New York State, 1988 in Minnesota and California, and 1990 in Texas. The analysis from five states comprised approximately 30 percent of the U.S. pediatric population. Using probabilistic record linkage, the linkage success of cancer registry and birth records data within a state was 88 percent for leukemia cases age <5 years in California (Reynolds et al., 2002), 87 percent for hepatoblastoma cases age <5 years in New York (McLaughlin et al., 2006), and 82 percent for cancer cases age <15 years in Minnesota (Puumala et al., 2008). The information was not reported for Washington (Podvin et al., 2006) or Texas (Walker et al., 2007). Although the authors did not provide a breakdown of the possible reasons for unsuccessful linkage, these may include in-migration (children born elsewhere moved to the reference state and were diagnosed there), rather than flaws in the linking methodology.

A 17-county study of childhood leukemia (age <15 years) in California demonstrated that a small percentage of cases (12 percent) were not born in the study area; approximately 5 percent were born in other counties in California and 7 percent outside of California (Ma et al., 2004). The recent study in Switzerland, a country where populations are likely less mobile than in the United States, demonstrated that 68 percent of pediatric cases had not moved between birth and diagnosis, 22 percent had moved once, 6 percent twice, and 4 percent three times or more. Although in-migration is expected in all states under study and appears to be somewhere between 10 and 20 percent for children 0-14 years, it is expected to be lower for children 0-5 years old (Ma et al., 2004), which is also the age range in which most leukemia cases are expected (peak for acute lymphoblastic leukemia is 2-4 years old).

It may be possible to estimate in- and out-migration of subjects based on census data and to describe the characteristics of the cases who migrate based on cancer registry data such as age, year of diagnosis, and race; correction for selection bias may be possible if probabilities of exposure can be stratified by these same variables.

Study controls in the record-linkage-based case-control design are randomly selected from each state’s birth registry. The matching ratios for the pooled analysis of the five states mentioned above differed by state from 1:1 to 1:10 (Johnson et al., 2009). A concern is that children identified by the birth registries as eligible controls may have been diagnosed with cancer in a different state. However, given the rarity of childhood cancers (about 4.8 per 100,000 children will be diagnosed by age 15 with leukemia or brain cancers, the two most common cancers in children), this issue should have essentially no effect on the power of a study, but might nevertheless have some unknown potential to introduce bias, since controls but not cases may have migrated from the state and such migration might reflect socioeconomic or other differences that affect childhood cancer risk.

Feasibility of the record-linkage-based case-control study depends on availability and release criteria of the information on both birth and cancer registration information that may involve demanding Institutional Review Board (IRB)13 or equivalent body approvals. Release of the required information may not be possible in all states under investigation, or in rural areas within the states for reasons of subject protection or because linkage capabilities are not in place. For these reasons, it may not be possible to include all of the states of interest in the analysis.

Part of the predicted feasibility and practicality of this study lies in the fact that it can be based on and expand on existing studies and ongoing efforts to link state cancer registry records with birth records, by partnering with the appropriate investigators. Such linkages are established statewide within Washington, New York, Minnesota, California, and Texas. Similar linkage analyses have been performed in metropolitan regions and surrounding counties of Seattle, Washington; Detroit, Michigan; and Atlanta, Georgia, as well as statewide in Utah (Mueller et al., 2009), to investigate pregnancy outcomes in female childhood and adolescent cancer survivors.

De Novo Case-Control Study with Patient or Family Contact

The committee also considered the development of a new case-control study. To illustrate, a study of childhood cancer might begin with definition of a reference population of children less than 15 years old, living in the vicinity of nuclear facilities. Controls would be children of the same age and gender who lived in the same general area as the cases at the time the cases were diagnosed. Contact with children or families would be used to define residential history, and therefore the study would not depend on assumptions about continued nearby residence from birth until time of diagnosis.

The challenges of selecting appropriate controls through random-digit dialing, school records, or friend controls and the emerging use of birth record controls are discussed in Section 4.3.4. It is important that controls be selected in a way that does not bias the basic comparisons that are the object of the study. In particular, controls must represent the distribution of distances from the nearest nuclear facility for the same population from which the cases are being drawn.

Within a case-control study, investigators would usually choose the recent cases (for example, those diagnosed during the period 2005-2010) and appropriate controls and trace individuals for interviews in order to collect information on residential history and other risk factors and refine the exposure of the individuals. Tracing recent cases tends to be more successful than tracing past cases as the more recent cases would have less opportunity to move, would be easier to find, and are more likely to be alive. Children with cancer would be traced through the treating institution as identified from cancer registration files or other means and they and/or their parents contacted in order to obtain additional information regarding residential history and a list of known or putative risk factors for childhood cancer. If the identified cases who were children at diagnosis and are adults at the time of interview are those providing the information, their responses may differ from those of the parent, and many now-adults may not know answers to questions about childhood residential history or early life care. (Cancer registries may require that contact with the now-adult is established first to obtain permission to be a study subject and to allow parental contact.) Depending on the method selected for control identification, tracing for controls may also be required (see Section 4.3.5).

Even when tracing is successful, collection of detailed information by interviews or by questionnaires will face issues of nonparticipation. As nonparticipation rates are often considered an indicator of the potential for selection bias, it is important that they are kept as low as possible; individuals (or parents) who refuse to participate in the study may differ in relevant ways from those who are willing to participate, and this may affect the study outcome. Controls are often less likely to participate than cases, and participation rates of controls have declined in recent years, regardless of source (Bunin et al., 2007). One survey estimated that participation among population-based controls declined by 1.86 percent per year (Morton et al., 2006). Low participation rates or differential participation rates between cases and controls can introduce bias when willingness to participate is related to exposure and this tendency is stronger (or weaker) in cases than in controls (Hartge, 2006).

Differences in the accuracy and detail of answers provided need to be minimized. Focus groups and pretests of questionnaires and procedures may help to establish a well-designed questionnaire for the specific study scope. To avoid bias associated with information given during an interview or when filling out a questionnaire, one useful approach is to not inform interviewers whether a specific subject is a case or a control; this can limit the bias that an interviewer might unconsciously inject into the information, though information on case or control status may often come out during the interview. In contrast, a patient (or proxy) cannot be kept in ignorance of his or her status, so an additional concern is “recall bias,” under which controls may have given less thought or paid less attention to past exposures (such as infections, medical imaging, and others) and underreport them, thus introducing a bias. For example, a mother whose child has died of leukemia may be more likely than the mother of a healthy living child to provide more complete and accurate information on past experiences such as x-ray exposures when the child was in utero (see Section A.4.6.2, Appendix A). This recall bias could artificially suggest a relation between x rays and leukemia.
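A small numerical sketch shows how differential recall alone can produce an apparent association. All numbers here are hypothetical and chosen only for illustration: the exposure is assumed to be equally common (10 percent) in cases and controls, so the true odds ratio is 1, but controls are assumed to report only 80 percent of their true exposures while cases report all of them.

```python
def odds_ratio(p_cases, p_controls):
    """Odds ratio from the exposed fractions among cases and among controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


def reported_odds_ratio(true_prevalence, recall_cases, recall_controls):
    """Observed odds ratio when the exposure is equally common in cases and
    controls (true OR = 1) but each group reports only a fraction of its
    true exposures."""
    return odds_ratio(true_prevalence * recall_cases,
                      true_prevalence * recall_controls)


# Hypothetical inputs: 10% truly exposed, cases recall 100%, controls recall 80%.
print(round(reported_odds_ratio(0.10, 1.0, 0.8), 2))  # prints 1.28
```

With equal recall in both groups the function returns exactly 1, which is the sense in which the spurious association comes from the difference in reporting rather than from the exposure itself.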

Moreover, the information that individuals give may be affected by unconscious biases; this is particularly true if a study has been widely publicized and subjects are aware of reported health effects and what exposures are suspected to cause these effects. A well-designed questionnaire may minimize these biases by carefully wording the questions, often requesting the same information by two questions phrased differently to identify inconsistencies and judge the reliability of the information, or simply by forcing the individual to think more carefully. Telephone interviewing may be a better approach than interviews in person, especially when questions touch on sensitive matters such as possible exposures during pregnancy.

In a study of childhood leukemia the questionnaire is likely to contain details on lifestyle, socioeconomic status, residential history, occupational exposure of parents at the time of conception of the child and during pregnancy, medical radiation exposure during pregnancy and early childhood, infectious diseases during early childhood, contact with other children during first years of life, nursery care, birth order, and number of children in the family as well as questions specific to milk consumption to better estimate individual exposure. As most risk factors for leukemia are still unknown, it may be necessary to consider trade-offs between collecting a large amount of information per subject and the number and geographic source of subjects. Experience from previous studies in similar populations and areas often provides useful lessons learned.

As shown by the calculations in Figure 4.3, a study with good power to detect a 20 percent increase in cancer risk for a relatively rare exposure (RR = 1.2, assuming 2.5 percent of subjects are exposed) would have to be extremely large (thousands of cases and at least as many controls). For rare cancers (such as childhood leukemia) this would involve decades of accrual in regions near sites; while much larger relative risks could be detected far more easily, the expectation is that 20 percent increases are extremely large relative to the cancer risks expected based on reported releases. For more common cancers, while the rates of case accrual are larger, the expectation is for even weaker dose-response relationships. Thus, the power of any feasible case-control study (one that could be completed in years rather than decades) is likely to be extremely low.
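The order of magnitude quoted above can be checked with a textbook sample-size approximation. The sketch below is not the committee's calculation. It assumes an unmatched design with one control per case, a two-sided 5 percent significance level, 80 percent power, and the usual normal approximation for comparing two proportions, and it treats the relative risk of 1.2 as an odds ratio (reasonable for a rare disease). The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(odds_ratio, exposed_fraction, alpha=0.05, power=0.80):
    """Approximate number of cases (with an equal number of controls) needed
    to detect `odds_ratio` when `exposed_fraction` of controls are exposed."""
    p0 = exposed_fraction
    # Exposed fraction among cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


print(cases_needed(1.2, 0.025))
```

Under these assumptions the result is well over 10,000 cases, consistent with the statement that such a study would have to be extremely large. Different choices of power, significance level, or matching ratio would change the exact figure but not the conclusion.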

For reasons primarily related to considerations of both statistical power and logistics, combined with the fact that only relatively recently diagnosed cases could be included and the potential for participation (and possible information) bias, a de novo case-control study and the associated efforts required to collect additional information on potentially confounding factors may not be justified over the record-linkage-based case-control approach.

Building on Existing Studies

As discussed earlier in this section, it may be possible to partner with investigators who are already using linkages between cancer registration and birth records to perform the record-linkage-based case-control study. As these linkages exist in at least six states, representing more than 30 percent of the U.S. pediatric population, using existing data, if possible, would reduce substantially the overall efforts required to conduct the record-linkage-based case-control study.

Several recent or ongoing case-control studies, cohort studies, and clinical trials could be useful in developing a new case-control study with contact of individuals or their proxies. The advantage of working with existing studies is that cancer cases and controls have already been identified, the initial contact has been established, and collected information related to the original study may be useful. Participants or their proxies can be recontacted and additional relevant information can be requested such as residential history and potential confounders. In certain instances it may be possible to find existing data about residential history passively (from old city directories, for example), without individual participant contact. Here, however, we assume that (as for most studies) individual exposure and covariate data are obtained directly from participants or their families. The requirement for direct contact would seem to require that the existing study contains recently diagnosed cases and that patients or families be contacted soon after diagnosis. This limits the number of existing studies that would be useful as partners.

Most existing large studies are focused on adults, and often for populations with specific characteristics and outcomes to serve the specific research focus of the study. A few such examples are the Women’s Health Initiative, a study of more than 160,000 generally healthy postmenopausal women, designed to test—among other issues—the effects of postmenopausal hormone therapy on breast and colorectal cancer (Hays et al., 2003), and the Nurses’ Health Study, a study of about 238,000 female nurses, focused primarily on cancer prevention (Willett et al., 1987). For rare cancers such as pediatric cancers, investigators have realized that individual large cohort studies are unable to examine the effect of different exposures on the disease due to inadequate sample size. For that reason, multiple large children’s cohorts have joined to establish national or international consortia such as the Pediatric Brain Tumor Consortium and the International Childhood Cancer Cohort Consortium.

Even if existing studies include the age group and cancer outcome of interest, the biggest issue is that, since only a relatively small fraction of the U.S. population overall lives quite near a nuclear facility (about 0.3 percent within 8 km and 15 percent within 50 km in 2010; see Tables 1.3 and 1.4 in Chapter 1), existing studies probably do not cover enough persons living within the 0-50-km zone to provide statistical power for the study of the relation between residential history and/or individually estimated exposures and cancer occurrence. The possibility of using an existing study to build a contact-based case-control study was not considered further, since no known studies that would meet the necessary criteria were identified.

4.2.3. Recommended Studies

Of the several studies considered, the committee judged two epidemiologic study designs to have scientific merit and to be able to address the nonscientific issues involved in assessing cancer risks in populations near nuclear facilities: the ecologic study and the record-linkage-based case-control study. A summary of the strengths and limitations of the recommended studies is presented here.

Summary of Strengths and Limitations

1. Ecologic study


The study design investigates incidence and mortality rates for all common cancers identified at the census tract within which cases reside at the time of diagnosis or death from cancer, respectively. The study is restricted to census tracts within a fixed distance (perhaps 50 km) of a facility which represents a range of potential exposures from the highest to essentially no exposure. Cancer rates among census tracts are compared by average estimated levels of exposure.

The question such a study can answer

Are observed cancer incidence and/or mortality rates higher in census tracts with higher estimated exposures (as estimated from reported releases from the nuclear facility)?

Feasibility14 depends on


Availability and release of aggregated cancer registry and mortality information at the census-tract level, according to age, gender, race/ethnicity, and cancer site.


Availability of population structure and size (also by age, gender, race/ethnicity) data from the U.S. census, with interpolation for noncensus years.



Strengths

Has the ability to look at all potentially radiosensitive types of cancers and for all age groups.

Examines both incidence and mortality, which provide complementary data and can be mutually supportive.

Can examine past outcomes and therefore can examine risks at times when releases were higher and more likely to cause cancer.

Only cancer registries and/or vital statistics offices of those states that have or have had a nuclear facility or which contain populations within the study distance of a nuclear facility need to be contacted.

Provides results relatively quickly as information comes mostly from existing databases.

No issues related to control selection appropriateness or feasibility.

Does not rely on recruitment of study participants.

IRB or equivalent body approvals for cancer incidence and mortality data will possibly be needed, but procedures are likely to be undemanding (possible exceptions are procedures for data release from rural areas where only a few cases reside within a census tract).

Limitations

Subject to ecologic fallacy and has limited ability to conclusively establish or refute a relationship between radiation and cancer because exposure information on actual cancer cases is not obtained; might be subject to biases that cannot be taken into account. Is considered hypothesis generating.

Study type has been criticized. It may be viewed as an easy, quick, and inexpensive study, bound to give inconclusive results because:

  • It is particularly subject to multiple comparison problems as numerous cancer types and age groups will be examined.
  • It can control for confounding only by using aggregate census-tract data. The registry and census data do not include specific lifestyle factors.

Can only examine associations based on residence at diagnosis or death rather than place of birth or place of relevant exposure. Associations based on place of death may only partially reflect past exposures due to population mobility.

Can only estimate average in- and out-migration rates, with no information on the residential history of actual cancer cases.
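Two building blocks of the ecologic design summarized above, interpolating census-tract populations for noncensus years and expressing case counts as rates, can be sketched as follows. This is a simplified illustration with hypothetical function names and made-up numbers. It uses linear interpolation and a crude (unstratified) rate, whereas an actual analysis would work within strata of age, gender, and race/ethnicity.

```python
def interpolate_population(year, census_counts):
    """Linearly interpolate a tract population between two census years.

    `census_counts` maps census year -> population count for one tract.
    """
    years = sorted(census_counts)
    for y0, y1 in zip(years, years[1:]):
        if y0 <= year <= y1:
            frac = (year - y0) / (y1 - y0)
            return census_counts[y0] + frac * (census_counts[y1] - census_counts[y0])
    raise ValueError("year outside the range of available censuses")


def crude_rate_per_100k(cases, person_years):
    """Crude rate per 100,000 person-years."""
    return 1e5 * cases / person_years


# Made-up tract: 4,000 residents in 2000 and 5,000 in 2010.
population_2005 = interpolate_population(2005, {2000: 4000, 2010: 5000})
print(population_2005)                          # 4500.0
print(crude_rate_per_100k(9, population_2005))  # 200.0
```

Rates computed this way for each tract could then be compared across tracts grouped by estimated exposure level.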

2. Record-linkage-based case-control


Children diagnosed with cancer (in the period of reliable cancer registration) in states that have or have had a nuclear facility or are within a fixed distance (for example, 50 km) of a nuclear facility are linked to the birth records of the respective states to identify those children that developed cancer and were born within a fixed distance from the facility (for example 50 km). Controls are children identified from birth records to be born in the same general study area as cases and matched at minimum to cases on year of birth (birth month and gender where possible).

The question such a study can answer

Among children born within 50 km of a nuclear facility, are pediatric cancers associated with higher exposure at maternal residence at time of birth?

Feasibility depends on


Availability of maternal residence at the time of delivery in the birth records.


Within-state linkage capability of cancer registration with records kept in vital statistics offices that will provide information on births (and possibly deaths) in the areas around the facilities.


Availability and release of linked data at the individual level.


Accrual of enough childhood cases during the times in which cancer registries are of reasonable quality to have power to detect disease patterns related to estimated exposure levels.


Ability to obtain birth record information on all births in the relevant risk sets (e.g., all those born within 50 km of the nuclear facility in each of the relevant birth years) in order to define an unbiased set of geographic controls.



Strengths

Provides individual risk estimates rather than estimates based on geographic units.

Examines associations relevant to early-life exposures (place of birth), which can be considered more relevant than those later in life as would be captured in a study based on place of residence at the time of cancer diagnosis or death from cancer and the equivalent for the unexposed.

Can be considered an objective study as it does not rely on contact with individuals or interviews and therefore is not subject to selection or possible information bias related to subject participation and collection of information on risk factors.

Does not need to be restricted to very recent cases, as cases and controls are not traced to be interviewed.

Provides results relatively quickly as information comes from existing databases and requires linkage only between cancer and birth registration data.

Information on certain relevant covariates is available in the birth certificates and can be adjusted for.

Because the study is focused on children, uncertainties arising from population mobility or lifestyle choices are less of a concern.

In-migration of cancer cases (but not controls) can be estimated.

Limitations

Restricted to a specific age group and few cancer types (i.e., childhood cancers). Hence, it may not address many of the concerns of the public stakeholders.

Restricted to recent cases, therefore

  • Harder to accrue large numbers of cases (and hence statistical power may be limited).
  • Risks associated with higher releases in the past cannot be examined.

Cannot estimate the frequency of, or the altered exposures and effect estimates due to, out-of-state migration of cases or any migration of controls.

Linkage of birth and cancer registry records may not be possible (or permitted) in some states.

IRB or equivalent body approvals for data release of birth and cancer registration will be required.

Approaches for Conducting the Recommended Studies

The recommended studies are complementary in that each addresses different aspects of cancer risks:

  • The ecologic study would provide an assessment of risks for a variety of cancer types over longer operational histories of nuclear facilities for which effluent release and cancer mortality and incidence data are available.
  • The record-linkage-based case-control study would provide an assessment of cancer risks for childhood exposures to radiation during more recent operating histories of nuclear facilities.

The recommended studies are mutually independent, and could be carried out individually or together. The decision on which of the recommended studies to carry out and their order of execution involves a host of policy and other considerations that are beyond the scope of this Phase 1 project. These include, for example, considerations such as the following:

  • Which age groups and cancer types are most important to address in the epidemiologic study or studies?
  • How much time is available to carry out the study or studies?
  • How much funding is available to carry out the study or studies?
  • Which public concerns most need to be addressed?


4.3.1. Population Data

Each of the approaches considered requires some knowledge about the size and demographic characteristics of populations living close to a nuclear facility, and this information must be on a suitable time scale. The committee is convinced that the information should be for geographic areas smaller, perhaps much smaller, than counties.

Population counts for small areas are available from the U.S. Census.15 Every 10 years, in years ending in “0,” the Bureau of the Census performs the official count of people living in the United States. The Bureau supplements the decennial census on a continuing basis with the sample surveys and statistical models that make up the American Community Survey (ACS16), which provides more data on social and economic characteristics than does the decennial census. The ACS sends surveys to approximately 3 million housing units and group quarters in the United States in every county, so detailed information on a small geographic scale may be sparse. In 2009, completed ACS interviews represented 66.2 percent of the housing units initially selected for inclusion in the sample.

The decennial census reports show aggregate population demographic data for a standard set of geographic regions defined by state, county, census tract, block group, and block. Blocks are small geographic areas bounded by visible features such as streets and railroad tracks and by nonvisible boundaries such as property lines or county boundaries. Block groups consist of collections of blocks and are typically defined to contain 600 to 3,000 people. Census tracts contain several block groups and typically contain 1,200 to 8,000 people (with a target of 4,000 people) (www.census.gov). While the typical and target population sizes generally hold, there is wide variation across the country and some tracts contain population counts well below or above the example ranges stated here. The spatial size of the census tract also varies widely across the country. Census tracts were not fully defined until the 1980 Census. The 1970 Census had tracts for some areas, but not the entire country. Enumeration units at one level do not cross those at higher levels so, for instance, a census-tract boundary does not cross a county boundary. This nested hierarchy ensures that counts are “upward compatible.” County boundaries rarely change over time, and state boundaries do not change at all. If an analysis requires attention to these changes, the Geography Division of the Bureau of the Census may be able to help.
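The nested, "upward compatible" hierarchy described above is reflected in the Census Bureau's geographic identifiers (GEOIDs), a convention that is not discussed in this chapter but is standard. A block GEOID has 15 digits: 2 for the state, 3 for the county, 6 for the tract, and 4 for the block, with the first block digit identifying the block group. The sketch below uses hypothetical function names and made-up block counts to show how block counts roll up to any higher level simply by truncating the identifier.

```python
from collections import Counter


def parse_block_geoid(geoid):
    """Split a 15-digit census block GEOID into its nested identifiers."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError("expected a 15-digit block GEOID")
    return {
        "state": geoid[:2],         # 2-digit state code
        "county": geoid[:5],        # state + 3-digit county
        "tract": geoid[:11],        # county + 6-digit tract
        "block_group": geoid[:12],  # tract + first digit of the block
        "block": geoid,             # full 15-digit identifier
    }


def aggregate(block_counts, level):
    """Sum block-level counts up to the requested level of the hierarchy."""
    totals = Counter()
    for geoid, count in block_counts.items():
        totals[parse_block_geoid(geoid)[level]] += count
    return dict(totals)


# Made-up population counts for five blocks in two counties of one state.
block_counts = {
    "060010001001000": 40,
    "060010001001001": 60,
    "060010001002000": 25,
    "060010002001000": 75,
    "060030001001000": 10,
}
print(aggregate(block_counts, "tract"))
print(aggregate(block_counts, "county"))
```

Because no unit crosses the boundary of a higher-level unit, the totals at every level sum to the same overall count.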

Census Summary File data from each household include information regarding the population (such as gender, age, self-reported race and ethnicity, household relationships). Questions about race and ethnicity have evolved rapidly and substantially over recent censuses, so comparability across time may be an issue. The 2000 census tabulates 171 population items and 56 housing items at the block level and an additional 59 population items at the census-tract level. At various times the data available at the census-tract level have included race-specific tabulations of other variables such as counts of age by gender by race, and household characteristics by race. From a one-in-six sample weighted to represent the county’s population, to which the “long form” was distributed until the 2000 census, more detailed population data exist, including, for example, place of birth, education, employment status, commuting distance to work, school enrollment, and income as well as housing data such as value of housing unit, telephone service, plumbing, vehicles available, and year structure built. The unpopularity of the “long form” led to its replacement by the ACS (www.census.gov/acs).

The ACS began collecting data in four test counties in 1995. National data were first released in 2001 (with data for 2000) and the ACS was fully implemented by 2006. Each year it publishes three sets of estimates: estimates based on the most recent 1 year of survey data for geographic areas with populations of 65,000 or more, 3-year average estimates for geographic areas with populations of 20,000 or more, and 5-year estimates for all geographic areas down to the block group. ACS data are summarized for 5 years (for example, 2005-2009). The ACS has a rather short history but might be combined with data from the “long form” to provide useful information for long-term studies of health risks.
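The population thresholds in the preceding paragraph determine which ACS estimate series exist for a given area. A minimal sketch in Python, assuming only the thresholds quoted in this section (the function name is illustrative and is not part of any Census Bureau tool):

```python
def available_acs_estimates(population):
    """Return the ACS estimate series published for an area of this population size.

    Thresholds follow the text: 1-year estimates for areas of 65,000 or more,
    3-year estimates for areas of 20,000 or more, 5-year estimates for all areas.
    """
    series = ["5-year"]  # published for all geographic areas
    if population >= 20_000:
        series.append("3-year")
    if population >= 65_000:
        series.append("1-year")
    return series


if __name__ == "__main__":
    # A typical census tract (target of 4,000 people) has only 5-year estimates.
    print(available_acs_estimates(4_000))
```

For the small areas of interest near a facility, this suggests that only the 5-year series would normally be available.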

While the state-county-tract-block group-block hierarchy defines the primary framework for U.S. Census geography and aggregate data releases, data are aggregated in a variety of other ways. These include congressional districts and school districts, which need not follow block, block group, tract, or county boundaries. The U.S. Postal Service (USPS) defines ZIP code units for mailing addresses. ZIP codes are designed primarily to serve the needs of the USPS in management tasks related to local post offices. Some records (such as billing records and birth certificates) can easily be aggregated by ZIP code. While geographic areas are associated with ZIP codes, these areas rarely match block, block-group, or census-tract boundaries and, at times, even cross county and state boundaries. Compared to census tracts, ZIP codes are not only typically larger but also less homogenous aggregate units. In addition, ZIP code areas are modified as needed by the USPS, unlike census regions, which are updated only following a decennial census, to address in- and out-migration. As a result, direct linkage between ZIP code areas and census summary data is challenging, especially over long periods of time. As a compromise, the Bureau of the Census provides ZIP Code Tabulation Areas with summary data from block units combined to match ZIP code areas as closely as possible.

For both the census and the ACS, a number is not published when the number of persons in a cell of a table is small (often five or fewer), as a way to maintain the confidentiality of individually reported data. This can be a serious limitation in using the ACS but may be less serious for analyses based on the decennial census.
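The effect of small-cell suppression on a tabulation can be sketched as follows. This is an illustration only: the threshold of five is the "often five or fewer" value mentioned above, not a fixed rule, and the function and cell labels are hypothetical.

```python
def suppress_small_cells(table, threshold=5):
    """Replace counts at or below the threshold with None (not published).

    `table` maps a cell label to a person count. The default threshold of 5
    is an assumption taken from the "often five or fewer" wording in the text.
    """
    return {
        cell: (None if count <= threshold else count)
        for cell, count in table.items()
    }


if __name__ == "__main__":
    counts = {"age 0-4": 3, "age 5-9": 5, "age 10-14": 12}
    print(suppress_small_cells(counts))
```

An analysis of small geographic units would need to decide how to treat the suppressed cells rather than reading them as zeros.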

It is recommended by the Census Bureau that ACS data not be used below the census-tract level because the margins of errors on block-group estimates are generally high. These data are made available primarily to allow users to add block groups to create estimates for custom geographies.

Accounting for migration is important in studying the risks of living near a nuclear facility, but it is also challenging, particularly when smaller geographic units are analyzed. The decennial census and the ACS track migration, but in different ways. The ACS asks individuals where they lived a year earlier and records changes in place of residence across county or state boundaries, but not across smaller geographic units. If a person has moved multiple times within a year, the ACS captures only the earliest move in the prior 12 months. The decennial census has tracked migration by asking individuals where they lived 5 years earlier. The 2010 Census did not collect information on migration.

Migration statistics from the Bureau of the Census are tracked every 10 years; this implies that any trends within the 10-year period are not captured. Models for migration into regions can be incorporated; for example, if it is known that a given locality has had much recent migration this can be used to modify (down-weight) the dose-surrogate variable under an assumption that migrants are unexposed prior to their move, thus reducing the average time-weighted dose value for that unit. Generally this would be done in a time and possibly age-dependent fashion allowing for migration patterns to vary over time and by age.
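The down-weighting idea can be sketched in a few lines of Python. This is one possible formulation under stated assumptions, not the committee's method: in-migrants are treated as unexposed before their move, all in-migrants in a unit share a single average time since arrival, and the function and parameter names are hypothetical.

```python
def migration_adjusted_dose(dose_surrogate, migrant_fraction,
                            years_since_move, window_years):
    """Time-weighted average dose surrogate for an areal unit.

    Long-term residents (1 - migrant_fraction) receive the full surrogate.
    In-migrants are assumed unexposed before their move, so they receive only
    the share of the exposure window spent in the unit.
    """
    if not 0.0 <= migrant_fraction <= 1.0:
        raise ValueError("migrant_fraction must be between 0 and 1")
    if window_years <= 0:
        raise ValueError("window_years must be positive")
    # Share of the exposure window that in-migrants spent in the unit.
    exposed_share = min(max(years_since_move, 0.0), window_years) / window_years
    weight = (1.0 - migrant_fraction) + migrant_fraction * exposed_share
    return dose_surrogate * weight


if __name__ == "__main__":
    # Half the residents arrived 5 years into a 10-year exposure window.
    print(migration_adjusted_dose(10.0, 0.5, 5.0, 10.0))
```

A time- and age-dependent version would apply the same weighting separately within each calendar period and age group.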

Pretabulated data are available for all levels of geographic units and would cover 100 percent of available data. The microdata file that is available for public use includes 40 percent of the data for geographic units that include at least 100,000 persons. For non-Census employees, gaining full access to the microdata is possible in special cases but requires substantial paperwork, including permissions and background checks, and the investigator would need to work in or with a designated Research Data Center to retrieve the information.

To appreciate the size of the populations residing near the nuclear facilities, the committee estimated the number of individuals that reside within the census tracts at 0-8- and 0-50-km radii around currently operating nuclear facilities. The numbers are presented in Tables 1.3 and 1.4 of Chapter 1. For demonstration, the 2010 census data were used, although it is clear that recent census data may not be relevant to risks associated with early operations of facilities. The committee used the geographic information system ArcGIS to draw circles around the facilities at 8 and 50 km. As the radius around a plant would cut through census tracts, the map assigned a share of each census tract’s population to the circle based on the percentage of the tract’s land area that falls within the circle. If the circle would intersect, for example, 30 percent with a census tract, then 30 percent of the census-tract population would be included in the circle; this assumes homogeneity in population density within the census tract. Such population size estimates are attractive and appear very precise, but they can be sensitive to the choice of map projection (Figures 4.4a-4.4d are based on a conic Lambert projection) and to the assumption that the proportion of area is an accurate reflection of the proportion of individuals residing in a portion of a census tract. In some cases, small changes in these two issues (map projection and proportional-to-area assignment) can result in changes in population estimates in the hundreds or even thousands of individuals.
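The proportional-to-area assignment described above can be illustrated with a small self-contained Python sketch. It does not reproduce the committee's ArcGIS workflow: tracts are simplified to hypothetical axis-aligned rectangles in already-projected kilometre coordinates, and the overlap fraction is approximated by sampling a regular grid of points within each tract.

```python
def fraction_in_circle(rect, center, radius_km, n=200):
    """Approximate the share of a rectangular tract's area inside a circle.

    rect is (xmin, ymin, xmax, ymax) in projected km; the share is estimated
    from an n-by-n grid of cell midpoints.
    """
    xmin, ymin, xmax, ymax = rect
    cx, cy = center
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    inside = 0
    for i in range(n):
        x = xmin + (i + 0.5) * dx
        for j in range(n):
            y = ymin + (j + 0.5) * dy
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius_km ** 2:
                inside += 1
    return inside / (n * n)


def population_in_circle(tracts, center, radius_km):
    """Sum tract populations weighted by area share inside the circle.

    Assumes homogeneous population density within each tract.
    """
    return sum(pop * fraction_in_circle(rect, center, radius_km)
               for rect, pop in tracts)


if __name__ == "__main__":
    # Hypothetical tracts: (rectangle, population).
    tracts = [
        ((-1.0, -1.0, 1.0, 1.0), 1000),   # entirely inside the 8-km circle
        ((0.0, -1.0, 16.0, 1.0), 2000),   # roughly half inside
        ((100.0, 100.0, 110.0, 110.0), 5000),  # entirely outside
    ]
    print(round(population_in_circle(tracts, (0.0, 0.0), 8.0)))
```

Changing the projection changes the coordinates and therefore the overlap fractions, which is one way the sensitivity noted above arises.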

FIGURE 4.4a. Size differences in the populations near nuclear facilities.



In summary, in 2010, approximately 47 million people (15 percent of the population in the United States) lived within 50 km of an operating nuclear facility and 1 million (0.3 percent of the population in the United States) lived within 8 km of an operating nuclear facility. The series of regional maps (Figures 4.4a-4.4d) highlight different challenges that need to be considered when evaluating the risks of the populations around the nuclear facilities and these are discussed here.

The population size residing near (e.g., within 50 km of) a nuclear facility varies considerably across the facilities. As an example, approximately 2,400,000 people live within 50 km of the San Onofre Nuclear Generating Station located in San Diego County, California (indicated by the red circle), while only 54,000 people live within 50 km of the Cooper Nuclear Station located in Nemaha County, Nebraska. This can be visualized in Figure 4.4a by the much smaller but denser (darker brown) census tracts that are around the San Onofre plant compared to the Cooper plant. Inner black circles indicate the boundary of the 8-km radius.

There is often overlap in the populations that reside within the 50-km radius from two or more nuclear facilities due to the proximity of the sites in some areas of the country. For example, approximately 143,000 residents of Illinois reside within the intersection of the 50-km radii of Dresden, LaSalle, and Braidwood plants combined (Figure 4.4b); in an epidemiologic investigation of cancer risks, these residents would be considered to be exposed from all three plants and doses would be estimated using an additive model.
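An additive treatment of overlapping facilities can be sketched as follows. This is a schematic illustration, not a dose model: coordinates are hypothetical projected kilometre values, and each plant's `dose_at` function is a placeholder for whatever plant-specific dose estimate a study would actually supply.

```python
import math


def additive_dose(location, plants, radius_km=50.0):
    """Sum per-plant dose contributions for all plants within radius_km.

    Each plant is a dict with "xy_km" (projected coordinates) and "dose_at",
    a hypothetical callable mapping distance in km to a dose contribution.
    """
    total = 0.0
    for plant in plants:
        distance = math.dist(location, plant["xy_km"])
        if distance <= radius_km:
            total += plant["dose_at"](distance)
    return total


if __name__ == "__main__":
    # Three hypothetical plants with constant placeholder contributions.
    plants = [
        {"xy_km": (0.0, 0.0), "dose_at": lambda d: 1.0},
        {"xy_km": (30.0, 0.0), "dose_at": lambda d: 2.0},
        {"xy_km": (200.0, 0.0), "dose_at": lambda d: 4.0},
    ]
    # A residence within 50 km of the first two plants only.
    print(additive_dose((15.0, 0.0), plants))
```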

FIGURE 4.4b. Population overlap among nuclear power plants.



Exposure estimations may be further complicated if the facilities that share the population around them are of different type; therefore, the radioactive release content or pathways of exposure may be different. An example describing such a situation is the conversion facility in Metropolis, Illinois, operated by Honeywell International, Inc., and the uranium enrichment facility in Paducah, Kentucky, operated by USEC Inc. These two types of facilities are in such close proximity that there is an almost complete overlap of the exposed population within the 50-km zone (Figure 4.4c).

FIGURE 4.4c. Population overlap between different types of facilities.



The above-mentioned example is also an example of facilities being located at or near the border of two or more states; hence, the population within 50 km of the facility is shared between two, three, or four states. Figure 4.4d illustrates some of the many power plants whose populations in close proximity reside not only in the state where the plant is located but also in neighboring states. For example, the populations living within 50 km of the Vermont Yankee plant in Vermont reside in Vermont, Massachusetts, and New Hampshire. Similarly, the populations living within 50 km of the Seabrook Station in New Hampshire reside in New Hampshire, Massachusetts, and Maine. This means that a requirement of a study that investigates the cancer risks of populations 50 km around the Seabrook plant is that it gains access to cancer registry data from New Hampshire, Massachusetts, and Maine. This has the potential to create logistical challenges in access to state-level administrative and health outcome data.

4.3.2. Cancer Registration Data

In theory, a cancer registry includes all cases of cancer in a defined population over a defined time period (such as all cases with a diagnosis after January 1, 1990). In practice there is always a cutoff date as well (such as diagnosis before January 1, 2009). Registries also have rules about what constitutes date of diagnosis to deal with such problems as a clinical suspicion of cancer, followed by an imaging study, followed by a positive biopsy. Such information is needed for any incidence- or mortality-based ecologic study, any cohort study that compares cancer rates in different areas, or a case-control study that estimates associations.

It takes time, typically 1-2 years after the occurrence of the cancer, to get registry files that are virtually complete. Connecticut was the first state to create and continuously run a population-based cancer registry; the data begin in 1935. In 1973, NCI established the SEER program, which now covers a sociodemographically diverse segment of 28 percent of the population in the United States. In 1992, the U.S. Congress expanded cancer surveillance to all states by establishing the National Program of Cancer Registries (NPCR), administered by the Centers for Disease Control and Prevention (CDC). In 2003, SEER and NPCR together provided 100 percent national coverage for cancer incidence reporting, with some overlap (see Figure 4.5). Cancer incidence reporting is accomplished through individual state mandates that are not entirely uniform.

FIGURE 4.5. Cancer registration coverage within the United States. SOURCE: NPCR.


SEER

The SEER program is the primary source of historical information on cancer incidence and survival in the United States. Starting in 1973, SEER originally included geographic areas comprising about 10 percent of the U.S. population. SEER expanded in the early 1990s and again in 2001 and 2010 to cover 14, 26, and 28 percent of the U.S. population, respectively. SEER currently collects and publishes cancer incidence from 15 population-based cancer registries and is the source of much of the survival data. Incidence reporting is based on residency in a SEER-covered geographic area at time of diagnosis. Registries have data-sharing agreements with neighboring states. This is important because residents of a state may seek medical diagnosis and treatment in a state other than the one where they reside, and thus have all of their medical records elsewhere. Also, states with many part-time residents (e.g., Florida) may experience reporting delays and extra work to consolidate records. The SEER program registries collect data on patient demographics, primary tumor site and morphology, stage at diagnosis, first course of treatment, and follow-up of vital status.

The registries in SEER collect information on address, state, county, and ZIP code, and derive the census tract. The registries send geographically coded (“geocoded”) county, census-tract, and census-tract certainty code18 to SEER, but addresses are not reported to SEER and if needed must be requested from the individual state registries. Census-tract certainty of at least 90 percent is required for urban areas and at least 80 percent for rural areas for SEER participation. Census-tract variables together with other identifiers are removed from the SEER public-use research file to protect the confidentiality of data for persons in small areas.

Although the studies considered here focus on the risks of developing first cancers only, this paragraph describes the registries’ rules for recording multiple cancers, mostly to clarify that second or multiple cancers of an individual are recorded separately from the first. The SEER rules for classifying multiple primary cancers are followed by all registries in the United States (that is, by all SEER and NPCR registries) and can be accessed at http://seer.cancer.gov/tools/mphrules/index.html. In general, all cancers that occur 2 or more months after the diagnosis of the first cancer are considered as separate primaries, unless the pathology report indicates that the cancer is due to recurrence or metastasis. Classification of multiple primary cancers depends on the cancer site of origin, date of diagnosis, histology, tumor behavior, and laterality of paired organs. Advances in the diagnosis and treatment of cancer lead to a rising number of cancer survivors who are at risk of developing new primary cancers.

A recent survey aimed to characterize the site-specific risks of second cancers and to provide clues to the underlying causal factors including the carcinogenic potential of treatment modalities such as chemotherapy and radiation, and/or the combination of the two treatments (SEER registries collect data on the first course of treatment of the cancer such as surgery, radiation therapy, chemotherapy). The survey used data from nine cancer registries participating in the SEER program from 1973 to 2000. Two million cancer survivors who survived at least 2 months and developed a new malignancy were included in the analysis; nearly 390,000 cases survived at least 10 years and 76,000 cases survived 20 or more years (http://seer.cancer.gov/publications/mpmono/MPMonograph_complete.pdf). About 9 percent of the survivors developed a second cancer and the risk of developing a second malignancy was dependent on multiple factors including smoking, alcohol use, viral infections and immunosuppression, genetic susceptibility, and prior cancer treatment, particularly the combination of radiotherapy and chemotherapy. The risk of developing a new malignancy was six times higher among childhood cancer survivors compared to adult survivors (SEER, New Malignancies Among Cancer Survivors: SEER Cancer Registries, 1973-2000). This finding is in agreement with previous studies of childhood cancers, which have implicated initial therapy and genetic susceptibility as major risk factors for cancers later in life (Neglia et al., 2011).

NPCR

CDC provides support for states and territories to maintain registries that provide high-quality data through the NPCR. NPCR collects data on the occurrence of cancer, including the type, extent, location, and first course of treatment. Follow-up is not included except as noted below. Before NPCR was established in 1992, 10 states had no registry and the data collected by most state registries were incomplete. Today, NPCR supports central cancer registries in 45 states, the District of Columbia, Puerto Rico, and the U.S. Pacific Island Jurisdictions. The state registries’ year of operation and entry to the NPCR program is presented in Table 4.4. The NPCR data cover 96 percent of the population in the United States. Sources of information on cancer incidence are hospitals, laboratories, radiation therapy centers, medical oncology facilities, outpatient centers, and physicians’ offices; the last three are regarded as less complete reporting systems but the entire data set (1995 and forward) is resubmitted each year and completeness improves over time. Data items reported are age, race, gender, state, county, ZIP code and census tract, date of diagnosis, primary site, histology, staging, and follow-up information that includes vital status by linkage with the National Death Index. Census tract has been a required field since 2003.

TABLE 4.4. State Registries’ Year of Operation and Entry to the NPCR Program.


North American Association of Central Cancer Registries (NAACCR)

NAACCR is an oversight group established in 1987 to set uniform standards for cancer registration as well as electronic data record structure. CDC, NCI, and other sponsoring organizations support it. All NPCR and SEER registries are members of NAACCR. NAACCR develops and promotes uniform data standards for cancer registration; provides education and training; certifies population-based registries; and aggregates and publishes data from central cancer registries. Data down to county level are released by NAACCR beginning in 1995, when NPCR started. Census-tract or address data for any year, or county data prior to 1995, must be requested from individual states. A major role of NAACCR is to provide state certification for quality of cancer registration.

Assessing the Quality of Cancer Registration: National and International Efforts

The utility of cancer incidence data for research depends on the quality of the data. Researchers want to ensure that the data they use for their studies meet the highest standards of quality and reliability and therefore can have faith in their analyses. The two main factors that define the quality of a cancer registry are the completeness of case ascertainment and the accuracy of the details retrieved for each case. Cancer incidence data quality varies by state.

CDC has established standards for quality and completeness for NPCR registries. Data are evaluated each year and only data from those registries that meet NPCR standards are used for reporting of cancer incidence. The standards are presented in Table 4.5.

TABLE 4.5. Summary of Data Quality Criteria and Standards.



Data in the SEER and NPCR data sets are combined to produce the United States Cancer Statistics (USCS) data set. The data set is produced by NCI and CDC in collaboration with NAACCR. Only cancer registries that demonstrated that cancer incidence data were of high quality are included in the data set. The criteria for USCS publication are also presented in Table 4.5. Data from all states and the District of Columbia met the USCS data quality criteria for 2008, but data from only 44 states and three U.S. Census regions (covering 90 percent of the U.S. population) met these criteria for the entire period 1999-2008 (Centers for Disease Control and Prevention, 2011).

In 1998 NAACCR developed a set of data standards for cancer registration and certified data quality beginning with 1995 data. NAACCR independently reviews the data from member registries for their completeness, accuracy, and timeliness and provides silver or gold registry certifications (Table 4.6). States that do not meet the standards are uncertified. Nearly all states in the United States have received a silver or gold certification for the most recent years. The data quality criteria and standards for 2011 are presented in Table 4.5.

TABLE 4.6. Summary of State Cancer Registries’ Data Quality by NAACCR Certification Methods.



A cancer registry may not be able to collect complete information on all the incident cancer cases within the timeframe for submission of the data to NAACCR, or may not be able to collect the information at all. (Of course, the actual number of incident cancer cases that a registry should have captured is an unobserved quantity that can be estimated from available data. The methodology used by NAACCR is described elsewhere [Das et al., 2008].) Having a high proportion of cases identified only by death certificates suggests that the procedures and sources used for case finding are inadequate or that matching to other sources is incomplete.

Similarly, a high proportion of duplicate reports suggests that the data “cleaning” processes are insufficient. NAACCR has been criticized for looking at the accuracy and timeliness of data at a single time point; recertification based on correctness of initially reported data has been suggested (Das et al., 2008).

Using cancer registration data for the years during which states had compromised quality of data is problematic because data quality may vary from place to place within the state. This may lead to bias and errors in comparing cancer frequency in these areas; the scope for such errors is reduced when data quality for the state as a whole is high.

It is not always clear how investigators can assess the quality of cancer registration for data prior to the NAACCR certification system (1995 data). Since the 1960s, the International Agency for Research on Cancer (IARC) has published cancer incidence data from populations all over the world for which good-quality data are available. The purpose of the publication is to compare rates of cancer incidence from different populations, to draw conclusions on differences between and changes in cancer patterns by geographic area, and to formulate hypotheses about causes of cancer. The most recent publication (Volume IX) covers the period 1998-2002 and presents statistics from 60 countries and 225 registries, of which 54 are in North America (Curado et al., 2007). The publication provides a comprehensive summary of the participating states in the United States and includes information on the registration area covered, cancer care facilities that provide the cases’ information, registry structures and methods, and use of the data (for example, annual publications, support to researchers or policy makers, and intervention efforts). The publication also includes a table with the geographic coverage in the nine successive volumes of Cancer Incidence in Five Continents, which has been replicated here to present the data for the United States (Table 4.7).

TABLE 4.7. Geographic Coverage in the Nine Successive Volumes of IARC’s Cancer Incidence in Five Continents.



Because the cancer registry certification system in the United States did not exist until the 1995 data, the IARC judgment of “good quality” could potentially be used to select pre-1995 registries for inclusion in an epidemiologic study.

Independent of the certification of cancer registries by NAACCR or other systems, the quality of cancer registration will need to be judged following close examination of the data for each state cancer registry individually.

State Registries

Collecting and maintaining high-quality cancer incidence data requires time and experience, and data in the first few years of a new registry need to be viewed with caution. Individual state cancer registries collect information on state, county, ZIP code, and address and derived census tract. Accessing cancer registry data for research, in particular for multistate data, is complicated and challenging because procedures for data use and confidentiality vary by location. In September 2010, CDC launched Cancer Registry Data Access (CRDA). The purpose of CRDA is to (a) provide understanding of comprehensive requirements and barriers of cancer registry data access for research, (b) identify optimal state and registry rules and policies, (c) investigate methods for streamlining the IRB processes and pilot test the best methods, and (d) assist researchers in managing the process. Basic and special requirements for data access vary substantially among states. The initial summary of information is expected to be completed in September 2013 and will continue as needed.

To better understand what data are available in individual cancer registries for the immediate need of this study, the committee requested information regarding cancer incidence from the states that have or have had a nuclear facility. A letter template is presented in Appendix K. A summary of the results is presented in Table 4.8. Briefly, data were requested from 38 states, and 31 states responded (81 percent). The median year for which complete incidence data exist is 1992; cancer registration goes as far back as the 1970s for three respondent states, and to the 1980s for eight respondent states. All states that responded to the request had complete cancer registration by 1999. For convenience, Table 4.8 also summarizes availability of cancer mortality data, which is further discussed in Section 4.3.3.

TABLE 4.8. Availability of Cancer Incidence and Mortality Data of States that Have or Have Had a USNRC-Licensed Nuclear Facility.



The letter responses received from the cancer registries and vital statistics offices identified several potential problems related to the availability and release of data. Although not strictly quantitative, examples of these obstacles are discussed here.

As expected, the year from which complete data are available in a registry and the year that the registry started operation may differ. For example, the cancer registry in New Mexico was established in 1966 and initiated statewide coverage in 1969; the most reliable data in accordance with standards set by the SEER program are for 1973 onward. Similarly, the cancer registry in Virginia started operation in 1979, but complete data are not available until 1999. In Nebraska, Maine, and Nevada the first years of complete data are 1995-1996, which coincides with the year the registries joined the NPCR program. Using cancer registry data prior to NPCR involvement requires further examination for consistency and comparability with the data collected after NPCR implemented uniform rules across states. For New York, statewide data are available from 1976; however, the reference year is 1996 for the NPCR program. When the registry became part of NPCR it adopted the SEER multiple primary rules, which are considered the national standard; previously the state was using the IARC rule for counting primary tumors, which allows only one primary per site per person per lifetime. This change is important for the interpretation of cancer incidence statistics. The extent of the effect for each cancer site depends on the site-specific probability of multiple primaries.
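The effect of the change in counting rules on incidence counts can be illustrated with a deliberately simplified Python sketch. It assumes each record has already been judged a separate primary under the SEER rules, and it reduces the IARC rule to the one-primary-per-site-per-person description given above; the actual rule sets also consider histology, laterality, and timing. The record layout and function name are hypothetical.

```python
def count_primaries(records, rule):
    """Count primary tumors under a simplified SEER-style or IARC-style rule.

    records: list of (person_id, site) tuples, one per tumor already judged a
    separate primary under the SEER rules.
    """
    if rule == "seer":
        # Every adjudicated primary is counted.
        return len(records)
    if rule == "iarc":
        # At most one primary per site per person per lifetime.
        return len({(person_id, site) for person_id, site in records})
    raise ValueError("rule must be 'seer' or 'iarc'")


if __name__ == "__main__":
    records = [
        ("A", "breast"),
        ("A", "breast"),  # second breast primary in the same person
        ("A", "colon"),
        ("B", "lung"),
    ]
    print(count_primaries(records, "seer"), count_primaries(records, "iarc"))
```

The gap between the two counts grows with the site-specific probability of multiple primaries, so a rule change partway through a time series can produce an apparent shift in incidence.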

Address at time of diagnosis is widely collected and has been for all periods. However, for many rural residents, residential information may be expressed as P.O. boxes and rural route numbers. This may influence the quality of geocoded data in these areas, and it is likely a problem throughout the United States, particularly when going back in time. Indeed, Boice and colleagues have emphasized that mailing addresses in small rural areas may not always reflect actual residences, and validation by contacting area postmasters and using Census Bureau geocoding information may be necessary to prevent misleading conclusions (Boice et al., 2003).

Census tract became a required field by NPCR in 2003. However, some states were recording this information before it became a required field. For example, Iowa has recorded census-tract information since 1990. Some states that are part of the NPCR program, such as Maine and Alabama, do not collect census-tract information. As the Maine Cancer Registry director informed the committee, although NPCR made census tract a required field, it is not enforced. Since the decennial census may lead to changes in census tracts, reconstructing census tract from address is not straightforward and would require expertise in geocoding addresses; such expertise is available from some contractors and GIS professionals.

Several cancer registries noted the importance of knowing and understanding the methodology used to construct census-tract data. For example, to create the census-tract data for 1998, an investigator may have used population data from the 1990 census as it would have been available in 1998, or recalculated retrospectively by using the 2000 census data when those became available. The Massachusetts cancer registry noted that for the 1982 cancer registration data, the 1990 tracts were used, since in the 1980 census not all Massachusetts counties had defined tracts. In New Mexico, for incident cancer cases diagnosed in the calendar years 1973-1977, the 1970 census-tract boundaries would be assigned; for 1978-1987 the 1980 census-tract boundaries; for 1995-2000 incident cases would be assigned both the 1990 and 2000 census-tract boundaries. The quality of the census-tract determination depends on the availability of residential information in source records and as mentioned earlier this may influence the quality of geocoded data in rural areas.
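The New Mexico assignment described above amounts to a lookup from diagnosis year to census-tract vintage. A minimal Python sketch, encoding only the year ranges stated in the text (years that the text does not mention, such as 1988-1994, are deliberately left unassigned; the names are illustrative):

```python
# (first diagnosis year, last diagnosis year, census-tract vintages assigned)
NM_TRACT_VINTAGES = [
    (1973, 1977, (1970,)),
    (1978, 1987, (1980,)),
    (1995, 2000, (1990, 2000)),
]


def tract_vintages(diagnosis_year):
    """Return the census-tract vintages assigned for a diagnosis year.

    An empty tuple means the text gives no assignment for that year.
    """
    for first, last, vintages in NM_TRACT_VINTAGES:
        if first <= diagnosis_year <= last:
            return vintages
    return ()


if __name__ == "__main__":
    for year in (1975, 1980, 1990, 1998):
        print(year, tract_vintages(year))
```

Recording the vintage alongside each case makes it explicit which population denominators are compatible with a given tract code.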

The data item “census-tract certainty” documents the quality of residential information that was used to assign census tract for each case. The State of Illinois emphasized that the registry would not release census-tract data information for research, and thus they were reluctant to inform the committee when the registry started collecting the information, or if the information exists. However, if justified by research needs, address information from the Illinois cancer registry may be released upon review and approval of the application. Interestingly, although generally census-tract data exist for cancer registries, mortality data have not been routinely geocoded. Some vital statistics offices have data only for recent years while others (for example, Pennsylvania) will start in the near future.

Although cancer registries attempt to collect information on place of birth (and in the context of this study, one may need the information to make assumptions as to whether the person lived in the same place since birth), the information is largely missing from the medical record, which is the primary source of cancer diagnosis. For example, for the state of New York, birth place is missing for 26 percent of cases diagnosed in the period 1995-2008; for Texas birth place is missing for 42 percent; and for Illinois for 75 percent. Some states reported that the information often becomes available from death certificates. When it is available at all, place of birth is poorly reported and is coded only to the state level (or the national level for persons born outside the United States).

When states were asked about the quality and completeness of the data, they commonly referred to the certification received by NAACCR. Although “missing county” is a criterion for data quality, missing address is not and this may be a problem when data in small geographic units are needed for analysis.

Active follow-up for vital status is performed only by SEER registries. There is some passive follow-up in all states queried, commonly through linkages with the state’s vital records office, national death index, and social security death index. For states with more than one cancer registry, such as Washington, active follow-up is performed for the SEER registry only. More specifically, of the 39 counties within Washington, active follow-up occurs in the 13 counties that comprise the Washington SEER registry, while passive follow-up alone occurs in the remaining 26 counties.

All states that responded to the request for information on procedures for release of the data reported that approval is required following submission of a detailed study protocol that may include data elements requested, analysis plan, and plan for reporting and dissemination. (The committee was advised to use the NAACCR data element code book for communication of variables requested as it is a uniform language among all states.) More than one level of approval may be required from some states. For example, for investigators outside the University of New Mexico, which maintains the cancer registry for the state, additional approval must be obtained from the senior leadership team at the cancer registry (i.e., Principal Investigator, Medical Directors, and Program Manager), the New Mexico Department of Health-Office of the State Epidemiologist, and the Office of Human Research Protections at the University of New Mexico Health Sciences Center. Application forms are available on each of the states’ websites. Review processes vary with the protocol and with how frequently the IRB or other equivalent committees meet, but a decision within 1 to 6 months seemed to be the general rule. Alabama, Louisiana, and Tennessee place a limit on the studies the cancer registries support either due to staffing shortages or to minimize the patient burden when patient contact is required.

Table 4.9 summarizes the information on approval requirements for cancer registries (the document Cancer Registry Data Access for Research was created January 11, 2012, by CDC). According to the CDC document on IRB requirements for central cancer registries, all states but Wisconsin permit the release of state residents’ identifiable data to researchers, but three states (Georgia, New Mexico, and Hawaii) require sponsorship from a local researcher. Special requirements such as parental and/or physician consent and a more difficult approval process exist for release of information for pediatric research in 15 states. For research projects that require patient contact and consent for release of confidential data, the contact (or initial contact) is required to be established by the registry in some states and by the researcher in other states.

TABLE 4.9. Cancer Registry Research Approval Process.



Time and cost for release of the data depend on what is being requested and on staff availability; data submission to NAACCR is the priority. Some states, including Washington, Maryland, Massachusetts, Virginia, and Arizona, do not charge for data release, although that is subject to policy changes. Among those that charge for data release, different methods for estimating costs are in place. Oregon State charges $55 per hour, and Vermont charges $34 per hour. North Carolina charges a standard fee of $1,000 for a file that includes up to 50,000 records and an additional $100 for each additional 10,000 records. According to Illinois, data sets prepared for analysis can run anywhere from $5,000 to $10,000. Registries that are understaffed, such as Maine (which reported a 50 percent staffing level, including no registry-based epidemiologist), would need to contract an epidemiologist to work on the data request. Currently the rate is $75.00 per hour.

Pediatric Cancer Registries

In contrast to cancers in adults, cancers in children are rare, making up less than 1 percent of all cancers diagnosed each year. About 11,200 children in the United States under the age of 15 will be diagnosed with cancer in 2011. Leukemia is the most common childhood cancer, accounting for about one-third of all cancers in children. Brain and other nervous system tumors, the second most common cancer in children, make up about 27 percent of childhood cancers (American Cancer Society, http://www.cancer.org/).

Many childhood cancers are curable with modern therapy. The 5-year survival rate for all stages and all sites of cancer among children aged <15 years diagnosed from 1999 to 2006 was 82 percent (http://seer.cancer.gov/). Overall, this is a great success compared to the 1970s, when the 5-year survival rate was less than 50 percent. This improvement in survival mostly reflects the improved leukemia treatments. For brain tumors, the 60 percent 5-year survival rate has improved slightly in the past 25 years.

Pediatric cancer incidence can be derived for any site or age group from individual state cancer registry data and from SEER. Unlike the situation in some European countries, such as Germany and Switzerland, there is no national population-based childhood cancer registry in the United States. The closest approximation of a pediatric cancer registry is the Childhood Cancer Research Network (CCRN), which is built on the Children’s Oncology Group (COG), an NCI-sponsored clinical trials cooperative group comprising more than 200 institutions, mostly in the United States and Canada, which collectively see and treat upward of 80 percent of children under the age of 15 with cancer (Steele et al., 2006).

The CCRN potentially could provide a resource for identification of cases for an epidemiologic study. The CCRN, after years in development and planning, was launched in 2001 as a pilot with funding from the NCI and participation by 23 COG institutions (Steele et al., 2006). The pilot experience showed roughly 96 percent patient and/or parent agreement to participate, after IRB approval and informed consent, for release of personal identifiers and possible future contact. Since completion of the pilot in 2007, the CCRN has been expanded groupwide with 100 percent participation of about 200 institutions obtaining IRB approval and roughly 20,500 cases enrolled as of April 2011. However, CCRN has definite limitations, including variation in registration rates by institution, geography, age, and cancer type. There is also the problem that not all children and adolescents with cancer in the United States are seen at a participating COG member institution. A collaborative study of COG investigators and SEER analyzed 10,108 cases of cancer in children under the age of 20 years that were reported to 11 SEER registries between 1992 and 1997; of these, 5,796 (57.5 percent) were registered with COG. Rates varied by geographic region and by age, with rates found to be highest for children <5 years (74.3 percent). Rates were also higher for children with more advanced disease (Liu et al., 2003). Thus, while the CCRN and COG institutions provide a framework for collection of cases and obtaining informed consent, ascertainment of cases would be biased and incomplete. As the formation of the CCRN is relatively recent, the data could not be used for study of childhood cancer cases diagnosed prior to 2001.

4.3.3. Cancer Death Data

Over the years, the most common and routinely collected cancer data are related to mortality. Kelsey et al. (1996) have comprehensively described the process of reporting the event of death to the national statistics and their summary is presented here. After completion of the death certificate, the funeral director or other person in charge of interment is responsible for completing the parts of the death certificate that require personal information about the deceased and for filing the certificate with the local registrar of the district in which the death occurred. A physician must complete and sign the medical certification section and enter the cause of death. If a physician has not been in attendance or the cause of death is thought to be the result of an accident, homicide, or suicide, the medical examiner or coroner must sign the certificate. The local registrar verifies that the death certificate has been completed, keeps a copy, and sends the certificate to the state registrar. After querying the local registrar about any incomplete or inconsistent information, the state registrar keeps one copy and sends another copy to the National Vital Statistics System of the NCHS. The NCHS is a division of the CDC and as such is under the U.S. Department of Health and Human Services. Death registration is considered virtually complete.

NCHS then summarizes the mortality data and documents the health status of the population in the United States. NCHS provides access to its data but does not release data for geographic units smaller than county; the vital records office of each state needs to be contacted for access to more geographically precise data. At NCHS, county-level data are available for 1968 to the present. Data are also available for 1959-1967 but have not gone through rigorous checks, and some gaps may exist for the period 1957-1967.

In 1979 NCHS established the National Death Index (NDI), a central computerized index of death record information for the entire country. Death records are added to the NDI database annually and become available approximately 12 months after the end of a particular calendar year. Personal identifiers such as name of deceased, father’s name, date of birth, social security number (SSN), and other variables, can be used to determine whether a person has died anywhere in the United States. NDI can provide a death certificate number for further linkage to the NCHS database to determine cause of death. However, NDI does not contain the address of the deceased individual.

Release of address information for mortality data can only be achieved by contacting the vital statistics offices of the state in which death occurred. Although death registration has existed for many decades, this has not always been done electronically. The committee requested information from 38 states that have or have had a nuclear facility on electronic availability of cancer mortality data; 17 states responded to the request (45 percent) (see Table 4.8). Complete mortality data have been available since 1970 in most states, but subject address at time of death was not captured until much later in some states. A striking example is cancer death registration in Illinois. Cancer death information has been available since at least 1950 but has existed electronically only since 1970, and address has been included in the records only since 2008.

These delays and gaps appear because the primary purpose of the vital statistics offices is to provide documentation of death, not to support research. Although this view may be changing slowly, adding addresses for past years retrospectively requires an enormous amount of work and is not feasible in many states. The lack of address information accompanying cancer death registration is problematic for a study of cancer risks in populations near nuclear facilities as investigators are unable to assess risks related to the early operational years of the nuclear facilities, for example, the 1960s (when cancer registration efforts were nonexistent in the majority of the states). It was anticipated that mortality data at a geographic level smaller than county, such as census tract, would go further back in time than incidence data for the same geographic unit, or at least address would be available electronically and could be used to geocode the data. However, this is not generally the case. Of course, address at time of death is present in the hard copies of the death certificates, though an effort to retrieve the information from those in an ecologic study would be impractical.

In contrast to cancer incidence data, geocoding addresses to census tract is not common practice for mortality data (see Table 4.8). For example, cancer mortality data for Arizona are available since 1970, but census tract of reported deaths is available only since 1995 and is not complete. In Illinois, census tract is geocoded only for Chicago, roughly from 1979. Alabama does not geocode the data and, as it is very rural in some parts, even aggregated data have small counts of cancer deaths and will not be released.

Finally, in contrast to the cancer registry that has information only on state at birth (and even that is incomplete), the mortality database may contain city of birth.

4.3.4. Methods for Control Selection

In a case-control study the challenge is to identify individuals (controls) who are similar to the cases in all relevant respects except the exposure under study. Random-digit dialing (RDD) has often been the preferred source of identifying population-based controls and it worked well until the mid-1990s. A 2.5 percent annual decline in the RDD response rates from 1982 to 2002 has been reported (Bunin et al., 2007). The increasing use of cellular phones, caller identification, and multiple telephone numbers for a given household are a few of the emerging problems with RDD as a source of control selection today, and the potential exists for RDD control samples to be biased with respect to socioeconomic status and population characteristics (Bernstein, 2006; Ma et al., 2004). An additional concern directly relevant to the design of the cancer risk assessment in populations near nuclear facilities is the fact that the population under study (cases and controls) will need to be geographically defined (residing within a specific distance from the nuclear facilities), which also makes RDD less appropriate.

Town records could be used, but these are not uniformly available across the country. The relevant Department of Motor Vehicles (DMV) is a possible source for control identification, but the files are restricted to those that drive. Thus, they do not include individuals who are not old enough to have a driving license and do not completely cover older populations. As a result, DMV records would not be useful for a study of childhood cancers. Alternative control identification methods such as use of a friend, neighborhood, family, or school controls have limitations that affect their appropriateness in a study of cancer risks in populations near nuclear facilities, including a high risk of overmatching on exposure and geographic location. Additionally, school controls would be appropriate only for studies of school-age children, and their use is likely to be administratively difficult in a multistate study (Ross et al., 2004).

Investigators, including those involved in multistate studies of childhood cancers, are exploring the feasibility of using birth certificate files to select controls in studies of childhood cancers. This strategy has the advantage of collecting data that facilitate matching on factors such as age and gender, but also data on risk factors of childhood diseases such as birth weight, and age and educational level of the mother. Birth registration is considered virtually complete and data on birth records are fairly complete, although the quality of information deserves consideration (Kirby and Salihu, 2006), and the use of these data eliminates the problem of recall bias as they are not self-reported after a diagnosis of cancer. In contrast to RDD, this method for control selection allows characterization of non-participants. Although birth records have been used successfully in many epidemiologic studies (see, for example, Ma et al., 2002; Rosenbaum et al., 2000; Von Behren et al., 2011), their use presents challenges in nationwide studies, as investigators need to receive approvals for data release from many state health departments and the requirements for release of the information differ by state. Obtaining IRB approvals for each state may require modifications to the general protocol. Moreover, the standard certificate for live births has not been implemented fully across the United States (Kirby and Salihu, 2006), so achieving consistency of the format of the data retrieved from the different state birth registries is complex and necessary before a study database is ready for analysis and research. However, it has been demonstrated that birth registries may be used to select controls for pediatric studies on a national scale, even if information to locate potential control subjects is requested (Spector et al., 2007).

The reproductive statistics branch of the NCHS holds electronic birth registration data since 1968. Similar to the release restrictions for mortality data, NCHS cannot release data on births for geographic units smaller than a county. Investigators will need to contact the vital records office of each state (same office that releases mortality data) to obtain addresses or tabulations by census tract or other smaller geographic units. In an effort to identify the release criteria of birth registration data and the potential of linkage of birth records with cancer registries within and across states, the committee sent a letter to the 38 states that have or have had a nuclear facility. Of the 38 offices surveyed, 12 responded to the request for information (31 percent). A letter template is presented in Appendix L. Overall a detailed research protocol is needed before the offices could comment on the feasibility of any research activities requiring data on birth registration. However, some general guidelines were provided: the office of vital records in New York explained that data with personal identifiers are not released; and in Alabama and Michigan individual birth records cannot be released without permission of the individuals involved or the parents. In some settings, the Health Insurance Portability and Accountability Act of 1996 requires that geographic location at resolution smaller than three-digit ZIP codes be considered a personal identifier that cannot be released without special permission. Illinois reported that currently researchers’ requests for data are not accepted.

4.3.5. Record Linkage and Individual Tracing Methods

Record linkage refers to the task of searching two or more files for records that belong to the same individual, such as a birth certificate and a medical record. Historically, most record linkage was performed by clerks, who reviewed lists and made linkage decisions for scenarios for which rules had been developed. Nowadays, linkage that involves large files is generally computerized in order to reduce or eliminate manual review and make the results more easily reproducible. Computerized linkage is also faster, matching decisions are more consistent, and quality controls are better (Winkler, 1995). Common record linkages in epidemiology are between birth records and state cancer registries to identify individuals who developed the disease of interest or with mortality data to determine who has died.

Successful linkage requires that the various data sources share one or more common identifiers—referred to as the matching or linking variables—such as name and date of birth of the index individual. Many times, two or more individuals share the same linking characteristics, and unavoidably registries contain administrative coding errors or double entries which complicate the one-to-one linkage process and may lead to a true match erroneously being designated as nonlink or to the true match being one of many possible matches. The ideal linkage variable was described as the one that has many different values, all having about the same frequency of occurrence, contains no missing data or errors, and has not changed in value over time. The higher the number of matching variables, the better the ability to distinguish matches (Winkler, 1995).
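The effect of adding matching variables can be illustrated with a small sketch. The records, field names, and the `candidate_links` helper below are hypothetical and invented for illustration only; the sketch simply pairs records that agree exactly on the chosen variables and shows how a second variable can reduce several possible matches to one.

```python
from collections import defaultdict

# Hypothetical records; all names, dates, and field names are invented.
birth_records = [
    {"id": "B1", "name": "maria lopez", "dob": "1990-03-14"},
    {"id": "B2", "name": "maria lopez", "dob": "1992-07-02"},
    {"id": "B3", "name": "john smith", "dob": "1990-03-14"},
]
registry_records = [
    {"id": "C1", "name": "maria lopez", "dob": "1992-07-02"},
]


def candidate_links(left, right, keys):
    """Return {right id: [left ids]} for records that agree exactly on all keys."""
    index = defaultdict(list)
    for rec in left:
        index[tuple(rec[k] for k in keys)].append(rec["id"])
    return {
        rec["id"]: index.get(tuple(rec[k] for k in keys), [])
        for rec in right
    }


by_name = candidate_links(birth_records, registry_records, ["name"])
by_name_dob = candidate_links(birth_records, registry_records, ["name", "dob"])

print(by_name)      # name alone leaves two candidate birth records for C1
print(by_name_dob)  # name plus date of birth leaves a single candidate
```

In practice, coding errors or missing values in either file would cause exact agreement to miss true matches, which is one reason the choice of linking variables matters.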

The two main methodologies used for record linkages are deterministic and probabilistic. Deterministic record linkage links pairs of records on the basis of whether they agree exactly on specific identifiers. Such record linkage is often feasible in countries with a long-standing tradition of a unique identifier at birth, such as the personal registration number used in Denmark or the identification numbers given to all residents in Sweden and Norway (Tromp et al., 2006). In the United States and other countries where such a unique identifier is not established at the time of birth, linkage is less straightforward and the probabilistic record linkage methodology is often used. This method uses probabilities to determine whether a pair of records refers to the same individual (Machado, 2004; Tromp et al., 2006). More specifically, the probabilistic record linkage method assigns a weight of (dis)agreement for the linking variables based on the probability that a variable agrees among matches and the probability that a variable agrees among nonmatches, this way defining the error rate and discriminating power of the linkage (Tromp et al., 2006).
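A minimal sketch of the weighting idea follows. It assumes the commonly used log-ratio formulation (often associated with Fellegi and Sunter), which the text above does not spell out: for each linking variable, m is the probability that the variable agrees among matches and u is the probability that it agrees among nonmatches. The m and u values, field names, and function names are hypothetical; in a real linkage they would be estimated from the data.

```python
import math

# Hypothetical (m, u) pairs per linking variable:
#   m = P(variable agrees | the two records are a true match)
#   u = P(variable agrees | the two records are not a match)
M_U = {
    "surname": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "sex": (0.99, 0.5),
}


def field_weight(m, u, agrees):
    """Log2 likelihood-ratio weight for one variable (assumed formulation)."""
    if agrees:
        return math.log2(m / u)          # agreement weight, positive when m > u
    return math.log2((1 - m) / (1 - u))  # disagreement weight, negative when m > u


def pair_score(rec_a, rec_b, m_u=M_U):
    """Sum the agreement or disagreement weights over all linking variables."""
    return sum(
        field_weight(m, u, rec_a[field] == rec_b[field])
        for field, (m, u) in m_u.items()
    )


a = {"surname": "lopez", "dob": "1992-07-02", "sex": "F"}
b = {"surname": "lopez", "dob": "1992-07-02", "sex": "F"}
c = {"surname": "lopez", "dob": "1990-03-14", "sex": "F"}

print(round(pair_score(a, b), 2))  # all three variables agree
print(round(pair_score(a, c), 2))  # date of birth disagrees, so the score drops
```

A variable with few distinct values, such as sex, contributes little when it agrees because it also agrees often among nonmatches. Pairs are then typically classified as links, nonlinks, or candidates for manual review by comparing the total score with chosen thresholds.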

The committee requested information from the states’ Departments of Vital Statistics on linkage capabilities (letter template is presented in Appendix L). Linkage of birth registration and cancer data within states is routine in many states (for example, California, Minnesota, Michigan, Arkansas, and Colorado). However, no state from those that responded to the committee’s request for information reported existing methods for linkage across states. One obstacle is lack of consistency across states on variables used for linkage. For example, Minnesota reported that currently records are linked on name, date of birth, and SSN. In North Carolina, SSN is not available in birth records. A second obstacle is that, even if such a nationwide linkage is technically possible, differences among state statutes governing cancer and birth registration would likely not support such a project.

Investigators can use record linkage to retrieve current information of the populations under study, and in this way attempt to trace and recruit them. This is not an easy task, as often the information available to start the tracing process is limited, and often a long time has elapsed since some of the information was current. Inability to recruit individuals may both reduce the power of the study and introduce bias in the results. For that reason, ensuring that tracing of individuals is done with success is key to the strength of any record-based study. Tracing of individuals for cohorts identified retrospectively is challenging and time consuming. Essential components described to contribute to successful efforts to track or retain study subjects include (1) attention to staff training and support, (2) effective tracking system, (3) incentives, (4) establishing rapport with participants, (5) ensuring confidentiality, and (6) use of a combination of contact means as appropriate (Hunt and White, 1998; McKenzie et al., 1999).

Tracing has been done successfully in the past. One example is the Hanford Thyroid Disease Study conducted in the 1990s, a retrospective cohort study of the effects of exposure to atmospheric radioactive releases from the Hanford Nuclear Site in southeastern Washington State in the 1940s-1950s (Davis et al., 2008; study is discussed in Appendix A). The study identified more than 5,000 cohort members using Washington state birth records from 1940 to 1946. The limited information contained in the birth records was used to trace more than 94 percent of the cohort members, nearly 50 years later. Tracing was conducted in two phases: a feasibility study to test the methodology proposed and to develop specific operational procedures, then a five-step approach to locate cohort members, beginning with the most readily available and least costly steps as described:

  1. Computer matching to state records: birth records, DMV records, death records.
  2. Readily available lists of individuals: telephone directories, post office forwarding, city and reverse directories, existing high school reunion lists, voter records, utility records.
  3. Readily available, labor-intensive lists of individuals: neighborhood searches, former school teachers, old newspaper searches for death, birth and marriage announcements, other historical records.
  4. Limited availability, labor-intensive lists of individuals: agricultural, civic, religious and veterans organizations, labor unions.
  5. Available, costly contact of individuals: locating services, public appeal.

Motor vehicle licensing records and directories proved the most useful in tracing individuals. An average of five sources was required to locate an individual, and an extensive effort was required before a cohort member was declared “unlocated” by the team of supervisory staff. The investigators note that, at the time their study was conducted, the use of the Internet and email was not as widespread as it is today. These two options could potentially improve the tracing response rate. Research on retaining participants in longitudinal studies emphasizes the same point, as methods of recruiting participants are also relevant for retaining them (Davis et al., 2008; Robinson et al., 2007).
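The tiered approach listed above, which begins with the least costly sources, can be sketched as a simple ordered search. This is only an illustration under simplifying assumptions: the source names are abbreviated from the list, each lookup is a hypothetical stand-in that checks an in-memory dictionary, and the search stops at the first source that locates the person, whereas the actual study typically combined several sources per individual.

```python
# Tiers follow the five steps listed above; the sources shown are a subset.
TIERS = [
    ("state records", ["birth records", "DMV records", "death records"]),
    ("readily available lists", ["telephone directories", "voter records"]),
    ("labor-intensive lists", ["neighborhood searches", "old newspapers"]),
    ("limited-availability lists", ["veterans organizations", "labor unions"]),
    ("costly contact", ["locating services", "public appeal"]),
]


def trace(person_id, found_in):
    """Search sources in tier order and stop at the first that locates the person.

    found_in is a hypothetical mapping from a source name to the set of
    person ids that the source can locate.
    Returns (located, tier_name, sources_tried).
    """
    tried = 0
    for tier_name, sources in TIERS:
        for source in sources:
            tried += 1
            if person_id in found_in.get(source, set()):
                return True, tier_name, tried
    return False, None, tried


found_in = {"DMV records": {"p1"}, "voter records": {"p2"}}
for pid in ["p1", "p2", "p3"]:
    print(pid, trace(pid, found_in))
```

Recording the number of sources tried per person makes it possible to report the effort spent before a cohort member is declared unlocated.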

Another example of a study with a satisfactory response rate of 75 percent used 14 sources to locate 230 parents of sudden infant death syndrome infants and 255 parents of healthy living infants in Southern California (Klonoff-Cohen, 1996). Possible reasons for the lower success rate compared to the Hanford study are that case parents were relatively young and transient without an established credit history and, therefore, harder to trace through tax assessor records, and that the Human Subjects Committee required at least a 1-year waiting period to contact the parents of the deceased infant, during which period the parents may have moved. The Northern California Childhood Leukemia Study, which enrolled birth registry controls aged 0-14 years, reported a contact rate of 80 percent (Ma et al., 2004). A case-control study of birth defects based in seven Texas counties aimed to contact mothers and interview them by telephone 4 years after the births of their children. Case mothers were more likely than control mothers to be located (44 percent versus 30 percent, respectively) and, of those that were located, to be interviewed (43 percent versus 31 percent, respectively). Young maternal age and black race decreased the likelihood of locating mothers (Gilboa et al., 2006). Nationwide studies include the Pregnancy Risk Assessment Monitoring System, which contacts mothers between 2 and 6 months after giving birth in 23 states. The study achieved a contact rate of 82 percent in 2001 (Shulman et al., 2006). As expected, age affects the effort required to trace children, with less effort needed for birth certificate controls aged 0-4 years than for those aged 5-14 years (Ma et al., 2004).
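As a worked illustration of how location and interview rates combine, the Texas figures quoted above can be multiplied to give the share of all sought mothers who were ultimately interviewed. The function name is hypothetical, and the calculation assumes that the interview percentages apply only to mothers who were located, as the text states.

```python
def overall_interview_rate(located, interviewed_given_located):
    """Share of all sought mothers who were both located and interviewed."""
    return located * interviewed_given_located


case_rate = overall_interview_rate(0.44, 0.43)     # case mothers
control_rate = overall_interview_rate(0.30, 0.31)  # control mothers

print(f"cases: {case_rate:.1%}, controls: {control_rate:.1%}")
# cases: 18.9%, controls: 9.3%
```

Under this reading, roughly one in five case mothers and one in ten control mothers were interviewed, a differential that could introduce bias as well as reduce power.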

4.3.6. Data on Population Characteristics

As discussed in Section 4.3.1, the U.S. Census is a source for information regarding population characteristics such as age, gender, and race/ethnicity. Surveillance systems that collect information on population characteristics over time, including lifestyle factors, are important for tracking such things as changes in the incidence of cancer or other chronic disease, and risk behavior prevalence. In the context of this report, surveillance systems are important as they could be a source of information on the characteristics of the populations compared and thus provide clues on potential confounders in an ecologic study. The committee found that three national surveillance systems might be relevant: the National Health Interview Survey (NHIS), the National Health and Nutrition Examination Survey (NHANES), and the Behavioral Risk Factor Surveillance System (BRFSS). All three surveys are managed by CDC. However, none of these surveys are directly applicable for the present task, as they do not contain information about behavioral data at the census-tract level. Technical and methodological details for the surveys are available online and briefly summarized here. Sources of health care information are also discussed, but again information from these sources is not directly applicable for the present task.

The National Health Interview Survey (NHIS)

The NHIS is a large-scale face-to-face household interview survey of a random sample of households in the United States. The main objective of the NHIS is to monitor the health of the population in the United States and track progress toward national health objectives. Interviewers of the U.S. Census Bureau have conducted the survey for the NCHS continuously since 1957. Each year, interviewers visit 35,000 to 40,000 households across the country and collect data for about 75,000 to 100,000 individuals. The annual questionnaire consists of three components: the family core, the sample adult core, and the sample child core. The family core collects information on everyone in the family, including family composition, and basic demographic characteristics such as age, race, gender, income, and health insurance coverage. In addition, one adult and one child, if applicable, from each household are randomly selected and information on each is collected. In 2007, participation rates for the survey were 68 percent. As noted above, the goal of the NHIS is to collect summaries of health at the national, and perhaps state level, not at the fine geographic scale of census tracts.

The National Health and Nutrition Examination Survey (NHANES)

NCHS also conducts NHANES, a survey that aims to assess the health and nutritional status of adults and children in the United States. The NHANES program began in the early 1960s. In 1999 the survey became a continuous program that examines a nationally representative sample of about 5,000 persons each year. Although substantially smaller than either NHIS or BRFSS, NHANES is unique because it combines information from interviews with a physical examination and some laboratory tests. The NHANES interview includes demographic, socioeconomic, dietary, and health related questions while the physical examination component consists of medical and dental measurements. In the 2005-2006 survey, participation rates were 80 percent. Again, the goals are estimates at the national and perhaps state level, not at the fine geographic resolution desired for the studies under consideration. NHANES, like NHIS, is based on cluster sampling.

The Behavioral Risk Factor Surveillance System (BRFSS)

In 1984, the CDC recognized the importance to disease prevention of monitoring personal health behaviors in the general population and established the BRFSS in 15 states. A decade later, this system was in place nationwide. In contrast to NHIS and NHANES, BRFSS is a telephone-based survey conducted by state and territorial health departments with technical and methodological assistance provided by the National Center for Chronic Disease Prevention and Health Promotion of CDC. Each state works with CDC to develop a sampling protocol to select households; one adult (age ≥ 18 years) is selected from each household and interviewed. BRFSS is the only one of these three surveillance systems that can generate state- or territorial-based estimates on a variety of health measures. BRFSS collects data from approximately 210,000 people in 50 states, the District of Columbia, Puerto Rico, the U.S. Virgin Islands and Guam. Self-reports of health-related variables (e.g., weight) have not matched measurements from the other surveillance systems that do not rely on self-reports (Carlson et al., 2009). Perhaps the largest challenge in using BRFSS data is that the response rates for BRFSS have declined from 72 percent in 1993 to 51 percent in 2007. The low, and apparently biased, participation rates produce different estimates in some outcome measures compared to NHIS and NHANES, both of which have higher participation rates. The consequences have been estimated to be minimal in some cases and unknown in others (Fahimi et al., 2008). Finally, BRFSS provides design-based state and national estimates and some research has considered extensions to county level. However, the data are not sufficient to support design-based estimates at the census-tract level.

Health Care Surveys

NCHS performs the National Health Care Survey to answer questions on the use and quality of health care, the impact of medical technology, and disparities in health care services provided to population subgroups in the United States. The National Health Care Survey is built upon the merging and expansion of separate record-based surveys:

  • National Ambulatory Medical Care Survey
  • National Hospital Ambulatory Medical Care Survey
  • National Survey of Ambulatory Surgery
  • National Nursing Home Survey
  • National Hospital Care Survey
  • National Nursing Assistant Survey
  • National Home and Hospice Care Survey
  • National Home Health Aide Survey
  • National Survey of Residential Care Facilities

The combined surveys use provider-based information which, depending on the setting in which the care is delivered, may come from a record of the patient’s most recent visit, the hospital discharge form, or a review of the entire medical record. Information on the sample design for each of the component surveys can be found at http://www.cdc.gov/nchs/dhcs.htm. Overall, the design permits monitoring of the delivery of specific health care services and understanding of the characteristics of the patients who receive different types of services. The National Hospital Discharge Survey (NHDS) is briefly described here as an example to demonstrate the relation of the different health care surveys and the potential for linkage with other national data sets.

NHDS is a national probability survey that was initiated in 1965 and was the first survey of medical care delivery conducted by the NCHS to collect information on inpatient use of short-stay nonfederal hospitals in the United States (Dennison and Pokras, 2000). The survey was redesigned in 1987 to improve its sampling, to link it with the design of NHIS, and to use automated retrieval of data, among other reasons. In 1988 the survey collected data on diagnoses, procedures, length of stay, and patient characteristics from a sample of approximately 250,000 discharges from over 500 hospitals. NHDS was conducted annually from its inception until 2010, when it was integrated into the National Hospital Care Survey together with the emergency department, outpatient department, and ambulatory surgery center data collected by the National Hospital Ambulatory Medical Care Survey (NHAMCS). (NHAMCS has been conducted since 1973, with data collected from physicians who were each randomly assigned a 1-week reporting period.) The integration of these two surveys, along with the collection of patient identifiers, will permit linkage of care provided in different departments. It will also be possible to link the survey data to the NDI and to Medicaid and Medicare data to obtain a more complete picture of patient care.

Important to the committee’s task, and reiterated many times, is the need for a source of information on medical diagnostic procedures that use radiation, especially those that use high doses such as CT scans. The main data source for aggregate counts of medical diagnostic procedures that involve radiation, by body part, is IMV. IMV is a market research and database provider founded in 1977 which, using a variety of survey methods, tracks diagnostic medical procedures. While IMV surveys have high participation rates and cover a large number of imaging facilities (IMV data were the main source for the NCRP Report 160 [NCRP, 2009]), they do not have a detailed categorization of procedures and therefore are unable to capture the variation in radiation doses and protocols. Detailed data on counts of procedures for large populations are also available from administrative claims such as Medicare. However, that information is restricted to people who are age 65 or over and use this social insurance program. Neither IMV nor Medicare data are directly applicable to the present task, as they do not contain information about medical diagnostic imaging at the census-tract level.


This chapter provides the committee’s assessment of methodological approaches for carrying out a cancer epidemiology study. Based on this assessment, the committee finds that:


The statistical power of an epidemiologic study of cancer risks in populations near nuclear facilities is likely to be low because (a) the size of the estimated risks from the reported radioactive effluent releases from nuclear facilities is likely to be small and (b) the size of the populations most likely to be exposed (that is, those in close proximity to a nuclear facility, for example, within an 8-km radius) is relatively small. This implies that a large-scale multisite study with as many years of observations as possible is needed to reliably assess the potential risks.
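The dependence of power on both the size of the risk and the size of the exposed population can be illustrated with a rough calculation. The sketch below is not the committee's method; it assumes a one-sided exact Poisson test of an observed case count against a known expected count, and the expected counts and the relative risk of 1.1 are hypothetical values chosen only for illustration.

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def critical_count(expected, alpha):
    """Smallest count c such that P(X >= c) <= alpha under the null mean."""
    c, cdf = 0, 0.0  # invariant: cdf == P(X <= c - 1)
    while 1.0 - cdf > alpha:
        cdf += poisson_pmf(c, expected)
        c += 1
    return c


def power_one_sided(expected, rr, alpha=0.05):
    """Power to detect relative risk `rr` when `expected` cases arise under the null."""
    c = critical_count(expected, alpha)
    mu_alt = expected * rr  # mean count if the true relative risk is `rr`
    cdf_alt = sum(poisson_pmf(k, mu_alt) for k in range(c))
    return max(0.0, 1.0 - cdf_alt)


# Hypothetical expected case counts: a small population near one facility
# versus counts pooled over many sites and many years of observation.
for expected in (10, 100, 1000):
    print(expected, round(power_one_sided(expected, rr=1.1), 3))
```

Under these assumptions, power for a fixed small relative risk rises substantially as the expected number of cases grows, which is the reasoning behind pooling many sites and as many years of observation as possible.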


Centralized cancer registries such as SEER and NPCR (for cancer incidence) or national offices such as NCHS (for cancer mortality) can only release data that are aggregated across geographic areas such as counties. Cancer incidence and mortality data for more refined geographic areas can be released only by individual states upon submission and approval of a research proposal. In general, cancer mortality data are available since about 1970, but individual address at time of death is not captured until much later in some states. Moreover, mortality data are not consistently geocoded at the census-tract level. Cancer incidence data of known quality are available from about 1995. These data include address at time of diagnosis and have been widely geocoded.


Large-scale studies that rely on contacting individuals are likely to be subject to selection and information biases due to difficulties related to tracing individuals, low (and declining) participation rates of cases and especially controls in epidemiologic studies, and the risk of collecting inaccurate information via interviews and questionnaires. Alternatively, studies that rely on information in existing records are more practical and free of the biases mentioned above, although other limitations exist.


Studies of pediatric cancers could take advantage of existing linkages of cancer registration and birth records in at least six states that include more than 30 percent of the U.S. pediatric population.

In light of these findings, the committee recommends that, should the USNRC decide to proceed with an epidemiologic study of cancer risks in populations near nuclear facilities (Phase 2), two studies be carried out: (a) an ecologic study of multiple cancer types that would provide an assessment of cancer incidence and mortality in populations living within approximately 50 km of nuclear facilities and (b) a record-linkage-based case-control study of childhood cancer that would provide an assessment of early life exposure to radiation during more recent operating periods of nuclear facilities. The strengths and limitations of the recommended studies are described in Section 4.2.3. Specifying up front the hypotheses to be tested and the analysis plan is the responsibility of the Phase 2 committee.

The committee judges that additional information and analyses, beyond the scope of this Phase 1 activity, are needed to assess the feasibility of carrying out the recommended studies, and that these could be obtained through a pilot study. The purpose of the pilot study is to evaluate the feasibility of the methods proposed and to develop the specific operational procedures and data collection methods needed for a full study. The purpose of the pilot study is not to perform a small-scale preliminary assessment of risks, the results of which would be used for or against moving forward with the full study.

As discussed in Chapter 3, seven facilities were selected collaboratively by the dosimetry and epidemiology experts of this committee: Dresden (Illinois), Millstone (Connecticut), Oyster Creek (New Jersey), Haddam Neck (Connecticut), Big Rock Point (Michigan), San Onofre (California), and Nuclear Fuel Services (Tennessee). The reasons for selecting these facilities with regard to dosimetry are discussed in Chapter 3. These facilities are also good candidates for evaluating the feasibility of the studies from the epidemiologic perspective, as they represent both currently operating and decommissioned facilities in six states that started operation at different points in time, with some variation in (a) the population size in close proximity, (b) the quality and maturation of the state’s cancer registration, and (c) the level of complexity of the registry’s research approval processes and research support. Actions specific to the recommended studies to be taken during the piloting activity are the following:

  • Retrieve cancer incidence and mortality data at the census-tract level within 50 km of selected facilities to assess feasibility of the recommended ecologic study.
  • Confer with investigators conducting linkages of cancer and birth registration data to identify eligible cases of pediatric cancers and matched controls to assess feasibility of the recommended record-linkage-based case-control study in the selected facilities. In states with the necessary capabilities, but without such linkages in place, link birth registration and cancer incidence data.
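As a rough illustration of the second action, the sketch below links cancer registry records to birth records and then draws birth-record controls for each linked case. It is a deliberately simplified, deterministic stand-in for the probabilistic linkage methods that registries typically use; all field names (`last_name`, `birth_date`, `birth_id`, `registry_id`), the matching on sex and birth year, and the records themselves are hypothetical.

```python
import random
from collections import defaultdict


def link_key(record):
    """Hypothetical deterministic linkage key: normalized names, birth date, sex."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["birth_date"],  # ISO date string, e.g. "2001-05-17"
        record["sex"],
    )


def link_cases_to_births(cancer_records, birth_records):
    """Return (cancer_record, birth_record) pairs for unambiguous key matches."""
    births_by_key = defaultdict(list)
    for birth in birth_records:
        births_by_key[link_key(birth)].append(birth)
    linked = []
    for case in cancer_records:
        candidates = births_by_key.get(link_key(case), [])
        if len(candidates) == 1:  # skip unmatched or ambiguous records
            linked.append((case, candidates[0]))
    return linked


def select_controls(case_birth, birth_records, case_birth_ids, n_controls, rng):
    """Sample controls from non-case births with the same sex and birth year."""
    pool = [
        b for b in birth_records
        if b["birth_id"] not in case_birth_ids
        and b["sex"] == case_birth["sex"]
        and b["birth_date"][:4] == case_birth["birth_date"][:4]
    ]
    return rng.sample(pool, min(n_controls, len(pool)))


# Made-up records for illustration only.
births = [
    {"birth_id": 1, "last_name": "Doe", "first_name": "Ann", "birth_date": "2001-05-17", "sex": "F"},
    {"birth_id": 2, "last_name": "Roe", "first_name": "Bea", "birth_date": "2001-11-02", "sex": "F"},
    {"birth_id": 3, "last_name": "Poe", "first_name": "Cy", "birth_date": "2001-03-09", "sex": "M"},
    {"birth_id": 4, "last_name": "Loe", "first_name": "Di", "birth_date": "2002-07-21", "sex": "F"},
    {"birth_id": 5, "last_name": "Moe", "first_name": "Eve", "birth_date": "2001-01-30", "sex": "F"},
]
cancers = [
    {"registry_id": "C1", "last_name": " doe", "first_name": "ANN", "birth_date": "2001-05-17", "sex": "F"},
]

pairs = link_cases_to_births(cancers, births)
case_ids = {birth["birth_id"] for _, birth in pairs}
rng = random.Random(0)  # fixed seed so the control draw is reproducible
for case, birth in pairs:
    controls = select_controls(birth, births, case_ids, 2, rng)
    print(case["registry_id"], birth["birth_id"], [c["birth_id"] for c in controls])
```

Because controls are drawn from the same birth records as the cases, no individual needs to be contacted, which is what makes a record-based design less vulnerable to the participation problems noted in the findings above.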


  • Bernstein L. Epidemiology. 3. Vol. 17. 2006. Control recruitment in population-based case-control studies; pp. 255–257. [PubMed: 16617272]
  • Bernstein J. L., Haile R. W., Stovall M., Boice J. D., Shore R. E., Langholz B., Thomas D. C., Bernstein L., Lynch C. F., Olsen J. H., Malone K. E., Mellemkjaer L., Borrensen-Dale A.-L., Rosenstein B. S., Teraoka S. N., Diep T. A., Smith S. A., Capanu M., Reiner A. S., Liang X. J Natl Cancer Inst. Vol. 102. 2010. Radiation exposure, the ATM gene, and contralateral breast cancer in the Women’s Environmental Cancer and Radiation Epidemiology Study; pp. 475–483. [PMC free article: PMC2902825] [PubMed: 20305132]
  • Berrington de Gonzalez A., Mahesh M. Arch. Intern. Med. 22. Vol. 169. 2009. Projected cancer risks from computed tomographic scans performed in the United States in 2007; pp. 2071–2077. [PMC free article: PMC6276814] [PubMed: 20008689]
  • A. B. Berrington de González, Brenner A., Hartge P., Lee C., Morton L., Rajaraman P. Evolving strategies in epidemiologic research on radiation and cancer; Radiat. Res; Epub Aug. 8, 2011. 2011. pp. 527–532. [PubMed: 21823973]
  • Bithell J. F., Keegan T. J. Radiat. Prot. Dosim. 2. Vol. 132. 2008. Childhood leukaemia near British nuclear installations: Methodological issues and recent results; pp. 191–197. [PubMed: 18936090]
  • Boice J. D., Bigbee W. L. Health Phys. 6. Vol. 85. 2003. Cancer incidence in municipalities near two former nuclear materials processing facilities in Pennsylvania; pp. 678–690. [PubMed: 14626319]
  • Boice J. D., Cohen S. S., Mumma M. T., Ellis E. D., Eckerman K. F., Leggett R. W., Boecker B. B., Brill A. B., Henderson B. E. Radiat. Res. 2. Vol. 176. 2011. Updated mortality analysis of radiation workers at Rocketdyne (Atomics International), 1948-2008; pp. 244–258. [PubMed: 21381866]
  • Brenner D. J., Doll R. Proc. Natl. Acad. Sci. U S A. 24. Vol. 100. 2003. Cancer risks attributable to low doses of ionizing radiation: Assessing what we really know; pp. 13761–13766. [PMC free article: PMC283495] [PubMed: 14610281]
  • Bunin G. R., Spector L. G. Am. J. Epidemiol. 1. Vol. 166. 2007. Secular trends in response rates for controls selected by random digit dialing in childhood cancer studies: A report from the Children’s Oncology Group; pp. 109–116. [PubMed: 17456476]
  • Carlson S. A., Densmore D. J. Phys. Act. Health. (Suppl 1) Vol. 6. 2009. Differences in physical activity prevalence and trends from 3 U.S. surveillance systems: NHIS, NHANES, and BRFSS; pp. S18–S27. [PubMed: 19998846]
  • Centers for Disease Control and Prevention. MMWR. 36. Vol. 60. 2011. State-specific trends in lung cancer incidence and smoking—United States, 1999-2008; pp. 1243–1247. [PubMed: 21918494]
  • Chokkalingam A. P., Bartley K. Cancer Causes Control. 12. Vol. 22. 2011. Haplotypes of DNA repair and cell cycle control genes, X-ray exposure, and risk of childhood acute lymphoblastic leukemia; pp. 1721–1730. [PMC free article: PMC3206192] [PubMed: 21987080]
  • Cohen B. L. Health Phys. 2. Vol. 68. 1995. Test of the linear-no threshold theory of radiation carcinogenesis for inhaled radon decay products; pp. 157–174. [PubMed: 7814250]
  • Cohen B. L. Health Phys. 1. Vol. 72. 1997. Lung cancer rate vs. mean radon level in U.S. counties of various characteristics; pp. 114–119. [PubMed: 8972836]
  • Curado M. P., Edwards B., Shin H. R., Storm H., Ferlay J., Heanue M., Boyle P. IX. 2007. Cancer incidence in five continents. IARC Scientific Publications No. 160.
  • Darby S., Hill D. BMJ. (7485) Vol. 330. 2005. Radon in homes and risk of lung cancer: collaborative analysis of individual data from 13 European case-control studies; p. 223. [PMC free article: PMC546066] [PubMed: 15613366]
  • Das B., Clegg L. X. Cancer Causes Control. 5. Vol. 19. 2008. A new method to evaluate the completeness of case ascertainment by a cancer registry; pp. 515–525. [PMC free article: PMC2668648] [PubMed: 18270798]
  • Davis S., Onstad L. Ann. Epidemiol. 3. Vol. 18. 2008. Locating members of a cohort identified retrospectively from limited data in 50-year-old records: successful approaches employed by the Hanford Thyroid Disease Study; pp. 187–195. [PubMed: 18201901]
  • Dennison C., Pokras R. Vital Health Stat. 39. Vol. 1. 2000. Design and operation of the National Hospital Discharge Survey: 1988 redesign; pp. 1–42. [PubMed: 11261241]
  • Dufault B., Klar N. Am. J. Epidemiol. 10. Vol. 174. 2011. The quality of modern cross-sectional ecologic studies: A bibliometric review; pp. 1101–1107. [PubMed: 21940800]
  • Evrard A. S., Hemon D. Br. J. Cancer. 9. Vol. 94. 2006. Childhood leukaemia incidence around French nuclear installations using geographic zoning based on gaseous discharge dose estimates; pp. 1342–1347. [PMC free article: PMC2292746] [PubMed: 16622448]
  • Fahimi M., Link M. Prev. Chronic Dis. (3) Vol. 5. 2008. Tracking chronic disease and risk behavior prevalence as survey participation declines: statistics from the behavioral risk factor surveillance system and other national surveys; p. A80. [PMC free article: PMC2483564] [PubMed: 18558030]
  • Federal Radiation Council. Health implications of fallout from nuclear weapons testing through 1961; Washington, DC: 1962.
  • Fell D. B., Dodds L., King W. D. Paediatr. Perinat. Epidemiol. 6. Vol. 18. 2004. Residential mobility during pregnancy; pp. 408–414. [PubMed: 15535816]
  • German R. R., Fink A. K. Cancer Epidemiol. 2. Vol. 35. 2011. The accuracy of cancer mortality statistics based on death certificates in the United States; pp. 126–131. [PubMed: 20952269]
  • Gilboa S. M., Mendola P. Birth Defects Res. A Clin. Mol. Teratol. 1. Vol. 76. 2006. Characteristics that predict locating and interviewing mothers identified by a state birth defects registry and vital records; pp. 60–65. [PubMed: 16397888]
  • Hartge P. Epidemiology. 3. Vol. 17. 2006. Participation in population studies; pp. 252–254. [PubMed: 16617271]
  • Hays J., Hunt J. R. Ann. Epidemiol. (9 Suppl) Vol. 13. 2003. The Women’s Health Initiative recruitment methods and results; pp. S18–S77. [PubMed: 14575939]
  • Heath C. W., Bond P. D., Hoel D. G., Meinhold C. B. Health Phys. (6) Vol. 87. 2004. Residential radon exposure and lung cancer risk: commentary on Cohen’s county-based study; pp. 647–655; discussion 656–658. [PubMed: 15545771]
  • Hunt J. R., White E. Epidemiol. Rev. 1. Vol. 20. 1998. Retaining and tracking cohort study members; pp. 57–70. [PubMed: 9762509]
  • Jablon S., Hrubec Z., Boice J. D., Stone B. J. 1-3. 1990. Cancer in populations living near nuclear facilities. NIH Publication No. 90-874.
  • Jablon S., Hrubec Z. JAMA. 11. Vol. 265. 1991. Cancer in populations living near nuclear facilities. A survey of mortality nationwide and incidence in two states; pp. 1403–1408. [PubMed: 1999880]
  • Johnson K. J., Carozza S. E. Epidemiology. 4. Vol. 20. 2009. Parental age and risk of childhood cancer: a pooled analysis; pp. 475–483. [PMC free article: PMC2738598] [PubMed: 19373093]
  • Kaatsch P., Spix C. Int. J. Cancer. 4. Vol. 122. 2008. Leukaemia in young children living in the vicinity of German nuclear power plants; pp. 721–726. [PubMed: 18067131]
  • Kelsey J. L., Whittemore A. S. Methods in Observational Epidemiology. New York and Oxford: Oxford University Press; 1996.
  • Kinlen L. Lancet. 8624. Vol. 2. 1988. Evidence for an infective cause of childhood leukaemia: Comparison of a Scottish new town with nuclear reprocessing sites in Britain; pp. 1323–1327. [PubMed: 2904050]
  • Kinlen L. Br. J. Cancer. 1. Vol. 104. 2011. Childhood leukaemia, nuclear sites, and population mixing; pp. 12–18. [PMC free article: PMC3039801] [PubMed: 21063418]
  • Kirby R. S., Salihu H. M. Birth. 3. Vol. 33. 2006. Back to the future? A critical commentary on the 2003 U.S. National standard certificate of live birth; pp. 238–244. [PubMed: 16948724]
  • Klonoff-Cohen H. Am. J. Epidemiol. 1. Vol. 144. 1996. Tracking strategies involving fourteen sources for locating a transient study sample: Parents of sudden infant death syndrome infants and control infants; pp. 98–101. [PubMed: 8659490]
  • Land C. E. Science. 4462. Vol. 209. 1980. Estimating cancer risks from low doses of ionizing radiation; pp. 1197–1203. [PubMed: 7403879]
  • Land C. E. J. Radiol. Prot. (3A) Vol. 22. 2002. Uncertainty, low-dose extrapolation and the threshold hypothesis; pp. A129–A135. [PubMed: 12400961]
  • Last J. M. A dictionary of epidemiology. New York: Oxford University Press; 1995.
  • Law G. R. Radiat. Prot. Dosim. 2. Vol. 132. 2008. Host, family and community proxies for infections potentially associated with leukaemia; pp. 267–272. [PubMed: 18945723]
  • Little M. P., Boice J. D. Radiat Res. 2. Vol. 151. 1999. Comparison of breast cancer incidence in the Massachusetts tuberculosis fluoroscopy cohort and in the Japanese atomic bomb survivors; pp. 218–224. [PubMed: 9952307]
  • Liu L., Krailo M. Cancer. 5. Vol. 97. 2003. Childhood cancer patients’ access to cooperative group cancer programs: A population-based study; pp. 1339–1345. [PubMed: 12599243]
  • Ma X., Buffler P. A. Br. J. Cancer. 9. Vol. 86. 2002. Daycare attendance and risk of childhood acute lymphoblastic leukaemia; pp. 1419–1424. [PMC free article: PMC2375371] [PubMed: 11986774]
  • Ma X., Buffler P. A. Am. J. Epidemiol. 10. Vol. 159. 2004. Control selection strategies in case-control studies of childhood diseases; pp. 915–921. [PubMed: 15128601]
  • Machado C. J. Cad. Saude Publica. 2. Vol. 20. 2004. A literature review of record linkage procedures focusing on infant health outcomes; pp. 362–371. [PubMed: 15073615]
  • Malone K. E., Begg C. B., Haile R. W., Borg A., Concannon P., Tellhed L. X., Teraoka S., Bernstein L., Capanu M., Reiner A. S., Riedel E. R., Thomas D. C., Mellemkjaer L., Lynch C. F., Boice J. D., Anton-Culver H., Bernstein J. L. J. Clin. Oncol. 14. Vol. 28. 2010. Population-based study of the risk of second primary contralateral breast cancer associated with carrying a mutation in BRCA1 or BRCA2; pp. 2404–2410. [PMC free article: PMC2881721] [PubMed: 20368571]
  • McKenzie M., Tulsky J. P. J. Health Care Poor Underserved. 4. Vol. 10. 1999. Tracking and follow-up of marginalized populations: A review; pp. 409–429. [PubMed: 10581885]
  • McLaughlin C. C., Baptiste M. S. Am. J. Epidemiol. 9. Vol. 163. 2006. Maternal and infant birth characteristics and hepatoblastoma; pp. 818–828. [PubMed: 16510543]
  • Morin A., Backe J. 2002. Programme environnement et santé 1999. Une estimation de l’exposition du public due aux rejets radioactifs des centrales nucléaires (in French). Technical Note SEGR/SAER/02–51 Indice 1. Institut de Radioprotection et de Sûreté Nucléaire, Fontenay-aux-Roses (July)
  • Morton L. M., Cahill J. Am. J. Epidemiol. 3. Vol. 163. 2006. Reporting participation in epidemiologic studies: A survey of practice; pp. 197–203. [PubMed: 16339049]
  • Mueller B. A., Chow E. J. Arch. Pediatr. Adolesc. Med. 10. Vol. 163. 2009. Pregnancy outcomes in female childhood and adolescent cancer survivors: A linked cancer-birth registry analysis; pp. 879–886. [PMC free article: PMC2758647] [PubMed: 19805705]
  • NCRP (National Council on Radiation Protection and Measurements). Ionizing radiation exposure of the populations of the United States; 2009.
  • Neglia J. P., Friedman D. L., Yasui Y., Mertens A. C., Hammond S., Stovall M., Donaldson S. S., Meadows A. T., Robison L. L. J. Natl. Cancer Inst. 8. Vol. 93. 2001. Second malignant neoplasms in five-year survivors of childhood cancer: Childhood cancer survivor study; pp. 618–629. [PubMed: 11309438]
  • Neutra RR. Am J Epidemiol. 1. Vol. 132. 1990. Counterpoint from a cluster buster; pp. 1–8. [PubMed: 2356803]
  • NRC (National Research Council). Committee to Assess Health Risks from Exposure to Low Levels of Ionizing Radiation. Washington, DC: The National Academies Press; 2005. Health Risks From Exposure to Low Levels of Ionizing Radiation: BEIR VII—Phase 2.
  • Nuclear Safety Council and the Carlos III Institute of Health. 2009. Epidemiological study of the possible effect of ionizing radiations deriving from the operation of Spanish nuclear fuel cycle facilities on the health of the population living in their vicinity, Spain.
  • Parker D. M., Whelan S. L., Ferlay J. VIII. 2002. Cancer Incidence in Five Continents. IARC Scientific Publications No. 122.
  • Pawel D. J. Health Phys. (2) Vol. 89. 2005. Can confounding by smoking explain the ecologic correlation between lung cancer and radon? pp. 181–182. author reply 182. [PubMed: 16010131]
  • Pierce D. A., Shimizu Y. Radiat. Res. 1. Vol. 146. 1996. Studies of the mortality of atomic bomb survivors. Report 12, Part I. Cancer: 1950-1990; pp. 1–27. [PubMed: 8677290]
  • Pierce D. A., Sharp G. B. Radiat. Res. 6. Vol. 163. 2005. Joint effects of radiation and smoking on lung cancer risk among atomic bomb survivors; pp. 694–695. [PubMed: 16044494]
  • Podvin D., Kuehn C. M. Paediatr. Perinat. Epidemiol. 4. Vol. 20. 2006. Maternal and birth characteristics in relation to childhood leukaemia; pp. 312–322. [PubMed: 16879503]
  • Preston D. L., Mattsson A. Radiat. Res. 2. Vol. 158. 2002. Radiation effects on breast cancer risk: A pooled analysis of eight cohorts; pp. 220–235. [PubMed: 12105993]
  • Preston D. L., Shimizu Y. Radiat. Res. 4. Vol. 160. 2003. Studies of mortality of atomic bomb survivors. Report 13: Solid cancer and noncancer disease mortality: 1950-1997; pp. 381–407. [PubMed: 12968934]
  • Preston D. L., Ron E. Radiat. Res. 1. Vol. 168. 2007. Solid cancer incidence in atomic bomb survivors: 1958-1998; pp. 1–64. [PubMed: 17722996]
  • Puumala S. E., Soler J. T. Int. J. Cancer. 6. Vol. 122. 2008. Birth characteristics and Wilms tumor in Minnesota; pp. 1368–1373. [PubMed: 18033684]
  • Reynolds P., Von Behren J. Am. J. Epidemiol. 7. Vol. 155. 2002. Birth characteristics and leukemia in young children; pp. 603–613. [PubMed: 11914187]
  • Richardson D., Sugiyama H. Radiat. Res. 3. Vol. 172. 2009. Ionizing radiation and leukemia mortality among Japanese Atomic Bomb Survivors, 1950-2000; pp. 368–382. [PubMed: 19708786]
  • Robinson K. A., Dennison C. R. J. Clin. Epidemiol. 8. Vol. 60. 2007. Systematic review identifies number of strategies important for retaining study participants; pp. 757–765. [PMC free article: PMC1997303] [PubMed: 17606170]
  • Ron E., Lubin J. H. Radiat. Res. 3. Vol. 141. 1995. Thyroid cancer after exposure to external radiation: A pooled analysis of seven studies; pp. 259–277. [PubMed: 7871153]
  • Rosenbaum P. F., Buck G. M. Am. J. Epidemiol. 12. Vol. 152. 2000. Early child-care and preschool experiences and the risk of childhood acute lymphoblastic leukemia; pp. 1136–1144. [PubMed: 11130619]
  • Ross J. A., Spector L. G. Am. J. Epidemiol. (10) Vol. 159. 2004. Invited commentary: Birth certificates—a best control scenario? pp. 922–924. discussion 925. [PubMed: 15128602]
  • Rothman K. J. Am. J. Epidemiol. (1 Suppl) Vol. 132. 1990. A sobering start for the cluster busters’ conference; pp. S6–S13. [PubMed: 2356837]
  • Rothman K. J., Greenland S. Modern Epidemiology. Philadelphia: Lippincott Williams & Wilkins; 1998.
  • Satten G. A., Kupper L. L. Am. J. Epidemiol. 1. Vol. 131. 1990. Sample size requirements for interval estimation of the odds ratio; pp. 177–184. [PubMed: 2293748]
  • Savitz D. A., Olshan A. F. Am. J. Epidemiol. 9. Vol. 142. 1995. Multiple comparisons and related issues in the interpretation of epidemiologic data; pp. 904–908. [PubMed: 7572970]
  • Sermage-Faure C., Laurier D., Goujon-Bellec S., Chartier M., Guyot-Goubin A., Rudant J., Hémon D., Clavel J. Int. J. Cancer. 2012. Childhood leukemia around French nuclear power plants—the Geocap study, 2002-2007. Epub Feb. 20. [PubMed: 22223329]
  • Shore R. E., Iyer V. Regul. Toxicol. Pharmacol. (2 Pt 1) Vol. 15. 1992. Use of human data in quantitative risk assessment of carcinogens: Impact on epidemiologic practice and the regulatory process; pp. 180–221. [PubMed: 1626069]
  • Shulman H. B., Gilbert B. C. Public Health Rep. 1. Vol. 121. 2006. The Pregnancy Risk Assessment Monitoring System (PRAMS): Current methods and evaluation of 2001 response rates; pp. 74–83. [PMC free article: PMC1497801] [PubMed: 16416701]
  • Socolow E. L., Hashizume A. N. Engl. J. Med. Vol. 268. 1963. Thyroid carcinoma in man after exposure to ionizing radiation. A summary of the findings in Hiroshima and Nagasaki; pp. 406–410. [PubMed: 13989805]
  • Spector L. G., Ross J. A. Am. J. Epidemiol. 7. Vol. 166. 2007. Feasibility of nationwide birth registry control selection in the United States; pp. 852–856. [PubMed: 17623744]
  • Spycher B. D., Feller M. Int. J. Epidemiol. 2011. Childhood cancer and nuclear power plants in Switzerland: A census-based cohort study. Epub Jul. 12. [PMC free article: PMC3204210] [PubMed: 21750009]
  • Steele J. R., Wellemeyer A. S. Cancer Epidemiol. Biomarkers Prev. 7. Vol. 15. 2006. Childhood cancer research network: A North American Pediatric Cancer Registry; pp. 1241–1242. [PubMed: 16835317]
  • Tromp M., Reitsma J. B. AMIA Annu. Symp. Proc. 2006. Record linkage: Making the most out of errors in linking variables; pp. 779–783. [PMC free article: PMC1839331] [PubMed: 17238447]
  • Trott K. R., Rosemann M. Radiat. Environ. Biophys. 2. Vol. 39. 2000. Molecular mechanisms of radiation carcinogenesis and the linear, non-threshold dose response model of radiation risk estimation; pp. 79–87. [PubMed: 10929376]
  • UNSCEAR (United Nations Scientific Committee on the Effects of Atomic Radiation). Annex A—Epidemiological studies of radiation and cancer. I. 2006. Sources and Effects of Ionizing Radiation.
  • USEPA (U.S. Environmental Protection Agency). Radiation Risks and Realities. 2007. http://www.epa.gov/rpdweb00/docs/402-k-07-006.pdf
  • Von Behren J., Spector L. G. Int. J. Cancer. 11. Vol. 128. 2011. Birth order and risk of childhood cancer: A pooled analysis from five US States; pp. 2709–2716. [PMC free article: PMC3008504] [PubMed: 20715170]
  • Walker K. M., Carozza S. J. Agric. Saf. Health. 1. Vol. 13. 2007. Childhood cancer in Texas counties with moderate to intense agricultural activity; pp. 9–24. [PubMed: 17370910]
  • Wanebo C. K., Johnson K. G. N. Engl. J. Med. 13. Vol. 279. 1968. Breast cancer after exposure to the atomic bombings of Hiroshima and Nagasaki; pp. 667–671. [PubMed: 4299516]
  • White-Koning M. L., Hemon D. Br. J. Cancer. 5. Vol. 91. 2004. Incidence of childhood leukaemia in the vicinity of nuclear sites in France, 1990-1998; pp. 916–922. [PMC free article: PMC2409865] [PubMed: 15280917]
  • Willett W. C., Stampfer M. J. N. Engl. J. Med. 1. Vol. 316. 1987. Dietary fat and the risk of breast cancer; pp. 22–28. [PubMed: 3785347]
  • Winkler W. E. Matching and record linkage, in Business Survey Methods. In: Cox B. G., Binder D. A., Chinnappa B. N., Christianson A., Colledge M. J., Kott P. S., editors. Hoboken, New Jersey: John Wiley & Sons; 1995.



The term “bias,” when used scientifically, does not necessarily imply the researcher’s desire for a particular outcome, or any prejudice, as is often implied in the conventional use of the term.


This dose to the average person in the United States includes people who never had a medical procedure that involves high-dose radiation, such as CT scan or a fluoroscopy procedure. For those individuals that have had such procedures, the annual dose is higher. For reference, the average dose received from a CT scan is 8 mSv.


Misclassification is the erroneous assignment of a cancer to a category other than the one to which it should be assigned.


Often in radiation epidemiology nonleukemia cancers are grouped and analyzed together in a category named “solid cancers.” This grouping may make only limited sense from a biological or medical point of view since cancers at different sites are too different to be grouped in terms of their causes, other risk factors including genetic effects, carcinogenesis stages (Trott and Rosemann, 2000), and possibly histology. However, because the numbers of cancers at individual sites are too small for a robust analysis, grouping is often a necessity.


The studies discussed in this report focus on first cancers only. Second primary and multiple primary cancers, that is, those cancers occurring in patients who were diagnosed with another cancer in the past, are not considered. A second primary is different from a cancer that reappears after treatment (recurrence) or is a result of the original cancer metastasizing to a non-adjacent organ. Recording of multiple cancers in cancer registries is discussed in Section 4.3.2.


Women’s Environment, Cancer, and Radiation Epidemiology.


Statistical procedure used to minimize the effect of differences in the composition of the populations or individuals compared.


There are other methods of controlling for confounding at the design phase such as restriction, or at the analysis phase by standardization, stratification, and multivariate analysis.


Radiation doses are much higher during radiation therapy, often on the order of 5,000 to 50,000 times as large (NCRP, 2009), but only a small fraction of the population undergoes radiation therapy, primarily as part of a cancer treatment plan. As discussed in Section 4.2.1 only the first primary cancers are considered for inclusion in the analysis; therefore, secondary cancers attributed to therapeutic radiation are not taken into account.


In a strictly demographic definition, birth order is based on the ordinal number of live births.


The NCI study included facilities that were in operation by 1982.


The term IRB describes the standing committee in a medical or research institution, hospital, or other health care facility, whose task is to ensure the safety and well-being of human subjects and privacy of any information retrieved from those subjects.


The committee judges that a study is feasible if it satisfies the following criteria: (a) it is based on existing data for cases, the at-risk population, and common confounding factors; (b) it meets the criteria regarding release of those data for research purposes; and (c) it considers knowledge and experience from studies in the field including anticipated participation of subjects.


A code provided by the geocoding vendor service that indicates the quality of assignment of census tract for an individual record; address scores higher than residence ZIP code, which scores higher than ZIP code of P.O. box.

Copyright 2012 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK201995

