Using mortality data to estimate radiation effects on breast cancer incidence.

In this paper we combine Japanese data on radiation exposure and cancer mortality with U.S. data on cancer incidence and lethality to estimate the effects of ionizing radiation on cancer incidence. The analysis is based on the mathematical relationship between the mortality rate and the incidence and lethality rates, as well as on statistical models that relate Japanese incidence rates to U.S. incidence rates and radiation risk factors. Our approach assumes that the risk of death from causes other than the cancer does not depend on whether or not the cancer is present, and among individuals with the cancer, the risk of death attributable to the cancer is the same in Japan and the U.S. and is not affected by radiation exposure. In particular, we focus on the incidence of breast cancer in Japanese women and how this incidence is affected by radiation risk factors. The analysis uses Japanese exposure and mortality data from the Radiation Effects Research Foundation study of atomic bomb survivors and U.S. incidence and lethality data from the Surveillance, Epidemiology, and End Results Registry. Even without Japanese incidence data, we obtain reasonable estimates of the incidence of breast cancer in unexposed Japanese women and identify the radiation risk factors that affect this incidence. Our analysis demonstrates that the age at exposure is an important risk factor, but that the incidence of breast cancer is not affected by the city of residence (Nagasaki versus Hiroshima) or the time since exposure.


Introduction
The relationship between ionizing radiation and cancer mortality is fairly well understood. Quantitative risk models have been developed to assess the impact of environmental and occupational levels of ionizing radiation on cancer mortality and to establish radiation exposure standards (1,2). The epidemiological data on which this work has been based are primarily the mortality data from follow-up studies of atomic bomb survivors (3-7) and several large medical cohorts, such as patients with ankylosing spondylitis (8). Both the cancer mortality data and the exposure data have been of sufficient quality to enable the modeling of cancer mortality rates.
Recently the emphasis of quantitative risk assessment has shifted from cancer mortality to cancer incidence as researchers have become aware of the need to assess the relationship between ionizing radiation and the onset of cancer (9). The primary impediment to the development of quantitative incidence models is the lack of tumor registries with complete coverage and high-quality data on radiation exposure and cancer incidence. Furthermore, though some analyses have used tumor registry data from Nagasaki alone (5-7), there have not been any analyses of cancer incidence data from both Hiroshima and Nagasaki.
Our analysis combines Japanese data on radiation exposure and cancer mortality with U.S. data on cancer incidence and cancer lethality to estimate the effects of ionizing radiation on cancer incidence. This approach is based on standard likelihood techniques, the mathematical relationship between the mortality rate and the incidence and lethality rates, and statistical models relating the Japanese incidence rates to both the U.S. incidence rates and various radiation risk factors. We focus on estimating the incidence of breast cancer in Japanese women and identifying the radiation risk factors that influence a woman's chances of developing breast cancer.

Definitions
The cancer incidence rate, the cancer mortality rate, and the cancer lethality rate are three important descriptors of the cancer onset/death process. Suppose that age is grouped into $J$ intervals and let $I_j$, $M_j$, and $L_{jk}$ denote the incidence rate, the mortality rate, and the lethality rate associated with breast cancer ($1 \le k \le j \le J$). Among all women surviving to the beginning of the $j$th interval, $I_j$ represents the risk of developing breast cancer in the $j$th interval, $M_j$ represents the risk of dying due to breast cancer in the $j$th interval, and $L_{jk}$ is a conditional mortality rate that represents the risk of dying due to breast cancer in the $j$th interval, given that the cancer was diagnosed in the $k$th interval. The incidence rate, the mortality rate, and the lethality rate are discrete hazard rates that correspond to the distributions of age at breast cancer onset, age at death due to breast cancer, and time from breast cancer onset to death due to breast cancer, respectively.

Assumptions
The proposed methods rely on three assumptions. First, we assume that the cancer lethality rates in Japan and the U.S. are the same. The distribution of ages at which women develop breast cancer ($I_j$) can differ between Japan and the U.S., as can the distribution of ages at which women die due to breast cancer ($M_j$), but among women diagnosed with breast cancer the risk of death due to cancer ($L_{jk}$) does not depend on their nationality. Second, we assume that risk factors related to radiation exposure do not affect the cancer lethality rate, although these risk factors can affect the cancer incidence rate and hence the cancer mortality rate. Third, we assume that the risk of death from causes other than breast cancer does not depend on whether or not breast cancer is present.

Breast Cancer Lethality
Among all women diagnosed with breast cancer in the $k$th interval, let $c_{jk}$ denote the number who died due to breast cancer in the $j$th interval and let $b_{jk}$ denote the number who died due to other causes (or were lost to follow-up) in the $j$th interval. Apart from constants of proportionality, the log-likelihood of the breast cancer lethality data can be expressed in terms of the $L_{jk}$ values, as given in Eq. (1). Recall that the lethality rate, $L_{jk}$, is the probability of death due to breast cancer in the $j$th interval, given that the cancer was diagnosed in the $k$th interval and the woman was alive just before the $j$th interval ($1 \le k \le j \le J$). If the follow-up period for women diagnosed with breast cancer is relatively short, a yearly lethality rate would provide much more information than one based on intervals of several years. For example, if women are followed for up to 12 years after diagnosis, then typically they contribute to the estimation of only a few 5-year lethality rates.
Let $G_{ik}$ denote the yearly lethality rate, which is defined as the probability of death due to breast cancer at age $i$, given that the cancer was diagnosed in the $k$th interval and the woman was alive at age $i-1$. It can be shown that $L_{jk}$ and $G_{ik}$ are related as follows:
$$1 - L_{jk} = \prod_{i} \left(1 - G_{ik}\right), \qquad (2)$$
where the product is over all ages $i$ in the $j$th interval.
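The relation between the interval rate and the yearly rates can be illustrated numerically. The following is a minimal sketch, assuming the usual discrete-hazard identity in which the probability of escaping breast cancer death over an interval is the product of the yearly escape probabilities, i.e., 1 - Ljk equals the product of (1 - Gik) over the ages i in the jth interval. The function name and the yearly rates are hypothetical and are not taken from the SEER data.

```python
def interval_lethality(yearly_rates):
    """Combine the yearly rates G_ik for the ages in one interval into L_jk.

    Assumes 1 - L_jk = product over those ages of (1 - G_ik).
    """
    survival = 1.0
    for g in yearly_rates:
        if not 0.0 <= g <= 1.0:
            raise ValueError("each yearly rate must lie in [0, 1]")
        survival *= 1.0 - g
    return 1.0 - survival


# Hypothetical yearly rates for the five ages in one 5-year interval.
yearly = [0.030, 0.045, 0.050, 0.040, 0.035]
five_year = interval_lethality(yearly)
print(round(five_year, 4))
```

Because the yearly escape probabilities multiply, the interval rate is always slightly smaller than the sum of the yearly rates.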
Apart from constants, the log-likelihood of the yearly lethality data (conditioned on the grouped diagnosis intervals) is given by Eq. (3), and the nonparametric ML estimate of each $G_{ik}$ is given by Eq. (4), where $\#\{E\}$ denotes the number of women for whom event $E$ is true. Practically speaking, the nonparametric analysis can be viewed as a fully saturated parametric analysis in the sense that a separate value of the lethality rate is estimated for each combination of the age at death and the intervals for age at diagnosis. Alternatively, in order to reduce the number of quantities to be estimated, we can specify a parametric model for the $G_{ik}$ values, such as the following logistic model (10):
$$\log\left(\frac{G_{ik}}{1 - G_{ik}}\right) = \sum_{p=0}^{2} \sum_{q=0}^{2} (i - a_k + 1)^p \, a_k^q \, \psi_{pq}, \qquad (6)$$
where $a_k$ is the age at diagnosis associated with the $k$th interval, so that $i - a_k + 1$ measures the time since diagnosis. This model assumes that the lethality rate is a quadratic (logistic) function of the age at diagnosis and the time since diagnosis. In this case, a total of only nine parameters (i.e., the $\psi_{pq}$ values) need to be estimated. Each $G_{ik}$ is estimated by substituting the ML estimates of the $\psi_{pq}$ values into Eq. (6).
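To make the nine-parameter model concrete, the sketch below evaluates a logistic surface that is quadratic in both the time since diagnosis and the age at diagnosis. It is an illustration only: the function name is hypothetical, the coefficient values are invented (the fitted values are not listed in the text), and age is entered in years as an assumption.

```python
import math


def yearly_lethality(time_since_dx, age_at_dx, psi):
    """Logistic model, quadratic in time since diagnosis and age at diagnosis.

    psi[p][q] multiplies time_since_dx**p * age_at_dx**q for p, q in {0, 1, 2}.
    """
    eta = sum(
        psi[p][q] * time_since_dx ** p * age_at_dx ** q
        for p in range(3)
        for q in range(3)
    )
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, chosen only to give a plausible shape.
psi = [
    [-3.5, 0.01, 0.0],    # p = 0: terms constant in time
    [0.30, -0.004, 0.0],  # p = 1: terms linear in time
    [-0.03, 0.0, 0.0],    # p = 2: terms quadratic in time
]

# Yearly rates over 12 years of follow-up for a woman diagnosed at age 45.
curve = [yearly_lethality(t, 45.0, psi) for t in range(1, 13)]
print([round(g, 4) for g in curve])
```

With nine coefficients the whole family of curves, one per age-at-diagnosis group, is described by a single smooth surface instead of a separate value for every cell.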

Breast Cancer Mortality and Incidence
We assumed a piecewise exponential model for the breast cancer mortality rates, which allows the $M_j$ values to vary with age (across intervals) even though they are constant within intervals. The log-likelihood of the breast cancer mortality data under the piecewise exponential model is
$$\sum_{j=1}^{J} \left( c_j \log M_j - r_j M_j \right), \qquad (7)$$
where $c_j = \sum_{k \le j} c_{jk}$ is the number of women who died due to breast cancer during the $j$th interval and $r_j$ is the number of person-years at risk observed during the $j$th interval. Dinse and Hoel (manuscript in preparation) demonstrate that the cancer mortality rate can be expressed as a function of the cancer incidence rates and the cancer lethality rates, as given in Eq. (8).

Linking U.S. and Japanese Incidence Rates
In order to relate the breast cancer incidence rates in Japan and the U.S., we specified the model in Eq. (9), which makes no parametric assumptions about the form of the incidence rates themselves, only about the way in which $I_j^{\mathrm{JAPAN}}$ relates to $I_j^{\mathrm{USA}}$. Based on a first-order approximation, Eq. (9) can be viewed as a discrete analog of the proportional hazards model in which the ratio of U.S. and Japanese incidence rates is a log-linear function of age.
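A form of the linking model that is consistent with this description, and with the parallel structure of Eq. (11), would be the following. This is a hedged reconstruction rather than a quotation: the exponent form and the age variable $t_j$ (taken here to be a representative age for the $j$th interval) are assumptions, with $\phi_0$ and $\phi_1$ the two parameters to be estimated.

```latex
% Assumed form of Eq. (9): the Japanese rate is tied to the U.S. rate
% through a power transformation of the "escape" probabilities.
\[
  1 - I_j^{\mathrm{JAPAN}}
    = \left(1 - I_j^{\mathrm{USA}}\right)^{\exp(\phi_0 + \phi_1 t_j)}
\]
% First-order approximation for small rates, using (1 - x)^a \approx 1 - a x:
\[
  \frac{I_j^{\mathrm{JAPAN}}}{I_j^{\mathrm{USA}}}
    \approx \exp(\phi_0 + \phi_1 t_j)
\]
```

Under this form the logarithm of the incidence ratio is linear in age, which is the sense in which the model is a discrete analog of proportional hazards.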

Estimating Baseline Incidence in Japan
The incidence of breast cancer in unexposed Japanese women can be estimated as follows. Express the log-likelihood of the breast cancer mortality data as a function of $I_j^{\mathrm{JAPAN}}$, $L_{jk}$, and the Japanese mortality data. Replace the $L_{jk}$ values with the ML estimates obtained from the U.S. lethality data and use Eq. (9) to express $I_j^{\mathrm{JAPAN}}$ in terms of $I_j^{\mathrm{USA}}$, $\phi_0$, and $\phi_1$. Estimate $I_j^{\mathrm{USA}}$ from the U.S. incidence data as follows:
$$\hat{I}_j^{\mathrm{USA}} = \frac{\#\{\text{diagnosed with breast cancer in the } j\text{th interval}\}}{\#\{\text{alive without breast cancer upon entering the } j\text{th interval}\}}. \qquad (10)$$
If we treat the estimates of the U.S. lethality and baseline incidence rates as known, an estimate of the Japanese baseline incidence rate, $I_j^{\mathrm{JAPAN}}$, can be calculated by maximizing this log-likelihood as a function only of $\phi_0$ and $\phi_1$ and then substituting the estimates of $I_j^{\mathrm{USA}}$, $\phi_0$, and $\phi_1$ into Eq. (9).

Linking Incidence to Radiation Exposure
In order to relate information on radiation dose ($D$) and a vector of covariates or exposure variables ($Z$) to the breast cancer incidence in Japanese women, we specified the following model:
$$1 - I_j^{\mathrm{JAPAN}}(D,Z) = \left(1 - I_j^{\mathrm{JAPAN}}\right)^{1 + D e^{\theta' Z}}, \qquad (11)$$
where $\theta$ is a vector of parameters, $I_j^{\mathrm{JAPAN}}$ is the baseline breast cancer incidence rate, and $I_j^{\mathrm{JAPAN}}(D,Z)$ is the breast cancer incidence rate for women who were exposed to dose $D$ with covariate vector $Z$. Based on a first-order approximation, Eq. (11) can be viewed as a discrete analog of the relative risk model in which the ratio of incidence rates for exposed and unexposed Japanese women is a simple linear function of dose and an exponentiated covariate term.
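The first-order approximation can be checked numerically. The sketch below evaluates Eq. (11) directly and compares the resulting incidence ratio with the linear relative risk 1 + D exp(θ'Z). All numbers are hypothetical (they are not estimates from this analysis), the dose is in arbitrary units, and the function names are illustrative.

```python
import math


def exposed_incidence(baseline, dose, theta, z):
    """Eq. (11): 1 - I(D, Z) = (1 - I0) ** (1 + D * exp(theta'Z))."""
    linear = sum(t * x for t, x in zip(theta, z))
    exponent = 1.0 + dose * math.exp(linear)
    return 1.0 - (1.0 - baseline) ** exponent


def approx_relative_risk(dose, theta, z):
    """First-order relative risk implied by Eq. (11): 1 + D * exp(theta'Z)."""
    return 1.0 + dose * math.exp(sum(t * x for t, x in zip(theta, z)))


# Hypothetical inputs, for illustration only.
baseline = 0.0005      # baseline incidence in one age interval
dose = 1.5             # dose in arbitrary units
theta = [-0.5, 0.8]    # coefficients for Z = (1, indicator of exposure before age 20)
z_young = [1.0, 1.0]

exact_ratio = exposed_incidence(baseline, dose, theta, z_young) / baseline
linear_ratio = approx_relative_risk(dose, theta, z_young)
print(exact_ratio, linear_ratio)
```

Because breast cancer incidence within a single age interval is small, the exact ratio and the linear relative risk agree closely, and a zero dose returns the baseline rate.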

Analyzing Radiation Effects on Incidence
The first step is to categorize the covariates. Next, assume that the cancer mortality function is piecewise exponential with a different value, $M_i$, for the $i$th combination of the various covariate categories and the $J$ age-at-death intervals. The log-likelihood of the data on cancer mortality and radiation exposure is
$$\sum_{i} \left( c_i \log M_i - r_i M_i \right), \qquad (12)$$
where $c_i$ is the number of women in the $i$th category who died due to breast cancer and $r_i$ is the number of person-years at risk observed in the $i$th category. Rewrite this log-likelihood as a function of $I_j^{\mathrm{JAPAN}}$, $I_j^{\mathrm{JAPAN}}(D,Z)$, and $L_{jk}$ by replacing each $M_i$ in Eq. (12) by a modification of Eq. (8) in which we substitute either $I_j^{\mathrm{JAPAN}}$ or $I_j^{\mathrm{JAPAN}}(D,Z)$ for $I_j$, depending on whether or not the exposure occurred after the $j$th age interval. A 10-year latency period was incorporated by including the covariate terms only for the age intervals that exceeded the age at exposure by at least 10 years. Replace the $L_{jk}$ values with the ML estimates obtained from the U.S. lethality data and use Eq. (11) to express $I_j^{\mathrm{JAPAN}}(D,Z)$ in terms of $I_j^{\mathrm{JAPAN}}$ and $\theta$. Treat the estimates of the lethality and baseline Japanese incidence rates as known so that the log-likelihood reduces to a function in which only $\theta$ is unknown. An estimate of the Japanese incidence rate, $I_j^{\mathrm{JAPAN}}(D,Z)$, can be obtained by maximizing this log-likelihood as a function of $\theta$ and then substituting the estimates of $I_j^{\mathrm{JAPAN}}$ and $\theta$ into Eq. (11).
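The objective function in this step can be sketched as follows, assuming that the piecewise exponential log-likelihood takes its standard form, the sum over cells of c log M minus r M with constants dropped. In the analysis described above the rates are not free parameters but functions of θ through Eqs. (8) and (11); the sketch shows only the log-likelihood itself, with hypothetical counts and a hypothetical function name.

```python
import math


def piecewise_exp_loglik(deaths, person_years, rates):
    """Sum over cells of c_i * log(M_i) - r_i * M_i (constants dropped)."""
    total = 0.0
    for c, r, m in zip(deaths, person_years, rates):
        if m <= 0.0:
            raise ValueError("mortality rates must be positive")
        total += c * math.log(m) - r * m
    return total


# Hypothetical cells: breast cancer deaths and person-years at risk.
deaths = [2, 5, 9]
person_years = [40000.0, 35000.0, 30000.0]

# With unconstrained rates, the ML estimate in each cell is deaths / person-years.
mle = [c / r for c, r in zip(deaths, person_years)]
best = piecewise_exp_loglik(deaths, person_years, mle)
higher = piecewise_exp_loglik(deaths, person_years, [1.2 * m for m in mle])
lower = piecewise_exp_loglik(deaths, person_years, [0.8 * m for m in mle])
print(best, higher, lower)
```

Any set of rates other than the cell-wise ratios gives a smaller value, which is the property a numerical maximizer over θ exploits once the rates are tied to the incidence model.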
The importance of various exposure variables can be assessed by means of the likelihood ratio test (11). The likelihood ratio statistic is equal to twice the difference between the maximized values of the log-likelihood for models with and without the exposure variable(s) of interest. The significance level (p-value) associated with the exposure variable(s) is obtained by comparing the likelihood ratio statistic to the percentage points of the chi-squared distribution. The number of degrees of freedom equals the difference in the numbers of parameters estimated in the two models.
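The test just described amounts to a few lines of arithmetic. The sketch below uses only the Python standard library; the chi-squared tail probability is computed from closed forms for 1 and 2 degrees of freedom plus a standard recurrence for the incomplete gamma function. The two log-likelihood values are hypothetical and are not taken from Table 2.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df of 1 or more."""
    if x <= 0.0:
        return 1.0
    z = x / 2.0
    # Closed forms: df = 1 gives erfc(sqrt(z)); df = 2 gives exp(-z).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_full, loglik_reduced, extra_params):
    """Return the likelihood ratio statistic and its p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical maximized log-likelihoods with and without one exposure variable.
stat, p_value = likelihood_ratio_test(-1204.2, -1207.9, extra_params=1)
print(stat, p_value)
```

A small p-value indicates that adding the exposure variable improves the fit by more than chance alone would explain.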

Sources of Data
We obtained data on breast cancer incidence and lethality for unexposed U.S. women from the Surveillance, Epidemiology, and End Results (SEER) Registry (12). The SEER Registry gives information on women diagnosed with breast cancer between 1973 and 1984, as well as follow-up data on these women over the 12-year period from 1973 to 1985. For more than 125,000 women diagnosed with breast cancer, the SEER Registry provides the age at diagnosis, the age at the last follow-up visit, and an indicator of whether the woman was dead or lost to follow-up (e.g., still alive). If the woman died, the registry also gives the age at death and a cause-of-death indicator.
We used data on radiation exposure and breast cancer mortality for Japanese women from the Radiation Effects Research Foundation (RERF) Life Span Study (3,4). This study consists of a cohort of atomic bomb survivors followed since 1950, with each individual having a calculated radiation exposure that depends on distance and shielding. The women in the RERF study have contributed approximately 1.35 million person-years at risk; there have been more than 15,000 deaths, including 155 deaths due to breast cancer. Radiation dose is measured in milligrays (mGy) and represents the total free-in-air kerma. There are three exposure variables: city (1 = Hiroshima, 2 = Nagasaki), age at exposure, and time since exposure. The data on age at exposure, time since exposure, and attained age are recorded in 5-year intervals. For the $i$th combination of city, age at exposure, time since exposure, and attained age, the data set gives the number of person-years at risk ($r_i$), the number of deaths due to breast cancer ($c_i$), and the mean radiation dose ($D_i$).

Breast Cancer Lethality
The women in the SEER study were followed for, at most, 12 years; thus we chose to model the yearly lethality rates (i.e., the $G_{ik}$ values) rather than the lethality rates corresponding to the 5-year intervals (i.e., the $L_{jk}$ values). As described earlier, the $G_{ik}$ values were estimated in three ways. First, the nonparametric ML estimate in Eq. (4) was calculated for each of the 168 combinations of the 12 years of follow-up and the 14 intervals for age at diagnosis. Second, the three parameters in Eq. (5) were estimated separately for each of the 14 age-at-diagnosis intervals, giving a total of 42 estimates. Third, the nine parameters in Eq. (6) were estimated on the basis of all the data.
These three sets of breast cancer lethality estimates are plotted in Figure 1a-c. For each age-at-diagnosis interval $k$, the ML estimates of the $G_{ik}$ values are plotted against the number of years since diagnosis. For women diagnosed before age 60, the breast cancer death rate initially tends to increase as a function of time since diagnosis and then decreases toward zero, with a peak between 2 and 4 years after diagnosis. For women diagnosed at or after age 60, the breast cancer death rate seems to steadily decrease with time since diagnosis. The three figures suggest that Eq. (6) is adequate for summarizing the yearly lethality rates because the estimates based on the 9-parameter model provide roughly the same information as those based on the 42-parameter or 168-parameter models. The $L_{jk}$ values can be estimated by substituting the ML estimates of the $G_{ik}$ values into Eq. (2).

Baseline Incidence
In order to estimate the baseline incidence of breast cancer in U.S. women, we applied Eq. (10) to the data from the SEER Registry. The resulting estimate of each $I_j^{\mathrm{USA}}$, multiplied by 100,000 and rounded to the nearest integer, is listed in Table 1 and plotted on a log scale in Figure 2. The incidence of female breast cancer in the U.S. is negligible for women under 25, increases rapidly for women between 25 and 60, and then begins to level off for women over 60.
In order to estimate the baseline incidence of breast cancer in Japanese women, we treated the mortality data in the lowest radiation exposure group in the RERF study (i.e., 0-4 mGy) as baseline mortality data. Next, we used these data to construct the mortality log-likelihood, treated the estimates of $I_j^{\mathrm{USA}}$ and $L_{jk}$ as fixed, and then maximized the log-likelihood as a function of $\phi_0$ and $\phi_1$. The resulting estimates were $\hat{\phi}_0 = -0.1467$ and $\hat{\phi}_1 = -0.0279$. The corresponding estimate of each $I_j^{\mathrm{JAPAN}}$, which is obtained by substituting $\hat{\phi}_0$ and $\hat{\phi}_1$ into Eq. (9), is listed in Table 1. Estimates (13) of the incidence rates of breast cancer for women in three Japanese prefectures (Osaka in 1973-7; Fukuoka in 1974-5; Miyagi in 1973-7) are also listed in Table 1 and plotted in Figure 2.
FIGURE 1. Breast cancer lethality. The rate of death due to breast cancer, conditional on the cancer being diagnosed in a particular age interval, is plotted against the number of years since diagnosis for each of 14 age-at-diagnosis groups. (a) Depicts the nonparametric estimates, (b) depicts the parametric estimates derived from fitting a separate quadratic logistic model to the data for each age-at-diagnosis group, and (c) depicts the parametric estimates derived from fitting a single logistic model that is quadratic in both the age at diagnosis and the time since diagnosis.
The most notable feature of Table 1 and Figure 2 is that the incidence of breast cancer in Japanese women is much lower than in U.S. women. The breast cancer incidence in Japan peaks at a level that is roughly an order of magnitude less than the highest incidence in the U.S. and then, in contrast to the U.S. incidence, appears to decrease slightly in the older age groups. The other remarkable result is that our estimates of the Japanese incidence rates, which are based on Japanese mortality data, U.S. incidence and lethality data, and the simple log-linear model in Eq. (9), are very similar to the observed incidences of breast cancer in Osaka, Fukuoka, and Miyagi.
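The reported estimates of -0.1467 and -0.0279 can be turned into an approximate Japan/U.S. incidence ratio at a given age. The sketch below is illustrative only and rests on two assumptions that the text does not confirm: that Eq. (9) has the form 1 - I_JAPAN = (1 - I_USA) ** exp(phi0 + phi1 * age), by analogy with Eq. (11), and that age enters in years. The function names and the U.S. rate used in the example are hypothetical.

```python
import math

# Estimates reported in the text for the two linking parameters.
PHI0, PHI1 = -0.1467, -0.0279


def japan_incidence(us_incidence, age):
    """Assumed form of Eq. (9), with age in years (an assumption)."""
    return 1.0 - (1.0 - us_incidence) ** math.exp(PHI0 + PHI1 * age)


def approx_ratio(age):
    """First-order Japan/U.S. incidence ratio, exp(phi0 + phi1 * age)."""
    return math.exp(PHI0 + PHI1 * age)


for age in (30, 50, 70):
    print(age, round(approx_ratio(age), 3))

# Hypothetical U.S. rate of 200 per 100,000 at age 70.
example = japan_incidence(0.002, 70)
print(example)
```

Under these assumptions the ratio falls from roughly 0.37 at age 30 to roughly 0.12 at age 70, which is in line with the order-of-magnitude difference at older ages noted above.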

Radiation Effects
We evaluated the effect of ionizing radiation on the incidence of breast cancer in Japanese women through the use of Eq. (11). The log-likelihood was expressed as a function of the U.S. lethality rates, the Japanese baseline incidence rates, and the parameter vector $\theta$.
We then substituted the Japanese mortality data into the log-likelihood, treated the estimates of $L_{jk}$ and $I_j^{\mathrm{JAPAN}}$ as fixed, and maximized the log-likelihood with respect to the elements of $\theta$. The effects of the exposure variables were assessed by comparing differences in the values of twice the maximized log-likelihoods, which are listed in Table 2 for various combinations of the exposure variables. In every case, the first element of the covariate vector $Z$ is the constant 1, which allows for a simple dose effect. Table 2 illustrates that age at exposure was the primary predictor of breast cancer incidence among exposed women; the incidence rate decreased as the age at exposure increased. In fact, the relevant information was captured by simply dichotomizing age at exposure into two groups: women under age 20 and women of age 20 or older. Further partitioning of the age at exposure into three groups (ages 0-19, 20-49, 50+) did not yield a significant improvement. As for the other covariates, city seemed to have no effect at all, and time since exposure had a minimal effect; the incidence rate tended to increase with the time since exposure, but the increase was not statistically significant.
Table 1, footnote b: Japanese rates were obtained from the IARC data (13).

Conclusions
We have shown that by borrowing information on cancer incidence and cancer lethality in the U.S. from the SEER data, age-specific cancer incidence rates in Japan can be estimated from Japanese cancer mortality data. In fact, our methods produced very good estimates of the Japanese incidence rates of breast cancer, even though the incidence of breast cancer in the U.S. is much greater than in Japan. The accuracy of the incidence estimates provides support for our assumption of similar U.S. and Japanese breast cancer lethality rates. Currently we are refining the model for the cancer lethality rates, and we are in the process of applying our technique to other cancer sites.
FIGURE 2. Baseline breast cancer incidence. The baseline incidence of breast cancer is plotted against age (in years) for women in the U.S. and Japan. The plain solid line depicts the general incidence rate in the U.S.; the lines connected by open squares, circles, and triangles depict the incidence rates in the Japanese prefectures of Osaka, Fukuoka, and Miyagi; and the dashed line depicts our estimate of the general incidence rate in Japan.
The second aspect of this work is the application of statistical models that estimate cancer incidence induced by radiation. This approach permits incidence-based radiation risk estimation, even without adequate cancer incidence data. We are applying our modeling technique to various cancer sites and will compare our incidence estimates with other available estimates, even those based on direct estimation from actual incidence data.
The Japanese data on radiation and cancer mortality have been obtained from the Radiation Effects Research Foundation (RERF) in Hiroshima, Japan. RERF is a private foundation that is funded equally by the Japanese Ministry of Health and Welfare and the U.S. Department of Energy through the U.S. National Academy of Sciences. The conclusions reached in this report are those of the authors and do not necessarily reflect the scientific judgment of RERF or its funding agencies. The authors thank RERF for the use of their data and Frank DiIorio for his assistance with much of the computer programming.