Statistical analysis of disease onset and lifetime data from tumorigenicity experiments.

We present and discuss several methods for analyzing rodent tumorigenicity experiments. Two approaches are based on the age and tumor status (present/absent) of each animal at the time of death, and assume either that the tumor type is nonlethal or instantly lethal. Two other approaches avoid such restrictive assumptions about tumor lethality by requiring additional types of data. One method assumes that animals are randomly sacrificed at various ages throughout the study. The second approach requires that each animal which develops the tumor be classified as dying either from the tumor or from other causes.


Introduction
Rodent tumorigenicity experiments are frequently used to assess the safety of chemicals, food additives, pesticides, and other products. Typically, a control group of rodents is compared with one or more "exposed" groups that are fed the compound of interest throughout their lifetimes. The substance is deemed carcinogenic if, loosely speaking, it increases the rate of development of one or more tumor types.
From a statistical point of view, the analysis of rodent tumorigenicity experiments poses a number of interesting and challenging problems. One reason for this is that most tumor types are occult and therefore detectable only after the animal has died. (The statistical analysis of experiments involving observable or palpable tumors, such as skin and mammary tumors, is much simpler and will not be discussed in this paper.) Thus, the quantity of interest, time to tumor onset, is not directly observed; we know only whether or not the tumor has developed by the time of death. The problem is compounded by two other phenomena. First, tumor types differ in their lethality, with some causing death shortly after onset and others never or rarely causing death. This means that the time from onset to death will depend in part on the tumor lethality. Second, because of the extremely high dosages of the test compounds, the longevity of the exposed animals is often affected. Consequently, tumors might appear to occur earlier in the exposed group simply because death occurs sooner, or might not appear at all because the animals died before the tumor could develop.
Because of these complexities, the comparison of control and exposed groups based on the proportions of animals that eventually develop tumors can be misleading. Although it is now widely recognized that comparisons between the control and exposed groups should account for age at death, there is no single appropriate method of incorporating age. We shall see that appropriate methods exist when the tumor type in question is nonlethal, when it is instantly lethal, when there are serial sacrifices, or when pathologists determine whether or not the detected tumors caused death. This paper reviews these methods and discusses their advantages and shortcomings.
Animals begin in a tumor-free state and then either develop the tumor or die tumor-free. By "tumor onset" we mean the earliest age at which the tumor would be detectable by a microscopic examination of the involved organ or tissue. Animals that develop the tumor eventually die with the tumor present.

General Formulation
Suppose $U_1$ denotes the age at (time to) the first of tumor onset or death, $a(t)$ indicates whether [$a(t) = 1$] or not [$a(t) = 0$] the tumor has occurred by age $t$, and $U$ denotes age at (time to) death. Probabilistically, the model can be characterized by the intensity functions $\alpha(t)$, $\beta(t)$, and $\delta(t,x)$, where
$$\alpha(t) = \lim_{\Delta t \to 0} (\Delta t)^{-1} \Pr\{t \le U_1 < t + \Delta t,\ a(U_1) = 1 \mid U_1 \ge t\},$$
$$\beta(t) = \lim_{\Delta t \to 0} (\Delta t)^{-1} \Pr\{t \le U_1 < t + \Delta t,\ a(U_1) = 0 \mid U_1 \ge t\},$$
$$\delta(t,x) = \lim_{\Delta t \to 0} (\Delta t)^{-1} \Pr\{t \le U < t + \Delta t \mid U \ge t,\ a(U_1) = 1,\ U_1 = x\}.$$
The first of these functions, $\alpha(t)$, specifies the risk of tumor onset and is the quantity of primary interest in the comparison of the control and exposed groups. The function $\beta(t)$ describes the risk of death without tumor. In many experiments, this risk is affected by the high doses of compound that are given to the rodents; consequently, differences between the control and exposed groups with respect to the function $\beta(t)$ are not regarded as evidence of tumorigenicity. The function $\delta(t,x)$, which describes the death risk at age $t$ for an animal whose tumor developed at age $x$, is also a nuisance function, because it also may depend upon the toxicity of the chemical being tested. The statistical problem is to test the equality of the functions $\alpha(t)$ in the control and exposed groups in the presence of the nuisance functions $\beta(t)$ and $\delta(t,x)$.
A function which is often considered to be a measure of carcinogenicity is the tumor prevalence rate, or the probability that an animal living at age $t$ has developed the tumor; that is,
$$\pi(t) = \Pr\{a(t) = 1 \mid U \ge t\}. \qquad (1)$$
The observable data for each animal consist of $U$, the age at death, and the binary variable $a = a(U)$, which indicates whether ($a = 1$) or not ($a = 0$) the tumor has developed by the time of death. The only estimable aspects of the model depicted in Figure 1 are those functions that can be expressed in terms of the distribution of the observed random variables $(U,a)$. The latter are characterized (1) by the cause-specific hazard functions
$$h_1(t) = \lim_{\Delta t \to 0} (\Delta t)^{-1} \Pr\{t \le U < t + \Delta t,\ a = 1 \mid U \ge t\}$$
and
$$h_0(t) = \lim_{\Delta t \to 0} (\Delta t)^{-1} \Pr\{t \le U < t + \Delta t,\ a = 0 \mid U \ge t\}.$$
It can be verified that
$$h_1(t) = \frac{\int_0^t g(t,x)\,\delta(t,x)\,dx}{1 + \int_0^t g(t,x)\,dx}, \qquad h_0(t) = \frac{\beta(t)}{1 + \int_0^t g(t,x)\,dx}, \qquad \pi(t) = \frac{\int_0^t g(t,x)\,dx}{1 + \int_0^t g(t,x)\,dx}, \qquad (2)$$
where $g(t,x) = \alpha(x) \exp\left[\int_x^t \{\alpha(u) + \beta(u) - \delta(u,x)\}\,du\right]$. The fundamental difficulty in the analysis of carcinogenicity experiments is that none of $\alpha(t)$, $\beta(t)$, or $\delta(t,x)$ can be expressed solely in terms of $h_1(t)$ or $h_0(t)$. Thus, without further assumptions, these three basic model parameters are nonidentifiable when the data consist only of $U$ and $a$. Furthermore, since $h_0(t)$ and $h_1(t)$ depend on the nuisance functions $\beta(t)$ and $\delta(t,x)$, neither is appropriate as a basis for determining tumorigenicity. The rest of the paper will discuss proposed methods for overcoming this nonidentifiability problem.
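To make the structure of the observable data concrete, the following minimal sketch simulates one animal from the model. It assumes, purely for illustration, that the three intensities are constants (named `alpha`, `beta`, and `delta` here for the risks of tumor onset, death without tumor, and death with tumor); the model in the text allows arbitrary functions of age. The function name `simulate_animal` is hypothetical.

```python
import random

def simulate_animal(alpha, beta, delta, rng):
    """Return (U, a): age at death and tumor indicator at death.

    Assumes constant intensities (an illustrative simplification):
    alpha = tumor onset, beta = death without tumor, delta = death with tumor.
    """
    t_onset = rng.expovariate(alpha)        # potential age at tumor onset
    t_death_free = rng.expovariate(beta)    # potential age at tumor-free death
    if t_death_free < t_onset:
        return t_death_free, 0              # died tumor-free; onset never observed
    # Tumor developed first; the remaining lifetime then has hazard delta.
    return t_onset + rng.expovariate(delta), 1

rng = random.Random(0)
sample = [simulate_animal(1.0, 1.0, 2.0, rng) for _ in range(20000)]
```

Only the pair returned by the function is available to the analyst; the onset age `t_onset` is discarded, which is the source of the nonidentifiability discussed above.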

Instantly Lethal Tumor Types
For two special cases of the general model, appropriate analyses of $\alpha(t)$ are possible. The first of these is when the tumor type under consideration is instantly lethal; i.e., death follows instantaneously after tumor onset. For example, leukemias and reticulum cell sarcomas are considered to be virtually instantly lethal. For such tumors, the process in Figure 1 is characterized solely by the functions $\alpha(t)$ and $\beta(t)$, and simplifies to one where an animal begins tumor-free and then either develops a tumor (and dies instantly), or dies without a tumor. The standard competing-risks framework can be used to describe this process by taking
$$U = \min(T_1, Y), \qquad a = \begin{cases} 1 & \text{if } T_1 \le Y \\ 0 & \text{if } T_1 > Y, \end{cases}$$
where $T_1$ and $Y$ are random variables representing the potential times to tumor onset and death without tumor, respectively. These are imagined to "compete" with one another to determine when and how death occurs. Analyses of competing-risk data are well-documented in the statistical literature (2). For example, the cumulative intensity function $\int_0^t \alpha(u)\,du$ can be consistently estimated by
$$\sum_{u \le t} n(u)/N(u),$$
where $n(t)$ is the number of tumor deaths at age $t$ and $N(t)$ is the number of animals alive at age $t - 0$ (that is, just before age $t$). Perhaps the most common test of equality of groups with respect to the functions $\alpha(t)$ is the logrank test. This test arises as the partial likelihood score test of $\theta = 0$ in the proportional hazards model $\alpha_E(t) = \exp(\theta)\,\alpha_C(t)$, where the subscripts denote the exposed and control groups, respectively. The construction of the test is described in Figure 2. Essentially, the control and exposed groups are compared at each age of death where there is a tumor, and the differences between the observed and expected numbers of tumor deaths in the exposed group are summed over these death times. The test is valid regardless of whether or not $\alpha_E(t)$ is proportional to $\alpha_C(t)$, but is most efficient when proportionality holds. Other so-called "lifetable" tests are designed to be most sensitive to alternatives other than proportionality.
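The cumulative-intensity estimator and the logrank comparison described above can be sketched as follows. This is an illustrative implementation under the instantly-lethal assumption, not the construction of Figure 2 itself; the function names, the argument layout (parallel lists of ages, tumor indicators, and exposure indicators), and the use of the usual hypergeometric variance are assumptions made for the example.

```python
import math

def nelson_aalen(ages, tumor_death):
    """Estimate the cumulative onset intensity: running sum of n(u)/N(u).

    ages: age at death of each animal; tumor_death: 1 if the animal died
    with the (instantly lethal) tumor, else 0.
    Returns (age, estimate) pairs at the distinct tumor-death ages.
    """
    out, total = [], 0.0
    for t in sorted({a for a, d in zip(ages, tumor_death) if d}):
        n_t = sum(1 for a, d in zip(ages, tumor_death) if a == t and d)
        N_t = sum(1 for a in ages if a >= t)      # alive just before age t
        total += n_t / N_t
        out.append((t, total))
    return out

def logrank(ages, tumor_death, exposed):
    """Two-group logrank statistic; returns (observed - expected, variance, z)."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({a for a, d in zip(ages, tumor_death) if d}):
        N = sum(1 for a in ages if a >= t)                          # all at risk
        N_e = sum(1 for a, e in zip(ages, exposed) if a >= t and e)  # exposed at risk
        n = sum(1 for a, d in zip(ages, tumor_death) if a == t and d)
        n_e = sum(1 for a, d, e in zip(ages, tumor_death, exposed)
                  if a == t and d and e)
        o_minus_e += n_e - n * N_e / N            # observed minus expected, exposed group
        if N > 1:
            var += n * (N_e / N) * (1 - N_e / N) * (N - n) / (N - 1)
    z = o_minus_e / math.sqrt(var) if var > 0 else 0.0
    return o_minus_e, var, z

ages = [10, 12, 12, 15, 20, 22]
tumor_death = [1, 0, 1, 1, 0, 1]
exposed = [1, 1, 0, 1, 0, 0]
cumulative = nelson_aalen(ages, tumor_death)
o_minus_e, var, z = logrank(ages, tumor_death, exposed)
```

Animals that die without the tumor contribute only to the risk sets, which is how the competing-risks framework treats them as censored with respect to tumor onset.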
When the tumor type is not instantly lethal, the noncentrality parameter of the logrank and related tests is distorted in a direction that depends on the functions $\beta_E(t)$ and $\beta_C(t)$ (3). In particular, when the risk of death from nontumor causes is greater in the exposed group, the resulting significance levels tend to be too small. That is, when applied to tumors which are not instantly lethal, the logrank test is invalid and tends to reject the null hypothesis too frequently.

Nonlethal Tumor Types
Formally, a nonlethal tumor type is one for which $\delta(t,x) = \beta(t)$ for all $t$ and $x$; that is, a tumor type for which tumor onset does not alter the risk of death. The identifiable functions $h_1(t)$ and $h_0(t)$ simplify to $h_1(t) = \beta(t)\,\pi(t)$ and $h_0(t) = \beta(t)\,\{1 - \pi(t)\}$, and the prevalence function becomes $\pi(t) = 1 - \exp\{-\int_0^t \alpha(u)\,du\}$. Thus, both $\alpha(t)$ and $\beta(t)$ (and $\pi(t)$) are identifiable.
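The simplification for nonlethal tumors can be checked directly. The short derivation below is a sketch that writes $\alpha(t)$ for the tumor-onset intensity and $\beta(t)$ for the death intensity, which for a nonlethal tumor is the same with or without the tumor. The survivor functions $S(t)$ and $S_0(t)$ are notation introduced here for illustration and are not taken from the paper.

```latex
\begin{align*}
S(t)   &= \Pr\{U \ge t\}
        = \exp\Big\{-\int_0^t \beta(u)\,du\Big\}
        && \text{(death risk is $\beta$ in either state)} \\
S_0(t) &= \Pr\{U \ge t,\ a(t) = 0\}
        = \exp\Big\{-\int_0^t [\alpha(u) + \beta(u)]\,du\Big\}
        && \text{(alive and still tumor-free)} \\
\pi(t) &= 1 - \frac{S_0(t)}{S(t)}
        = 1 - \exp\Big\{-\int_0^t \alpha(u)\,du\Big\} \\
h_1(t) &= \beta(t)\,\pi(t), \qquad
h_0(t)  = \beta(t)\,\{1 - \pi(t)\}
\end{align*}
```

The last line follows because an animal alive at age $t$ dies at rate $\beta(t)$ and carries the tumor with probability $\pi(t)$. Since $h_0(t) + h_1(t) = \beta(t)$ and $h_1(t)/\{h_0(t) + h_1(t)\} = \pi(t)$, both $\beta(t)$ and $\pi(t)$, and hence $\alpha(t)$, are recoverable from the observable hazards.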
Nonlethal tumor types can also be represented by a special competing-risks framework, with
$$U = Y, \qquad a = \begin{cases} 1 & \text{if } T_1 \le Y \\ 0 & \text{if } T_1 > Y, \end{cases}$$
where the independent random variables $T_1$ and $Y$ represent the potential time to tumor onset and the time to death, respectively. Note that $T_1$ is either right censored (when $a = 0$) or left censored (when $a = 1$); it is never directly observed. Also, $\pi(t)$ can be thought of as the c.d.f. of $T_1$, with corresponding hazard function $\alpha(t)$. This representation shows that the problem of estimating $\pi(t)$ [or, equivalently, $\alpha(t)$] corresponds to the so-called binomial extremum problem (4); i.e., let $t_1 < t_2 < \ldots$ denote the distinct ages of death, let $N_j$ be the number of deaths at age $t_j$, and let $n_j$ be the number of these deaths at which the tumor is present. Then $n_1, n_2, \ldots$ are independently binomial with probabilities $\pi(t_j)$ that are ordered. It can be shown (4,5) that the constrained maximum likelihood estimator of the set of $\pi(t_j)$ is also the solution to the constrained weighted least-squares criterion $\sum_j N_j \{\pi(t_j) - n_j/N_j\}^2$ and is given by the isotonic regression of the naive estimators $n_j/N_j$ with weights $N_j$.
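The isotonic regression of the naive proportions can be computed with the pool-adjacent-violators algorithm. The sketch below is one minimal way to do this, assuming the deaths have already been tabulated by increasing age into parallel lists `n` (deaths with tumor) and `N` (all deaths, each positive); the function name `isotonic_prevalence` is hypothetical.

```python
def isotonic_prevalence(n, N):
    """Weighted pool-adjacent-violators fit of n[j]/N[j] with weights N[j].

    Returns prevalence estimates that are nondecreasing in j.
    """
    blocks = []  # each block: [tumor deaths, total deaths, number of ages pooled]
    for nj, Nj in zip(n, N):
        blocks.append([nj, Nj, 1])
        # Pool while the previous block's proportion exceeds the newest block's.
        # Cross-multiplication avoids division: p/q > r/s  <=>  p*s > r*q.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            last = blocks.pop()
            blocks[-1][0] += last[0]
            blocks[-1][1] += last[1]
            blocks[-1][2] += last[2]
    fit = []
    for tumors, deaths, k in blocks:
        fit.extend([tumors / deaths] * k)   # pooled proportion repeated per age
    return fit

fitted = isotonic_prevalence([1, 0, 2, 1, 3], [2, 3, 4, 2, 3])
```

Pooling two adjacent ages replaces their proportions by the combined proportion of tumor-bearing deaths, which is exactly the weighted average of the naive estimators with weights $N_j$.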
It can be verified that $\pi(t) = \Pr\{a = 1 \mid U = t\}$, which indicates that animals dying at age $t$ are representative (6), with respect to tumor presence, of all animals surviving until at least age $t$. It follows that a test for differences between $\pi_E(t)$ and $\pi_C(t)$ can be obtained by comparing the control and exposed groups with respect to the proportions of tumors found in animals dying at age $t$. The first such "prevalence" test was proposed by Hoel and Walburg (7) and has the same form as the logrank test, except that the numbers of animals "at risk" at age $t$ are defined differently (see Fig. 2). It can be shown (8) that the Hoel-Walburg test arises as the likelihood score test of $\theta = 0$ from the logistic model $\mathrm{logit}\{\pi_E(t)\} = \theta + \mathrm{logit}\{\pi_C(t)\}$; that is, where the two groups have proportional prevalence-odds functions. Thus, the Hoel-Walburg test is most efficient when the two prevalence functions have proportional odds, but remains valid when they are not proportional. Analogous to logrank tests, prevalence tests are distorted when applied to tumor types that are not nonlethal (6). Other types of prevalence tests have been recently investigated by McKnight and Crowley (9) and Finkelstein (10).
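A prevalence comparison of this kind can be sketched as a stratified test in which animals dying in the same age interval form a stratum. The code below is an illustration only: the "at risk" definition of Figure 2 is not reproduced in the text, so the sketch assumes analyst-chosen age intervals (the hypothetical `cutpoints` argument) and the usual hypergeometric variance within each interval.

```python
import math

def prevalence_test(ages, tumor, exposed, cutpoints):
    """Compare tumor proportions among animals dying in the same age interval.

    ages: age at death; tumor: 1 if tumor present at death; exposed: 1 if exposed.
    cutpoints: increasing interval boundaries (an assumption of this sketch).
    Returns (observed - expected, variance, z) for the exposed group.
    """
    def stratum(age):
        return sum(1 for c in cutpoints if age >= c)   # index of the age interval

    o_minus_e, var = 0.0, 0.0
    for s in range(len(cutpoints) + 1):
        idx = [i for i, a in enumerate(ages) if stratum(a) == s]
        D = len(idx)                                   # deaths in this interval
        if D < 2:
            continue                                   # contributes nothing
        D_e = sum(1 for i in idx if exposed[i])        # exposed deaths
        n = sum(1 for i in idx if tumor[i])            # deaths with tumor
        n_e = sum(1 for i in idx if tumor[i] and exposed[i])
        o_minus_e += n_e - n * D_e / D
        var += n * (D - n) * D_e * (D - D_e) / (D * D * (D - 1))
    z = o_minus_e / math.sqrt(var) if var > 0 else 0.0
    return o_minus_e, var, z

ages = [5, 6, 7, 8, 15, 16, 17, 18]
tumor = [0, 1, 0, 0, 1, 1, 0, 1]
exposed = [1, 1, 0, 0, 1, 1, 0, 0]
result = prevalence_test(ages, tumor, exposed, cutpoints=[10])
```

The structural difference from the logrank sketch is the denominator: here the comparison group at each age consists of the animals that die in the interval, not the animals still alive.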

Use of Serial Sacrifices
For the special cases of the preceding two sections, problems of nonidentifiability of model parameters were avoided by assuming that the tumor type either is instantly lethal or that it is nonlethal. However, most types of tumors are somewhere between these two extremes. One approach to achieving identifiability for these intermediate tumor types utilizes serial sacrificing; that is, the random selection, killing, and examination of live animals at various ages. From Eq. (1), the proportion of animals sacrificed at age $t$ that are found to have a tumor estimates $\pi(t)$, the tumor prevalence at age $t$. Thus, with serial sacrificing the identifiable aspects of the process depicted in Figure 1 are those expressible in terms of $h_1(t)$, $h_0(t)$, and $\pi(t)$. It follows that $\beta(t)$ is identifiable since we can write $\beta(t) = h_0(t)/\{1 - \pi(t)\}$. It can also be shown that the function $\alpha(t)$ is identifiable, though a simple functional form such as that for $\beta(t)$ is not obtainable. Although serial sacrificing can in theory overcome nonidentifiability problems, the methods (9,11-14) which use this information require very large amounts of data. For this reason, experiments with extensive sacrificing, other than at the termination of an experiment, are quite rare.

Use of Cause of Death Information
An alternative to using sacrifice data for estimating and testing time to tumor onset becomes available when each animal found to have a tumor at death is further classified as having died from that tumor or from nontumor causes (see Figure 3). The observable "data" are $(U,a,b)$, where the binary indicator variable $b$ equals one for deaths due to the tumor and zero otherwise. This amounts to decomposing the intensity function $\delta(t,x)$ in Figure 1 into the sum $\gamma(t,x) + \lambda(t,x)$, where
$$\gamma(t,x) = \lim_{\Delta t \to 0} (\Delta t)^{-1} \Pr\{t \le U < t + \Delta t,\ b = 1 \mid U \ge t,\ a = 1,\ U_1 = x\}$$
and
$$\lambda(t,x) = \lim_{\Delta t \to 0} (\Delta t)^{-1} \Pr\{t \le U < t + \Delta t,\ b = 0 \mid U \ge t,\ a = 1,\ U_1 = x\}.$$
If a tumor develops at time $x$, $\gamma(t,x)$ and $\lambda(t,x)$ represent the risks of dying from the tumor and from other causes, respectively, at time $t$.
Suppose we assume that $\lambda(t,x) = \beta(t)$ for all $t$ and $x$; that is, that the risk of death from nontumor causes is unaltered by tumor onset. Given this assumption and the data $(U,a,b)$, the identifiable components of the model are the functions $\alpha(t)$, $\beta(t)$, and $\pi(t)$. The identifiability of $\pi(t)$ follows because it can be verified that under the above assumption $\pi(t) = \Pr\{a = 1 \mid U = t,\ b = 0\}$. Note also from Eq. (2) that the prevalence depends only on $\alpha(t)$ and $\gamma(t,x)$. Thus, the assumption that $\lambda(t,x) = \beta(t)$ essentially ensures that nontumor deaths act like sacrifices from the simpler model where the animal develops a tumor and at some later age dies. The only difference is that now the ages of death are not determined by the experimenter. Several methods for estimating these functions have been investigated (15-18). A method of comparing the control and exposed groups for this model was proposed by Peto (3,9,15,19). The procedure consists essentially of applying both the logrank and Hoel-Walburg tests to subsets of the data. Specifically, one first ignores all animals where death was due to the tumor, and applies a Hoel-Walburg test to the remainder. Next, one uses all the observations, treating deaths from the tumor as "uncensored" events, and all other deaths as "censored," and applies a logrank test. It can be shown that the former tests the equality of the control and exposed groups with respect to the tumor prevalence function $\pi(t)$, and the latter compares them with respect to the function
$$h^*(t) = \frac{\int_0^t g(t,x)\,\gamma(t,x)\,dx}{1 + \int_0^t g(t,x)\,dx}.$$
An overall test for differences in $\alpha(t)$ or $\gamma(t,x)$ between the control and exposed groups is obtained by combining the p-values from the two procedures. Unlike the earlier tests that were discussed, Peto's approach requires no assumptions about tumor lethality, but instead requires accurate assignment of cause of death and assumes that $\beta(t) = \lambda(t,x)$.
We carried out an empirical study (6) of the recently completed ED01 experiment and found these assumptions to be violated in various degrees for several tumor types. Some analytic results suggest, however, that the effects of such violations on the testing problem will be less severe than the distortions caused to estimates of prevalence. We feel that a better understanding of the way in which pathologists define cause of death can lead to improvements in the analysis of carcinogenicity experiments.

Defining Cause of Death to Avoid Distortion
Death is an end result of many complex events. While it may sometimes be evident that the tumor type in question contributed to death, the definition of cause of death is necessarily somewhat arbitrary. Recall that any rule for assigning cause of death defines a particular model of the form in Figure 3, where the hazard $\delta$ is divided into two components: $\delta(t,x) = \lambda(t,x) + \gamma(t,x)$. As discussed in the previous section, the two-step Peto method is valid when $\beta(t) = \lambda(t,x)$ and cause of death is accurately classified. Consideration of the observed violations of these assumptions in the ED01 data raises the possibility that statisticians work with pathologists to redefine the notion of "cause-of-death" (perhaps a different phrase ought to be used) so that the condition $\beta(t) = \lambda(t,x)$ holds, regardless of whether the resulting definition has complete biologic meaning in terms of actual causation. If this could be done, we would be assured that exposed and control groups could be accurately compared with respect to $\pi(t)$ and $h^*(t)$. It remains to be seen whether an effective modification of the conventional notion of cause of death can be found.
Another approach, which we have begun to explore, is to move away from the notion of cause of death altogether, and to incorporate information on the anatomic stage of the tumor. This approach has the potential to simplify the task of pathologists and to solve the nonidentifiability problem without the need for restrictive assumptions.