Uncertainties in interspecies extrapolations of carcinogenicity.

The usual classification of results of animal carcinogenicity tests is positive or negative. Attempting to observe correlations between species using such results is complicated by differing test sensitivities. In these circumstances it is helpful to use models to represent the experimental data in a consistent way. Fitting the model parameters to the data allows computation of confidence limits and an assessment of concordance or discordance between different species in a way which accounts for differing test sensitivities. This paper describes this approach in detail for one class of models applied to the carcinogenicity test results for 187 chemicals of the NCI bioassay series, allowing comparison between B6C3F1 mice, Fischer 344 rats and Osborne Mendel rats. It is shown that the uncertainties in extrapolating between species are larger than generally acknowledged (a standard deviation of a factor of approximately 4.5), but that within these uncertainties there are few if any discordances.


Introduction
When a test for carcinogenicity is performed on groups of experimental animals, the usual way of reporting the result is qualitative: as a positive when a dosed group of animals has a sufficiently large excess tumor rate, or as a negative if any excess tumor rates are not sufficiently large. The criterion for sufficiency is usually statistical, while the negativity of any result is usually qualified by the experimental conditions, among which are the dose of material under test and the numbers of animals tested. Attempts have been made to observe correlations between species by using such qualitative results (1), the first step in verifying efforts at extrapolation between species, but they are complicated by the differing experimental conditions and hence differing sensitivities. Furthermore, the measures of correlation which may be obtained from qualitative results are limited, and the application of these measures to interspecies extrapolation is unhelpful, since such extrapolations are usually required to be quantitative.
Under these circumstances, the use of models is helpful in condensing the considerable experimental data into a few convenient, quantitative parameters which may then be used as surrogates for all the experimental data. The definitions of the model and its parameters may be chosen to incorporate such biases as the existence of a background rate for tumors, the expectation that tumor incidence rates rise as dose of carcinogen is increased, and the expectation that tumor incidence is stochastic. If the model is then applied to a large set of experiments on different materials in various species, the values of the model parameters obtained in each case represent uniform measures of the experimental results in each case, and are thus ideal candidates for observing correlations and for use in extrapolation. Moreover, it is usually possible to obtain some sort of measure of uncertainty and significance for such parameters, allowing a better assessment of concordance or discordance between experimental results.
*Energy and Environmental Policy Center, Jefferson Physics Laboratory, Harvard University, Cambridge, MA 02138.
The purpose of this paper is to give some details of this approach by using a particular model and to show the results obtained when this model is applied to the data generated by the National Cancer Institute (NCI) Carcinogenesis Bioassay program.

Method
The NCI Carcinogenesis Bioassays constitute a large number of experiments performed with similar protocols, and thus are ideal for this approach. The model chosen (2) to fit the data generated is defined as follows.
The measure of dose chosen is defined to be the lifetime integrated mass of material ingested measured in bodyweights. (See Appendix A for the methods used for computing doses.) For convenience this dose is expressed as an average dose rate by dividing by nominal lifetimes.
Tumor occurrence is a stochastic process occurring with a probability p related to the dose d defined above by

$$p = 1 - (1 - \alpha)\exp\{-\beta d/(1 - \alpha)\}, \qquad \alpha \ge 0,\ \beta \ge 0 \tag{1}$$

where $\alpha$ and $\beta$ are the parameters of this model, $\alpha$ representing the background tumor incidence while $\beta$ is a measure of the carcinogenic strength of the material under test. $\beta$ is referred to as the potency of the material.
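The dose-response relation (1) can be sketched in a few lines of Python, writing the background parameter as `alpha` and the potency as `beta`. The function name and the argument checks are illustrative assumptions, not part of the original analysis.

```python
import math

def tumor_probability(dose, alpha, beta):
    """Tumor probability p = 1 - (1 - alpha) * exp(-beta * dose / (1 - alpha)).

    alpha: background tumor incidence, assumed to lie in [0, 1).
    beta:  potency, assumed non-negative.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if beta < 0.0 or dose < 0.0:
        raise ValueError("beta and dose must be non-negative")
    return 1.0 - (1.0 - alpha) * math.exp(-beta * dose / (1.0 - alpha))
```

With this parameterization the probability equals the background incidence at zero dose, and the initial slope of p with respect to dose equals the potency, which is why the factor (1 - alpha) appears in the exponent.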
In any group of N animals fed the test material at a (group average) dose d, tumor incidence r is a random variable with a binomial probability distribution given by

$$P(r) = \binom{N}{r}\, p^{r} (1 - p)^{N - r} \tag{2}$$

where P(r) is the probability of observing r animals with tumors, and p is given by the expression (1) above.
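As a minimal sketch, the binomial probability (2) can be evaluated directly with the standard library. The function name is an illustrative assumption.

```python
import math

def binomial_probability(r, n, p):
    """Probability of observing r tumor-bearing animals in a group of n,
    when each animal independently develops a tumor with probability p."""
    return math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)
```

Summing `binomial_probability(r, n, p)` over r = 0, ..., n gives 1, which is a quick check that the expression is a proper probability distribution.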
By using this definition of the model, an estimate of the potency and approximate confidence limits associated with the uncertainties due to numbers of animals tested may be found using maximum likelihood methods (Appendix B). In addition, the significance of this estimated potency can be calculated exactly under the assumptions of the model (Appendix C). This analysis has been repeated for almost every set of results reported in the summary tables set out in the NCI Carcinogenesis Bioassay Reports. Data from pooled or historic control groups were used in preference to data from the matched control groups, and time adjusted data were used preferentially whenever presented. Testicular tumors in male Fischer 344 rats were entirely omitted from the analysis.
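For the simplest possible design, one control group and one dosed group, the maximum likelihood fit of the two-parameter model has a closed form, because the model can reproduce both observed tumor proportions exactly whenever the dosed group shows an excess. The sketch below illustrates only that special case; it is not the procedure of Appendix B, it does not compute confidence limits, and the function and argument names are hypothetical.

```python
import math

def fit_two_group(r0, n0, r1, n1, dose):
    """Maximum likelihood (alpha, beta) for one control group (r0 tumors in
    n0 animals) and one group of n1 animals at the given dose (r1 tumors)."""
    p0 = r0 / n0
    p1 = r1 / n1
    if p1 <= p0:
        # No excess in the dosed group: the estimate sits on the boundary
        # beta = 0, and alpha is the pooled tumor incidence.
        return (r0 + r1) / (n0 + n1), 0.0
    if p1 >= 1.0:
        # Every dosed animal had a tumor: the likelihood increases without
        # bound in beta.
        return p0, math.inf
    # Solve 1 - p1 = (1 - alpha) * exp(-beta * dose / (1 - alpha)) with alpha = p0.
    beta = -(1.0 - p0) / dose * math.log((1.0 - p1) / (1.0 - p0))
    return p0, beta
```

Designs with more than two groups have no such closed form and need a numerical maximization of the summed binomial log-likelihood.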
In addition, similar calculations have been performed with a modified measure of dose, chosen so as to weight the doses given at each age with a weight factor corresponding to that expected in a linearised Doll-Armitage model (3) in which the carcinogen affects just one stage (Appendix D). The essential results obtained do not differ (for the parameters so far tested) from those found with the unweighted measure of dose described above, and are not presented separately.
In order to compare between species it is necessary to select amongst all the potency estimates obtained. The available selection criteria, together with the choices used in this paper, are: (A) chemical: whole database of 187 chemicals; (B1) species: B6C3F1 mouse, Osborne Mendel rat, Fischer 344 rat; (B2) sex: male, female, either; (B3) tumor type/site: any; (C) significance of potency estimate: various (see Table 1), no correction made for multiple comparisons; (D) size of potency estimate: largest.
(B1), (B2) and (B3) were logically ANDed together. These criteria were applied in the order given to obtain a single value for each chemical tested. Confidence bounds were taken to be the lower confidence limit associated with the potency estimate selected and the highest upper confidence limit on all potency estimates satisfying (A) to (B3). Thus the upper confidence bound represents a measure of the sensitivity of the experiment. In most cases this upper confidence bound came from the same data that generated the selected potency estimate. By generating estimates of potency in this way with differing choices (of species, sex, tumor type/site) it is possible to compare them between these choices.
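The selection rule just described can be sketched for a single chemical as follows. The record layout, the field names and the default significance threshold are assumptions made for illustration; the paper does not specify a data format.

```python
from typing import NamedTuple

class Estimate(NamedTuple):
    species: str
    sex: str
    site: str
    potency: float
    lower: float      # lower confidence limit on the potency
    upper: float      # upper confidence limit on the potency
    p_value: float    # significance of the potency estimate

def select_potency(estimates, species, threshold=0.025, sex=None):
    """Return (potency, lower bound, upper bound) for one chemical.

    Criteria on species, sex and site are ANDed (any site is accepted here);
    the largest significant potency is selected. The upper bound is the
    highest upper limit over all estimates passing those criteria, whether
    or not they are significant.
    """
    pool = [e for e in estimates
            if e.species == species and (sex is None or e.sex == sex)]
    if not pool:
        return None
    upper_bound = max(e.upper for e in pool)
    significant = [e for e in pool if e.p_value <= threshold]
    if not significant:
        # Only the sensitivity of the experiment is available.
        return None, None, upper_bound
    best = max(significant, key=lambda e: e.potency)
    return best.potency, best.lower, upper_bound
```

Returning the upper bound even when nothing is significant keeps the information needed to check a chemical with a significant result in only one species.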
The purpose of this paper is to compare between species, so repeating the selections with differing species and all other selection criteria similar gives the required data. Each chemical may then be represented on a graph by a point with coordinates corresponding to the potency estimates obtained in the different species. Of course, for many chemicals there were no results significant in one or both species. Those with no significant result in either species can be discarded as providing no useful information, but those with just one significant result are potential exceptions to any relationship found between potency estimates in the two species compared. In these cases one can plot the significant result in one species versus the upper confidence bound found for all results in the other species, and check whether this disagrees with any apparent interspecies relations. The confidence limits shown in this paper are approximate 95% limits (90% confidence intervals).
There are sufficient data to present comparisons between B6C3F1 mice and Fischer 344 (F344) rats, and between B6C3F1 mice and Osborne Mendel (OM) rats. The comparisons extend over nonoverlapping sets of chemicals, since each chemical was tested in only one strain of rat.

Results
Since the usual method of extrapolating between species (as used in regulatory efforts aimed at estimating effects on humans) is to take the largest significant effect seen in one species (independent of sex), we present results first with no sex discrimination. Figures 1, 2 and 3 are plots of the logarithm (to base 10) of potency in B6C3F1 mice versus the same in F344 rats. The criteria used (B and C above) are shown as labels on the graphs. A total of 78 chemicals showed a significant (p ≤ 0.025, uncorrected for multiple comparisons) result in one or both of these species. The 37 significant in both are plotted in Figure 1, together with a line corresponding to proportionality between species responses. Some statistics of these and later plots are shown in Table 1. Figure 1 is included to illustrate that confidence intervals calculated solely on the basis of sample sizes and binomial distributions do not explain the variance between results in different species, most of which must thus arise elsewhere. The distribution of points is clearer in Figure 2, where the confidence intervals have been removed. Clearly, any attempt to extrapolate between these two species must take into account the uncertainty indicated by this graph. The standard deviation about the regression line (Table 1) is 0.66, corresponding to a factor of 4.5. We have elsewhere suggested (4) how this may be taken into account.
Figure 3 shows those cases which do not naively agree with the regression line. As can be seen, most appear consistent with a distribution leading to Figures 1 and 2, with only one (piperonyl sulfoxide, causing hepatocellular carcinoma in male mice) deviating substantially. Figures 4 and 5 show similar results for B6C3F1 mice versus OM rats. Once again, one can note that confidence intervals based on numbers of animals tested fail to cover the range of deviations between species, but that within these larger deviations there appear to be no obvious exceptional cases, or rather that the experiments as designed have not shown up any such cases. This process can be repeated for various selections. The general features of the resulting plots are similar to those displayed, and a few of the statistics generated are shown in Table 1. The numerical values shown in Table 1 correspond to the logarithmic plots displayed, and so refer to the logarithm (to base 10) of the ratios of potencies estimated in the species/strains in the left hand column. The values obtained are fairly stable against changes in the significance selection criterion, as may be seen by comparing the values given for p ≤ 0.01, p < 0.025 and p ≤ 0.05. All values agree within their estimated standard errors.
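To make the link between a standard deviation in log10 units and a multiplicative factor concrete, the sketch below summarizes the log10 ratios of paired potency estimates. It is a simplified illustration: it computes the sample standard deviation of the log ratios, which is not necessarily identical to the standard deviation about the regression line reported in Table 1, and the function name is hypothetical.

```python
import math
import statistics

def log_ratio_summary(potency_a, potency_b):
    """Mean and sample standard deviation of log10(a / b) over paired
    potency estimates, plus the standard deviation expressed as a factor."""
    logs = [math.log10(a / b) for a, b in zip(potency_a, potency_b)]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return mean, sd, 10.0 ** sd
```

A standard deviation of 0.66 in log10 units corresponds to a factor of 10 ** 0.66, roughly 4.6, in line with the factor of about 4.5 quoted in the text.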

Discussion
The results illustrated here indicate that, when this model is used, extrapolation of carcinogenic potency between rats and mice is uncertain, the standard deviation of the logarithm (to base 10) of the ratio of potencies measured in mice and rats amounting to about 0.65 (a factor of 4.5). Strictly speaking, the uncertainties found in Table 1 apply only to the dose-response model as defined earlier in this paper, and then only for high doses. However, it seems unlikely that any other dose-response model would do much better in this high dose regime, except possibly if it is highly nonlinear in the range of doses tested. This possibility is, of course, open to test in the way outlined in this paper.
As mentioned earlier, changing the measure of dose by weighting dose rates with their time of administration has little effect on this measure of uncertainty. This result is not surprising, since many of the chemicals contributing to the large variance were fed to the animals at constant dose rates in these experiments, so that this form of weighting has little effect on the ratios of doses fed to the different animals, no matter how those doses are measured. Similarly, using a dose measure based on surface area (e.g., mg/m2-day) will not affect this measure of uncertainty, since such a dose measure is related to the one used (mg/kg-day) by a surface area to weight ratio which has much lower variance within a species. The only effect of changing to such a dose measure is to change the mean value of the interspecies potency ratio, represented by the position of the intercept of the dashed lines on the graphs shown.
The data of Table 1 are equivocal in their support for extrapolations between species solely on the basis of dose measured per unit surface area (practically, two-thirds power of body weight). Suppose that $\beta_1$ and $\beta_2$ are potencies measured in species 1 and 2 on the basis of doses $d_1$ and $d_2$ measured in mass per unit body weight, and that similar, primed, variables representing similar quantities are defined for doses measured in mass per unit surface area. Then $\beta'/\beta = d/d' \propto W^{-1/3}$, where $W$ is body weight, and so $\beta_1'/\beta_2' = (\beta_1/\beta_2)(W_1/W_2)^{-1/3}$. Using typical weights for the animals in these experiments gives the results shown in Table 2. As can be seen, extrapolation on a bodyweight basis appears best for the OM to B6C3F1 comparison, on a surface area basis for the F344 to B6C3F1 comparison, and neither is particularly good if all the data are combined for a single rat to mouse comparison. Although it has been shown (5) that an acute toxicity measure may be extrapolated between species by using a measure of dose based on surface area (with an uncertainty substantially less than those observed in this paper), the rat appeared to be exceptional in giving biassed estimates using this procedure. We are left wondering if the rat is exceptional for carcinogenicity extrapolation also.
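The conversion of a potency ratio from a body-weight dose basis to a surface-area dose basis amounts to multiplying by the inverse cube root of the body-weight ratio. A minimal sketch follows; the function name is hypothetical, and the weights used in any call are the reader's own inputs, since the typical weights behind Table 2 are not reproduced here.

```python
def surface_area_potency_ratio(ratio_body_weight, weight_1, weight_2):
    """Convert a potency ratio (species 1 over species 2) from a
    mass-per-body-weight dose basis to a mass-per-surface-area basis,
    taking surface area proportional to the two-thirds power of weight."""
    return ratio_body_weight * (weight_1 / weight_2) ** (-1.0 / 3.0)
```

For example, if species 1 is eight times heavier than species 2, a potency ratio of 1 on a body-weight basis becomes 0.5 on a surface-area basis. Only the mean of the log ratio shifts under this conversion; its spread across chemicals is unchanged when the same weights are used throughout.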

Appendix A: Dose Calculations
Dose Administered on Weight Basis
For those experiments in which the material was administered by gavage, doses were specified and administered on a weight basis (mg/kg), usually by adjusting dose sizes according to average weights of groups of animals. Total lifetime dose was obtained by summing over all individual doses, and normalised to a dose rate (mg/kg-day) by dividing by nominal lifetimes, assumed to be 91 weeks for mice and 104 weeks for rats.

Dose Administered as a Fraction of Diet
Here doses were specified as a fraction (ppm or %) of the diet. To compute a lifetime integrated dose in body weights required the use of feeding curves for the experimental animals. Data on feeding of OM rats and B6C3F1 mice were provided in Carcinogenesis Bioassay Report No. TR2. From these data it was found that the cumulative intake of food measured in body weights (obtained by dividing weekly or four weekly food intake by the mean of the weights of the animals at the beginning and end of each week or four weeks, and summing from the beginning of the experiment) is well fitted by a curve of the form:

$$F(t) = b\,(1 - \exp\{-at\}) + dt + ct^{2}$$

where F(t) is the cumulative intake in body weights at age t and a, b, c, d are parameters estimated from the data.
The parameters were estimated from data on groups of animals on control diets. To allow for weight differences between different groups of animals, and between OM and F344 rats, it was assumed that food intake is proportional to the two-thirds power of body weight, so that food intake measured in body weights is proportional to the inverse one-third power of body weight. Thus the cumulative material intake due to a concentration C of material in the diet a fraction K of the time from ages $t_1$ to $t_2$ could be expressed as:

$$K\,C\,M^{-1/3}\,[F(t_2) - F(t_1)]$$

for an animal of full grown weight M. Scaling the parameters a, b, c, d allowed this expression to directly give the average dose rate (mg/kg-day), using nominal lifetimes of 91 weeks for mice, 104 weeks for rats. Full grown weight was estimated by eye from the growth curves of each experiment, and corresponded to weight at 60-70 weeks of age. The fraction K was used when dosing was only carried out for some fraction of the week. Adding up expressions like this over the concentration-age schedule for the experiment gave the total average dose rate.
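The dietary dose calculation can be sketched as a sum over the concentration-age schedule. The sketch assumes that the feeding-curve parameters a, b, c, d have already been scaled as the text describes, so that the sum is directly the average dose rate; the parameter values themselves are not reproduced here, and the function names and the schedule layout are hypothetical.

```python
import math

def cumulative_food_intake(t, a, b, c, d):
    """Feeding curve F(t) = b * (1 - exp(-a * t)) + d * t + c * t**2,
    the cumulative food intake in body weights at age t."""
    return b * (1.0 - math.exp(-a * t)) + d * t + c * t * t

def diet_dose_rate(schedule, full_grown_weight, a, b, c, d):
    """Sum K * C * M**(-1/3) * [F(t2) - F(t1)] over the dosing schedule.

    schedule: iterable of (K, C, t1, t2) tuples, where K is the fraction of
    the time on the dosed diet, C the dietary concentration, and t1, t2 the
    ages bounding the interval.
    """
    scale = full_grown_weight ** (-1.0 / 3.0)
    total = 0.0
    for k, conc, t1, t2 in schedule:
        intake = (cumulative_food_intake(t2, a, b, c, d)
                  - cumulative_food_intake(t1, a, b, c, d))
        total += k * conc * scale * intake
    return total
```

Each schedule entry covers one period at constant dietary concentration, so a change of concentration during the experiment is handled by adding another tuple.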
The values of the parameters used (with age measured in weeks, concentration in ppm, and weight in grams) are given in Table A.

Dose Administered in Water
A simple approach was taken, scaling water intake with full grown body weight and normalizing to standard animals. Fitting the scaling law to standard mouse, rat, dog and man gave water intake as:

$$T = 0.3629\, W^{-0.2125}$$

where T is daily water intake in body weights and W is body weight in grams. This relation was assumed to hold for all ages, so that simply multiplying the time integrated concentration of material in water (in ppm-days) by T and dividing by nominal lifetimes (in days) gave average dose rates in mg/kg-day.
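The drinking-water calculation can be sketched as below, on the reading that the time-integrated concentration is multiplied by the daily water intake T and divided by the nominal lifetime. Function names are hypothetical; the nominal lifetimes of 91 and 104 weeks correspond to 637 and 728 days.

```python
def daily_water_intake(weight_g):
    """Daily water intake in body weights, T = 0.3629 * W**(-0.2125),
    with W the full grown body weight in grams."""
    return 0.3629 * weight_g ** (-0.2125)

def water_dose_rate(ppm_days, weight_g, lifetime_days):
    """Average dose rate (mg/kg-day) from the time-integrated concentration
    of material in drinking water (ppm-days)."""
    return daily_water_intake(weight_g) * ppm_days / lifetime_days
```

As a plausibility check, the scaling law gives an intake of a little under 0.2 body weights per day for a 30 g animal and about 0.03 body weights per day for a 70 kg animal.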

Appendix C: Significance of Potency Estimate
The significance of the potency estimate is the probability of observing a potency estimate as high as or higher than the one actually observed, under the null hypothesis that the true value of potency is zero, or equivalently that all groups of experimental animals are samples from the same population. For any experimental design represented by:
Hence the probability of observing an estimated potency as large as actually seen is just the probability of getting the tumorous animals distributed into the k groups in such a way that the potency estimate is as large as or larger than the observed estimate, that is:

This form of weighting factor was used to determine a dose variable, where the dose rate was measured in bodyweights per day and computed as in Appendix A. The integration was performed by summing values computed daily.

This work was performed under DOE contract number DE-