Use of screening tests to assess cancer risk and to estimate the risk of adult T-cell leukemia/lymphoma.

We developed methods to assess cancer risks using screening tests. These methods estimate the size of the high risk group and the incidence rates of cancer among the high risk group, both adjusted for the characteristics of the screening test. A method was also developed for selecting the cut-off point of a screening test. Finally, the methods were applied to estimate the risk of adult T-cell leukemia/lymphoma.


Introduction
The characteristics of a screening test are usually expressed by sensitivity and specificity. The sensitivity (specificity) of a test is the probability that a person having (not having) the disease is correctly classified.
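As a minimal illustration of these two definitions, the sketch below computes sensitivity and specificity from the four cells of a 2 x 2 classification table. The function name and the counts are hypothetical and chosen only for illustration; they are not data from this study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the counts of a 2 x 2 table.

    tp: diseased and test positive      fn: diseased and test negative
    tn: not diseased and test negative  fp: not diseased and test positive
    """
    sensitivity = tp / (tp + fn)  # P(test positive | disease)
    specificity = tn / (tn + fp)  # P(test negative | no disease)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=950, fp=50)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

The false negative and false positive rates are the complements, 1 - sensitivity and 1 - specificity.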
The prevalence rate of cancer is frequently estimated through the use of a screening test. The effects of the characteristics of the test on estimated disease rates have been investigated, and methods have been proposed for adjusting the disease rates (1,2) when the characteristics of the test are assumed to be known. Methods of estimating the characteristics of the test have also been developed by several authors. Tenenbein (3) considered the use of a double sampling scheme. Hochberg (4) extended this to provide a basis for inference from general multidimensional contingency tables. Goldberg and Wittes (5) applied the capture-recapture model. Hui and Walter (6) supposed that data are available from two populations with different prevalences and applied the method of maximum likelihood to estimate simultaneously the characteristics of the test and the prevalence rates in both populations. Yanagawa et al. (7) and Yanagawa and Kasagi (8) introduced a study design in which a person is tested repeatedly at least three times, and they developed a method of estimating the prevalence and incidence rates of disease together with the characteristics of a test. Recently the variance of an estimator of the test characteristics was studied (9).
In this paper we consider cancers such as those caused by viruses, genetic factors, and the like. The group that is infected by the virus or that inherits the genetic factors is called the high risk group. We consider the situation where a screening test is employed to judge whether or not a person belongs to the high risk group. We also assume that a person is diagnosed positive (negative) if and only if the response of the test exceeds (does not exceed) a predetermined value. This value is called the cut-off point.
We first proposed a method to determine the cut-off point of the screening test. The characteristics of the test are estimated at each candidate point, and the point that minimizes the estimated bias in estimating the prevalence rate is selected. Next we developed a method of estimating the size of the high risk group in a population that adjusts for the characteristics of the screening test. The method is an extension of that proposed in Yanagawa and Tokudome (10). We then developed a method of estimating the incidence rates of cancer among the high risk group, again adjusting for the characteristics of the screening test. There are extensive studies on this topic (11-16), but they do not discuss the characteristics of screening tests. These studies assume two binomial variates; an excellent review of these methods is given by Gart and Nam (16). On the other hand, estimation of the risk ratio assuming two Poisson variates was developed by Gart (17). We applied the Poisson approximation in developing our estimates. Finally, the methods were applied to estimate the risk of adult T-cell leukemia/lymphoma (ATLL) in Saga prefecture in Japan.

Designing a New Screening Test and Estimating Its Characteristics
We used two tests to design a new screening test and estimate its characteristics: test 1 was an expensive and sophisticated test that is used in the laboratory as a standard test; test 2 was a simple and quick test, newly developed for the purpose of mass screening. With either test, a person is diagnosed positive (negative) if and only if the response of the test exceeds (does not exceed) the predetermined cut-off point.
We first considered the problem of deciding the cut-off point of test 2. Usually the cut-off point is decided by testing each person with both tests, summarizing the data in a table like Table 1, and then selecting the point among $c_1, c_2, \ldots, c_{r+1}$ ($c_1 < c_2 < \cdots < c_{r+1}$) in the table that gives test 2 the highest association with test 1. How could this be justified? Note that, although test 1 is more reliable than test 2, it could also be a fallible test. Ideally, the cut-off point of test 2 should be decided by examining its sensitivity and specificity. Sensitivity and specificity compete with each other, and one cannot maximize both simultaneously. To circumvent this, we proposed selecting the cut-off point among $c_1, c_2, \ldots, c_{r+1}$ that gives the minimum relative bias in estimating the prevalence rate.
Let $1 - \alpha$ and $1 - \beta$ denote the specificity and sensitivity of test 1; $\alpha$ and $\beta$ are called the false positive and false negative rates, respectively. We introduce the corresponding quantities $\alpha_i$ and $\beta_i$ for test 2 with cut-off point $c_i$ by:
$$\alpha_i = \mathrm{pr}\{T_2 \ge c_i \mid D = 0\}, \qquad \beta_i = \mathrm{pr}\{T_2 < c_i \mid D = 1\},$$
where $T_2$ is the response of test 2 and $D = 1$ ($D = 0$) indicates that the person does (does not) belong to the high risk group. Since $c_1 < c_2 < \cdots < c_{r+1}$, $\alpha_i$ and $\beta_i$ are restricted by:
$$\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_{r+1}, \qquad \beta_1 \le \beta_2 \le \cdots \le \beta_{r+1}.$$
The probability that a person is diagnosed positive by test 2 with cut-off point $c_i$ is
$$P_i = \alpha_i (1 - P_0) + (1 - \beta_i) P_0, \qquad (2)$$
where $P_0$ stands for the true prevalence rate. Therefore, the relative bias that is introduced by test 2 with cut-off point $c_i$ is:
$$\frac{P_i - P_0}{P_0} = \frac{\alpha_i (1 - P_0) - \beta_i P_0}{P_0}. \qquad (3)$$
Note that the usual method does not necessarily lead to the cut-off point that minimizes this bias, even if test 1 is a perfect test ($\alpha = \beta = 0$), since the impact of $\alpha$ is stronger than that of $\beta$ in estimating the prevalence rate (2). Furthermore, note that if $P_0$ is small, the $\alpha$ and $\beta$ that minimize Eq. (3) also minimize the mean square error when the observed frequency is employed in estimating $P_0$.
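The cut-off selection rule can be sketched in a few lines of code. The sketch assumes the usual relation between the apparent and the true prevalence, apparent = alpha(1 - P0) + (1 - beta)P0, and takes the relative bias to be (apparent - P0)/P0; selecting the point with the smallest absolute relative bias is also an assumption of this sketch. The function names, the prevalence, and the error rates below are hypothetical and are not the estimates reported in this paper.

```python
def apparent_rate(p0, alpha, beta):
    """Probability of a positive test given true prevalence p0,
    false positive rate alpha and false negative rate beta."""
    return alpha * (1.0 - p0) + (1.0 - beta) * p0


def relative_bias(p0, alpha, beta):
    """Relative bias of the apparent rate as an estimate of p0."""
    return (apparent_rate(p0, alpha, beta) - p0) / p0


def best_cutoff(p0, rates):
    """rates maps each cut-off label to (alpha_i, beta_i).
    Returns the label with the smallest absolute relative bias."""
    return min(rates, key=lambda c: abs(relative_bias(p0, *rates[c])))


# Hypothetical values: alpha_i decreases and beta_i increases with the cut-off.
p0 = 0.02
rates = {"c1": (0.0030, 0.005), "c2": (0.0005, 0.018), "c3": (0.0001, 0.080)}
for label, (a, b) in rates.items():
    print(label, round(relative_bias(p0, a, b), 4))
print("selected:", best_cutoff(p0, rates))
```

With a low prevalence, even a small false positive rate dominates the bias, which is why the lowest cut-off in this toy example is the worst choice despite having the smallest false negative rate.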
We need estimates of $P_0$, $\alpha$, $\beta$, and of $\alpha_i$ and $\beta_i$, $i = 1, 2, \ldots, r + 1$, to assess the bias given in Eq. (3). We now describe the estimation by considering the data expressed in Table 1. The total number of parameters involved is $3 + 2(r + 1) = 2r + 5$, but there are only $2r +$ [...]

Estimating Size of the High Risk Group in a Population by a Screening Test
Two questions frequently arise: How large is the high risk group of the cancer in a population? And what is the incidence rate of the cancer in the high risk group? In this section we consider the first question; the second question is discussed in the next section.
Let $\Pi$ be the target population of size $N$ and let $Q$ be a subpopulation that consists of a random sample of size $n$ from $\Pi$. Suppose that the screening test, with specificity $1 - \alpha$ and sensitivity $1 - \beta$, is given to all subjects in $Q$ to identify the high risk group, with the aim of using the information to estimate the size of the high risk group in $\Pi$. Let $X$ be the number of subjects in $Q$ who are identified as high risk by the screening test. Suppose that $X$ follows a binomial distribution with size $n$ and parameter $P$.
Let $Y$ be the size of the true high risk group in $\Pi - Q$; suppose that $Y$ follows a binomial distribution with size $N - n$ and parameter $P_0$. Here $P_0$ is the true probability of a person belonging to the high risk group in $\Pi$, and $P$ is the probability of a person being diagnosed positive by the screening test. The latter may be called the apparent rate; the term true is used in contrast. Put
$$T = \frac{n\{(X/n) - \alpha\}}{1 - \alpha - \beta} + Y.$$
Note that $T$ represents the size of the true high risk group in $\Pi$. We suppose that $X$ and $Y$ are mutually independent. The constant $A$ that satisfies [...] (4) where [...] is an estimate of the size of the high risk group adjusted for the characteristics of the test and $Z_\alpha$ is the upper $100(\alpha/2)$% point of the standard normal distribution. From this we have approximate $100(1 - \alpha)$% confidence limits of the size of the high risk group in population $\Pi$ adjusted for the characteristics of the screening test as follows: [...]
When $\alpha = \beta = 0$, the confidence limits agree with those considered in Yanagawa and Tokudome (10).
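To make the adjustment concrete, the sketch below computes a plug-in point estimate of the size of the high risk group: the apparent rate X/n is corrected for the false positive and false negative rates in the manner of Rogan and Gladen (1) and then scaled to the population size N. This is an illustration under stated assumptions, not the estimator or the confidence limits derived in this section; the function names and the sample counts are hypothetical.

```python
def adjusted_prevalence(x, n, alpha, beta):
    """Apparent rate x/n corrected for false positive rate alpha
    and false negative rate beta (requires alpha + beta < 1)."""
    return (x / n - alpha) / (1.0 - alpha - beta)


def adjusted_group_size(x, n, N, alpha, beta):
    """Plug-in estimate of the number of high risk persons among N."""
    return N * adjusted_prevalence(x, n, alpha, beta)


# Hypothetical sample: 210 positives among 10,000 tested; population of 100,000.
x, n, N = 210, 10_000, 100_000
print(adjusted_group_size(x, n, N, alpha=0.0, beta=0.0))          # unadjusted
print(adjusted_group_size(x, n, N, alpha=0.00053, beta=0.01784))  # adjusted
```

When both error rates are zero the estimate reduces to N X/n, the unadjusted scaling of the sample count.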

Estimating Incidence Rates of Cancer in the High Risk Group
We consider the second problem, that is, the problem of estimating the risk ratio $\mu = \pi/P_0$, where $\pi$ is the incidence rate of the cancer in $\Pi$ and $P_0$ is the true probability that a person belongs to the high risk group of the cancer.

Confidence Interval for $\mu$
Let $P$ again be the probability of a person being diagnosed positive by the screening test. Let $X$ and $Z$ be independent binomial variates based on sample sizes $n$ and $N$ and parameters $P$ and $\pi$, respectively. We consider the confidence interval of $\mu$ when $nP$ and $N\pi$ are not large, in a situation where the Poisson approximation to the binomial distribution is appropriate. The approximation was employed by Gart (17) when no test characteristics are taken into account.
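The quality of the Poisson approximation in this setting can be checked numerically. The sketch below compares the binomial probabilities with the Poisson probabilities that have the same mean, for a large sample size and a small event probability. The sample size and probability are hypothetical values chosen so that the expected count is 3; they are not taken from the ATLL data.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of exactly k events in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical rare-event setting: expected count n * p = 3.
n, p = 100_000, 0.00003
for k in range(7):
    b, q = binom_pmf(k, n, p), poisson_pmf(k, n * p)
    print(f"k={k}  binomial={b:.6f}  poisson={q:.6f}  diff={abs(b - q):.2e}")
```

The two columns agree closely, which is the situation in which treating the counts as Poisson variates is reasonable.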
Using the relationship in Eq. (2), the log likelihood of $x$ and $z$ is obtained as: [...] (6) where $\hat{\mu}$ is the estimated cancer incidence rate, $Z_\alpha$ is as defined in Eq. (4), and $\hat{P} = x/n$. If the value of $P$ is small, say 0.05, then Eq. (6) shows that $\alpha$ should be extremely small; otherwise the confidence limits inflate and estimation of the risk ratio loses its validity. Note that Rogan and Gladen (1) indicate that a screening test with $\alpha = 0.05$ could be an extremely good test in practice.

Summary Risk Ratio
[...] given by the following procedure:
Step 1: Give an initial estimate of $\mu_0$.
[...] $\sum_i N_i P_{0i}$ [...]
Step 4: Repeat Step 1-Step 3 until convergence.
An approximate $100(1 - \alpha)$% confidence limit of $\mu_0$ is given by: [...]

Application to Estimating Risk of ATLL
Adult T-cell leukemia/lymphoma (ATLL) is a recently proposed disease (18). The disease is caused by a retrovirus termed human T-lymphotropic virus type I (HTLV-I). The disease is endemic in Japan and is contracted from HTLV-I carriers. The virus may be transmitted from mother to child and between husband and wife. Further studies have shown that HTLV-I can be transmitted by blood transfusion. The authors were involved in a project to estimate the risk of ATLL in Saga prefecture in Kyushu. The high risk group for the malignant neoplasm in this study consists of HTLV-I carriers. The cases of ATLL were obtained from the cancer registry and also from related records in the prefecture. The number of carriers is estimated by examination of the sera of blood donors. It is important to study the bias that might have been introduced by the use of donors' blood, but we could not quantify it for various practical reasons. The epidemiologic results of the whole study are published elsewhere (19).

Selecting the Cut-off Point
The anti-adult T-cell leukemia/lymphoma-associated antigen (ATLA) of serum has been examined in the laboratory by an indirect immunofluorescence (IF) test using acetone-fixed MT-1 cells on slides as ATLA (20). When an IF dilution of serum of 1:5 or higher gives a positive immunofluorescence, the serum is judged to be IgG anti-ATLA-positive. A simple and sensitive gelatin particle agglutination (PA) procedure has been developed for mass screening of the antibody to HTLV-I in human sera. The procedure in the PA test is much simpler than that in the IF test or in enzyme-linked immunosorbent assays (ELISA) because it involves only a one-step reaction of antigen and antibody, whereas more steps are required in the other tests. The titer of the antibody in the PA test is expressed as the reciprocal of the greatest serum dilution that gives a positive reaction.
Several authors (21) suggest selecting $2^4$ as the cut-off point of the titer in the PA test, namely judging a final serum dilution of $1{:}2^4$ or higher showing agglutination as positive. We examine this cut-off point based on the data in Table 2, which are given by Kamihira et al. (22). The data were collected under a different sampling scheme, but the results in previous papers (21,23) indicate that we can consider the table as an example of Table 1. Table 2 lists the results of 19,816 sera tested by both tests. The two sets of parameter estimates from Table 2, obtained by assuming that the IF test is perfect and by imposing the restriction described above, respectively, turn out to be identical in these data; the estimates are listed in Table 3 together with the relative bias obtained from Eq. (3). The table shows that when $2^4$ is selected as the cut-off point, the bias is 0.123; when $2^6$ is selected, the bias is 0.007, which is the smallest among the four.

Estimating Number of Carriers
Table 4 shows the data and the estimates of the number of carriers with their confidence intervals. The data were collected through the PA test with the cut-off point $2^6$, and the estimates were obtained by assuming that $\alpha = 0.00053$ and $\beta = 0.01784$ are known constants. To see the impact of the characteristics of the test further, we obtained the estimates of the number of carriers and their confidence intervals for several values of $\alpha$ and $\beta$ in females based on the data in Table 4. These estimates are summarized in Table 5. The table shows a substantial effect of the characteristics; in particular, the impact on the younger age groups is remarkable.

Estimating Incidences of ATLL among Carriers
Table 6 lists the number of cases of ATLL for 3 years and also shows the population in Saga prefecture between the ages of 30 to 60. No cases of ATLL are reported in ages 20 to 40 in males and ages 20 to 30 in females.
Table 6 also lists the estimates of the incidence rates of ATLL among the HTLV-I carriers in males and females, together with the summary rates. The estimates were obtained by the method developed in the text, supposing that $\alpha = 0.00053$ and $\beta = 0.01784$ are known constants. To study the impact of the characteristics of the test, estimates were obtained for several values of $\alpha$ and $\beta$ in females based on the data in Tables 4 and 6.
The estimates are summarized in Table 7. The table shows a substantial impact for large values of $\alpha$. As discussed previously, the impact is greatest on the age group 20 to 30, but no cases of ATLL occurred in that group; consequently, the impact of the characteristics of the screening test on the summary risk ratio is not as strong as its impact on the estimated number of carriers. Note that if the value of $\alpha$ exceeds 0.032, the summary risk ratio in age group 30 to 60 is not obtainable because of the strong impact, in particular, on ages 30 to 40.

Table 7. Estimated incidence rates of ATLL among the carriers in females for selected values of $\alpha$ and $\beta$.
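The sensitivity to the false positive rate can be illustrated with a simple plug-in calculation. The sketch below estimates the incidence among the high risk group as the population incidence divided by the adjusted prevalence, and reports that no estimate is available once the false positive rate reaches the apparent rate. It is a hedged illustration and not the estimation procedure of this paper; the function name and all counts are hypothetical, with an apparent rate of 0.03 chosen for the example.

```python
def incidence_among_high_risk(z, N, x, n, alpha, beta):
    """Plug-in estimate of (z/N) divided by the adjusted prevalence.

    z cases in a population of N; x test positives among n screened.
    Returns None when the adjusted prevalence is not positive, that is,
    when the apparent rate x/n does not exceed alpha.
    """
    p0_hat = (x / n - alpha) / (1.0 - alpha - beta)
    if p0_hat <= 0.0:
        return None
    return (z / N) / p0_hat


# Hypothetical stratum: 5 cases among 50,000 persons; 30 positives among 1,000.
for alpha in (0.0, 0.00053, 0.01, 0.02, 0.032):
    est = incidence_among_high_risk(5, 50_000, 30, 1_000, alpha, beta=0.01784)
    print(alpha, est)
```

The estimate grows as the false positive rate approaches the apparent rate and ceases to exist beyond it, which mirrors the behavior described for strata with few apparent carriers.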