Statistical evaluation of factors influencing prognosis of gastric cancer patients: Prediction of prognosis on patient clusters.

We found ten clusters of gastric cancer patients in the data of Imanaga's group, collected under a cancer research project organized by the Ministry of Health and Welfare, and evaluated the prediction of prognosis for the patients in each cluster by using the censored regression of postsurgical survival time on a "prognostic" factor extracted from nine explanatory variables observed mainly at the time of surgery. The ten clusters were interpreted and confirmed to be useful for prediction of patient prognosis by comparison of the failure rates among the clusters and between the treated group (administration of chemotherapy) and the control group.


*Shionogi Kaiseki Center, 22-41, Ichichome, Izumicho, Suita City, Osaka, Japan.
†Aichi Cancer Center Hospital, 81-1159, Tashiro-cho, Kanokoden, Chikusaku, Nagoya, Japan.
October 1979

Introduction
Survival time is one of the leading criteria for evaluating treatments administered to cancer patients. Especially in clinical trials of cancer therapy, survival time expressed in months has been used as a measure of the performance of the therapy. Variables or factors which affect or explain the survival time are called prognostic variables or prognostic factors, respectively. In an ideal clinical trial, prognostic factors would be selected prior to random assignment of the treatments to be compared, and the results would then be evaluated with the aid of observed survival time expressed in months; however, the prognostic factors conceivable beforehand are so numerous that it is difficult to specify them in advance. It is only in the last decade that Imanaga's group, under a cancer research project organized by the Ministry of Health and Welfare, has carried out a survey as a part of the project for evaluating cancer chemotherapy on patients with stomach cancer (1). The cases are documented in four studies: the first starting in January 1965, the second in June 1966, the third in March 1969, and the fourth in January 1971. The survey is still in progress. Under these circumstances, some attempts have been made since 1964, using the data obtained, to determine prognostic factors of chemotherapy in stomach cancer patients who have undergone surgery. Focusing on the patients in the first study, whose records have a long follow-up period, we have analyzed the data in order to examine three subjects.
The first is to predict the postsurgical prognosis of patients who suffered from gastric cancer, based on those factors which can be clarified at the time of surgery. Generally, survival-time analysis relates survival time to the prognostic factors through a previously assumed model and then extracts the individual characteristics of the patients with respect to postsurgical survival time and the prognostic factors. Two problems may therefore arise from evaluation of the model: the evaluation of the possibility of predicting prognosis, and the prediction of prognosis itself. Careful discrimination between the two is important.
The second subject of study is to clarify the appropriateness of the time period, usually taken as about five years after surgery, that has been empirically said to be the time point which divides a "good" prognosis from a "bad" prognosis. The evaluation of prognosis should be made also in connection with prognostic factors.
The third study subject is an evaluation of the effect of adjuvant chemotherapy, the primary purpose of the project, in connection with the prognostic factors.
The analytic methods for these approaches have also been improved and extended several times in the course of the analysis.
With respect to the first purpose, based upon the cases collected through the first survey in the project, one prognostic factor was extracted from the nine factors clarified at the time of surgery and then related to the postsurgical survival time in months. The results suggested that the critical point distinguishing those dying shortly after surgery from long-term survivors can be placed around 5 years after surgery (1). With respect to the second purpose, the same cases were classified into four groups, depending on their survival or death at five years after surgery and on whether or not they received postsurgical chemotherapy. Comparison of characteristics and canonical analysis among the groups indicated that the dead and the survivors at five years after surgery could be discriminated with a probability of misclassification of 24.7%. With respect to the third purpose, the canonical analysis showed a relatively large contribution of the presence or absence of serosal invasion to prognosis in stomach cancer. Postsurgical chemotherapy given at s1 also prolonged the survival time significantly, by about 34 months.
Such an analysis of the data presents several difficulties. First, when many fragmentary pieces of information are taken from patients with an incurable disease and used as explanatory variables to characterize survival time, two problems arise: over-fitting, in which the number of explanatory variables becomes excessive compared with the sample size, and near singularity, due to high correlation between the explanatory variables. Both require a reduction in the dimension of the data. In our previous report (1), the nine factors clarified at surgery were reduced to one prognostic factor, and survival-time analysis was carried out based on that prognostic factor. Next, the existence of censored observations causes another difficulty, because censored observations are unavoidable in an actual study of the survivorship of a group of patients. Although the censored observations did not appreciably affect the results related to the evaluation of prognosis mentioned above, censored adjustment is necessary for carrying out prediction of prognosis, especially prediction of postsurgical survival time by regression on the prognostic factor.
Although the results given in the previous report (1) are valid as a statistical average over all cases collected in the first study, they are insufficient to predict postsurgical survival time for any individual patient in the group. In order to predict survival time for an individual patient, the patient population should first be divided into subgroups so that the prediction may be applied within each subgroup.
In the present report, cluster analysis was first performed on the cases entered in the first study, identical to those analyzed in the previous report (1), and the prediction of prognosis was evaluated on each cluster. Secondly, the censored adjustment was added to the simple regression of survival time on the prognostic factor presented in the previous report (1) in order to predict prognosis. The distinction between the evaluation of prediction of prognosis and the prediction of prognosis itself became clear in the course of our analyses. Prediction of survival time expressed in months was obtained from the regression within each cluster. Characteristics of patients were examined in two clusters, one in which a marked effect of chemotherapy was observed and the other with little effect of chemotherapy.

Clinical Cases
In the same manner as described in our previous report, 430 cases were chosen from 509 cases (237 cases receiving chemotherapy, and 270 cases used as controls), which were collected and examined by Imanaga and his co-workers in the W-I cooperative study group sponsored by the Japan Ministry of Health and Welfare during the 10 years from January 1965 to December 1974. Those who died either during operation or from causes apparently other than cancer were excluded. The remaining 430 patients were confirmed as having stomach cancer by pathological diagnosis, satisfied the presurgical conditions, and were subjected to curative resection of gastric cancer. Records of sex, age, and tumor descriptions were provided according to the rules proposed by the Stomach Cancer Society, including extent of resection, location of tumor, postsurgical complications, tumor size, metastasis to lymph nodes, degree of serosal invasion, and classification by Borrmann type. To evaluate the validity of our prediction of prognosis, the predicted prognosis was compared with survival to December 1976.
Environmental Health Perspectives

Classification and Identification of Patients
The frequency distribution of patients varied depending on the prognostic factor extracted from the nine factors observed mainly at the time of surgery. Without further division of the patient population into subclasses, neither the average trend of survival time nor the prognostic factor described in our previous report can predict survival time for an individual patient with stomach cancer. The subclasses should be mutually exclusive, and the variability of observations should be homogeneous within each subclass. Classification of a mass of individuals into given "natural" groups is commonly called "dissection of mass," which is one method of cluster analysis. During the extraction of the prognostic factor in the manner described in our previous report (1), sex and age were excluded because they are not strongly related to prognosis. The remaining seven factors, i.e., extent of resection, location of tumor, postsurgical complications, tumor size, metastasis to lymph nodes, degree of serosal invasion, and classification by Borrmann type, were used for the cluster analysis. Here we used the Ward technique. Thus, the patient population was divided into subclasses according to the similarity of patients as measured by Euclidean distances on the seven factors mentioned above.
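The clustering step described above can be illustrated with a minimal sketch of Ward's minimum-variance criterion on Euclidean distances. It assumes each patient is coded as a vector of seven numeric scores; the coding, the toy values, and the function name `ward_clusters` are hypothetical and are not taken from the study.

```python
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Agglomerative clustering that always merges the pair of clusters
    giving the smallest increase in within-cluster sum of squares (Ward)."""
    # cluster id -> (member indices, centroid)
    clusters = {i: ([i], list(p)) for i, p in enumerate(points)}
    merges = []  # (id_a, id_b, increase in within-cluster sum of squares)
    next_id = len(points)
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(clusters, 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            d2 = sum((u - v) ** 2 for u, v in zip(ca, cb))  # squared Euclidean
            cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        cost, a, b = best
        ma, ca = clusters.pop(a)
        mb, cb = clusters.pop(b)
        n = len(ma) + len(mb)
        centroid = [(len(ma) * u + len(mb) * v) / n for u, v in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        merges.append((a, b, cost))
        next_id += 1
    return [sorted(m) for m, _ in clusters.values()], merges


# Hypothetical coded records: one row per patient, seven factors per row.
patients = [
    (1, 2, 0, 3, 0, 0, 1),
    (1, 2, 0, 4, 0, 0, 2),
    (1, 1, 0, 3, 0, 1, 1),
    (3, 3, 1, 9, 3, 3, 4),
    (3, 3, 1, 8, 2, 3, 4),
    (2, 3, 0, 9, 3, 2, 3),
]
groups, merges = ward_clusters(patients, 2)
print(groups)
```

The recorded merge sequence plays the role of a dendrogram: reading `merges` from first to last gives the order in which clusters were joined. In practice the factors would need a common scale before Euclidean distances are meaningful.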
In the present study, in order to exclude possible variation due to institutional differences, the patients were first divided into three groups: the Aichi Cancer Center group, the Cancer Research Institute Hospital group, and the group of patients from other institutes. The cluster analysis was then carried out for each group. Figure 1 shows a dendrogram of the clustering of 137 cases collected from the Aichi Cancer Center Hospital during the first W-I study. All the cases were divided into 10 clusters, G1 to G10, depending on their characteristics on seven axes representing the seven factors stated above. The dendrogram shows the sequence of dividing or joining of clusters. Clusters with similar characteristics were combined step by step. For example, G2 and G10 were first combined to form G(2, 10); similarly, G(1, 5) was formed from G1 and G5, G(4, 6) from G4 and G6, and G(3, 9) from G3 and G9. Furthermore, G(2, 10) and G(1, 5) were combined to form G(2, 10)(1, 5). The average survival time was estimated for each cluster and is given, together with the maximum and minimum values, under the corresponding radar chart in Figure 1. For clusters G4 and G6, characterized by longer survival time, G2 and G10, characterized by shorter survival time, and G5 and G8, which were submitted to subsequent evaluation of the chemotherapy, the clinical records for 11 variables are tabulated in Table 1. Patients in cluster G4 are in a moderately advanced stage and are characterized as follows: extent of resection, pylorus; location of tumor, mostly M; complications, none; size of tumor, mostly below 5; metastasis to lymph nodes, mostly none; degree of serosal invasion, mostly s; Borrmann type, 1-2. Patients in cluster G6 are similar to those in cluster G4, except that the location of tumor was A and serosal invasion was essentially s0. Tumors in patients in G6 were rather less advanced than those in G4.
Characteristics of each cluster are outlined in Table 2. Similarly, 10 clusters were produced from either [...] values (Fig. 2). The reification of the cluster analyses, like Table 2, is omitted from the present report for these two groups. Institutional differences in the "anatomy" of the clusters were observed to some extent.
[...] Table 2, the five-year survival or the five-year mortality was calculated for each cluster, and these values are given in Table 3 and Figure 3. The survival rates given in Table 3 were estimated by the method of Kaplan and Meier (2) for these clusters. [Gehan (3) suggested that, in general, the popular life table method is not applicable to estimation of either the observed survival rate or the relative survival rate of a group when the population size is less than 50.] With respect to clusters G5 and G8, the survival rate or the death rate of the group receiving chemotherapy was different from that of the control group. It is interesting to note that five-year survival or five-year mortality in the chemotherapy group differs from that in controls among patients whose stomach cancer was slightly more than moderately advanced. There was no significant difference in five-year survival or five-year mortality between the groups receiving chemotherapy and the control groups in clusters G4 and G6, which were characterized by long survival time after medication, or in cluster G10, which was characterized by death shortly after surgery. This evidence supports our previous report (1) on the effect of postoperative chemotherapy.
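For readers unfamiliar with the Kaplan and Meier estimate used for the cluster survival rates, a minimal sketch of the product-limit calculation follows. The follow-up times and death indicators are invented for illustration and do not come from Table 3; the function names are hypothetical.

```python
def kaplan_meier(times, died):
    """Product-limit estimate: list of (time, S(time)) at each death time.
    `died[i]` is False when patient i was still alive (censored) at times[i]."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, di in zip(times, died) if ti == t and di)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated probability of surviving beyond time t (step function)."""
    s = 1.0
    for ti, si in curve:
        if ti <= t:
            s = si
    return s


# Hypothetical cluster of eight patients: months of follow-up and death flags.
months = [6, 13, 13, 27, 40, 60, 60, 72]
died = [True, True, False, True, False, False, True, False]
curve = kaplan_meier(months, died)
print(round(survival_at(curve, 60), 3))  # five-year survival for this toy data: 0.4
```

Because censored patients stay in the risk set until their last follow-up, the estimate remains usable for small clusters where a grouped life table would be too coarse.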

Prediction of Prognosis
In the previous report (1), one prognostic factor was extracted from nine factors observed at the time of surgery, and the effect of the prognostic factor on the postoperative survival time was discussed. The average survival time was 67.2 months in the control group of the patient population collected by the first W-I study, with an observation period until December 1974, and this value was compared to the prognostic [...] Based on the data for cases collected by the first W-I study, a regression line with censored adjustment is given in Figure 4 together with the ordinary regression line reported in the previous report (1). The ellipses in the figure are the 95% critical regions for the two groups, short-term death and long-term survival, which were classified by the above-stated period of 67.2 months of survival after surgery. The ordinary regression line divides the two groups, while the regression line with the censored adjustment passes through the middle of the long-term survivors, indicating that survival time may be predicted by using the regression line with the censored adjustment. In order to compare the ordinary regression analysis with the regression analysis with the censored adjustment, survival time was predicted for those alive in December 1974, and the values were compared with the survival time observed in the survey carried out in December 1976. The results are shown in Figure 5. The predictions were evaluated by the deviation of the predicted survival time from the observed value. The survival times predicted by the ordinary regression analysis deviated from the observed values exclusively towards the negative side, indicating underestimation, while those predicted by regression analysis with the censored adjustment deviated to both sides, reducing the deviation as a whole.
Thus, the regression analysis with the censored adjustment compares favorably with the ordinary regression analysis as far as survival time predicted for those alive in December 1974 is concerned.
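The report does not spell out the form of its censored adjustment, so the following is only a sketch under stated assumptions: a straight-line model with normal errors, fitted by an EM iteration in which each censored survival time is replaced by its conditional expectation given survival beyond the last follow-up. The data and all names (`ols`, `censored_regression`, `score`, `alive`) are hypothetical.

```python
import math


def ols(x, y):
    """Ordinary least squares line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def hazard(z):
    """Standard normal density divided by upper tail probability."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return pdf / (0.5 * math.erfc(z / math.sqrt(2)))


def censored_regression(x, t, censored, n_iter=200):
    """EM fit of t = a + b*x + normal error with right-censored t (assumed model)."""
    n = len(x)
    a, b = ols(x, t)
    sigma = math.sqrt(sum((ti - a - b * xi) ** 2 for xi, ti in zip(x, t)) / n)
    for _ in range(n_iter):
        y, extra_var = [], 0.0
        for xi, ti, ci in zip(x, t, censored):
            mu = a + b * xi
            if ci:
                z = (ti - mu) / sigma
                lam = hazard(z)
                y.append(mu + sigma * lam)  # E[T | T > ti]
                extra_var += sigma ** 2 * (1 + z * lam - lam ** 2)
            else:
                y.append(ti)
        a, b = ols(x, y)
        rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        sigma = math.sqrt((rss + extra_var) / n)
    return a, b, sigma


# Hypothetical data: prognostic factor score, months observed, alive at cutoff.
score = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
months = [10, 22, 15, 40, 35, 60, 72, 80, 96, 96]
alive = [False] * 7 + [True] * 3
print(ols(score, months))
print(censored_regression(score, months, alive))
```

Since every censored time is replaced by a larger expected value, the adjusted line lies above the ordinary line at the mean score, which is the direction of correction described for Figure 5.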
The average survival time was estimated by the regression analysis with the censored adjustment for the control groups of each cluster of the Aichi Cancer Center group. The results are given in Table 4. They serve to evaluate the interpolated regression analysis in each cluster. The censored adjustment of regression improved the predictability of survival time in clusters characterized by a long survival time (G6, G9), while the results are not very satisfactory for shorter survivors (G5, G6). The censored adjustment may be effective for survival-time studies based on observation over a limited period.

Some Comments on Data Analysis
As in the previous report (1), based on the data for cases collected during the first W-I study under a project sponsored by the Ministry of Health and Welfare, the division of patients into groups according to their characteristics and the prediction of their prognosis were examined. The major subject of a survival-time study is in general the survival time of patients, and the problems to consider are how to understand and how to characterize that survival time. In the present report, various analyses were conducted in order to examine the three subjects mentioned in the introduction. The survival-time distribution should be discussed prior to the analyses. The Weibull, gamma, or log-normal distribution may theoretically fit the survival-time distribution, but the present survival-time data have shown strong conformity with none of these distributions.
Therefore survival time was used in the regression analysis without any data transformation. Fitting the data to one of the standard distributions remains to be studied. Although the relative survival rate and the observed survival rate, both of which are presented in the form of a life table, are commonly used as measures for estimating the prognosis of stomach cancer, they are nonparametric estimates applicable only to groups of large sample size. For a group of small sample size, the survival rate estimated by the method of Kaplan and Meier (2), as described in this report, seems well suited as a nonparametric estimate. Cluster analysis, as described in this report, is an expedient means by which the patient population is divided into subgroups as naturally as possible, based on the similarity of given characteristics. Consequently, reproducibility of the results depends solely on the method of dividing into clusters and on the scale of similarity. The present cluster analysis, i.e., the Ward technique, is merely an attempt to use the general tendency reported in the previous report (1) to predict prognosis for individual patients. Some of the information obtained in the process is of practical importance. For example, the anatomy of each cluster listed in Table 2 and the prognosis records given in Figure 5 may serve for prediction of individual prognosis from the characteristics of the patient.
It seems necessary to develop a mixed algorithm in which the patient population is first divided into subgroups by nonhierarchical clustering, followed by hierarchical clustering. Censored adjustment of regression with incomplete observations is also necessary as far as prediction is concerned. The censored adjustment is applicable not only to simple regression, as in this report, but also to multiple regression. For example, in order to apply the categorical regression analysis described in the previous report (4) to a patient population including those still alive at the time of survey, the censored adjustment was applied as shown in Figure 6. The weights assigned to individual characteristics appear to be sufficiently reasonable, except for the one assigned to postsurgical complications.

Conclusion
Prediction of prognosis and classification of a patient population into homogeneous subgroups were examined in the present report as part of a study on prognostic factors in patients receiving surgery for stomach cancer. Special attention was paid to the incomplete observations specific to survival-time studies. Censored adjustment of the regression was shown to be necessary for prediction of prognosis. In addition, cluster analysis was shown to be useful for prediction of patient prognosis, by means of a statistical rule that was reified primarily in the practical clinical field.