Review of the methods of quantification.

In order to analyze qualitative observations, methods of quantification or optimal scaling have been proposed by Fisher, Guttman, and Hayashi. According to these methods, scores are assigned optimally in some objective and operational sense to the qualitative categories. The present paper mainly reviews Hayashi's four methods of quantification from the mathematical point of view. They are widely used, especially in Japan, in various fields such as social and marketing surveys, psychological research, medical research, etc., where information is obtained mainly in the form of qualitative categories. The first and second methods are applied to the case where an external criterion is present, and are used to predict the external criterion or to analyze the effects of factors. On the other hand, the third and fourth methods are applied to the case where no external criterion is present, and are used to construct a spatial configuration so as to grasp the mutual relationship of the data. After reviewing Hayashi's four methods, we discuss two topics which have been pointed out as the problems to be solved in applying the methods of quantification. One is quantification for ordered categories and the other is statistical consideration. With respect to these topics we review some recently developed methods including the studies due to the present author. Finally we mention briefly several computer programs available in Japan.


Introduction
In some experimental and observational studies, the responses and/or attributes of subjects are measured only by qualitative categories. In order to analyze such observations, methods of quantification or optimal scaling have been proposed by, among others, Fisher (1), Guttman (2), and Hayashi (3)(4)(5)(6)(7)(8)(9). According to these methods, scores are assigned optimally in some objective and operational sense to these qualitative categories. In Japan, Hayashi's methods of quantification are well known and widely used in various fields, such as social and marketing surveys, psychological research, and medical research, where information is obtained mainly in the form of qualitative categories.
The main purpose of the present paper is to review Hayashi's four methods of quantification. They are explained mainly from the mathematical point of view. Then, in addition, we focus on two topics, which have been pointed out as the problems to be solved in using the methods of quantification: the methods of quantification for ordered categories and the statistical considerations. Finally we mention briefly several computer programs available in Japan.
Hayashi's Four Methods of Quantification
Among the various methods proposed by Hayashi (3)(4)(5)(6)(7)(8)(9), the four methods shown in Table 1 are especially widely applied in Japan and are called simply Hayashi's first to fourth methods of quantification. As shown in Table 1, they are divided into two main classes. One contains the methods for the case where an external criterion is present; these are used to predict the external criterion or to analyze the effects of factors. The other contains the methods for the case where no external criterion is present; these are used to construct a spatial configuration so as to grasp the mutual relationships of the data. The "external criterion", which is also called the "outside variable", means something to be predicted or explained.

October 1979

Table 1.
Situation | Observation | Method
Case with an external criterion (for prediction or analyzing the effects of factors) | The external criterion is observed quantitatively | First method (to maximize the correlation coefficient)
Case with an external criterion (for prediction or analyzing the effects of factors) | The external criterion is observed qualitatively | Second method (to maximize the correlation ratio)
Case with no external criterion (for classification or constructing a spatial configuration) | Response patterns of subjects on some attributes are given | Third method (to maximize the correlation coefficient between subjects and categories)
Case with no external criterion (for classification or constructing a spatial configuration) | Similarities between pairs of subjects are observed quantitatively | Fourth method (to maximize the objective function [Eq. (53)])

First Method of Quantification (Quantification I)
The first method of quantification is a method to predict the quantitative external criterion or criterion variable on the basis of the information concerning the qualitative attributes of each subject and to analyze the influence of each attribute on the criterion variable. The data for this method are usually given in the form of Table 2. Let $Y$ be the quantitative external criterion, and let us suppose that every subject under study can be classified into one and only one of $c_i$ categories of the $i$-th attribute item for $i = 1, 2, \ldots, I$. Dummy variables are introduced such that
$$x_a(ij) = \begin{cases} 1, & \text{if subject $a$ belongs to category $j$ of the $i$-th attribute item,} \\ 0, & \text{otherwise,} \end{cases} \qquad i = 1, 2, \ldots, I \tag{1}$$
for subject $a$, $a = 1, 2, \ldots, n$. In order to analyze the relationships between the external criterion and the qualitative attributes we shall assign a quantity or numerical score $s_{ij}$ to category $j$ of the $i$-th item, and as a result assign a score
$$W_a(i) = \sum_{j=1}^{c_i} s_{ij}\, x_a(ij) \tag{2}$$
to attribute $i$ of subject $a$, and a score
$$Y_a^{(c)} = \sum_{i=1}^{I} W_a(i) = \sum_{i=1}^{I} \sum_{j=1}^{c_i} s_{ij}\, x_a(ij) \tag{3}$$
to subject $a$. The principle for quantification is to maximize the sample correlation coefficient between $\{Y_a\}$ and $\{Y_a^{(c)}\}$, i.e.,
$$r(Y, Y^{(c)}) \to \max. \tag{4}$$
The basic idea is to predict the external criterion as accurately as possible on the basis of a linear combination, Eq. (3).
Due to the fact that the principle of maximizing the correlation coefficient is equivalent to that of minimizing the mean-square error (10), we obtain the normal equation (5).

Environmental Health Perspectives

The optimal scores for the categories of the attributes are obtained by solving the linear simultaneous equation (5). Then, using the optimal scores $\{s_{ij}\}$, each qualitative attribute is quantified by Eq. (2) and the external criterion $Y$ can be predicted by Eq. (3). The efficiency of quantification may be considered high when the multiple correlation coefficient, or $R^2 = \rho_{\max}^2$, is large, and low when $R^2$ is small. The contribution of the $i$-th attribute to the external criterion is measured by the partial correlation coefficient
$$r[Y, W(i);\ W(1), \ldots, W(i-1), W(i+1), \ldots, W(I)] \tag{7}$$
or, approximately, by the range of the assigned scores
$$R_i = \max_j s_{ij} - \min_j s_{ij} \tag{8}$$
In actual data analysis, partial correlation coefficients and/or ranges are often represented graphically so that important attributes or factors can be found easily.
No probabilistic model is assumed in the first method of quantification. However, if the problem is recognized as the multiple regression of the external criterion $\{Y_a\}$ on the dummy variables $\{x_a(ij)\}$ such that
$$Y_a = \sum_{i} \sum_{j} \theta_{ij}\, x_a(ij) + e_a, \qquad a = 1, 2, \ldots, n \tag{9}$$
and if the normality of the error term $e_a$ can be assumed, the statistical properties of the scores $s_{ij}$, regarded as estimates of $\theta_{ij}$, are derived from the theory of regression, and the contribution of each attribute can be tested exactly by using the ordinary significance test of regression coefficients.
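The equivalence with least squares on dummy variables described above can be illustrated with a minimal sketch. It assumes the attributes are supplied as integer category codes; the function name `quantification_1` and the use of the minimum-norm least-squares solution (one convenient way to handle the linear dependencies among dummy variables) are choices made for this illustration, not part of the original method description.

```python
import numpy as np

def quantification_1(codes, y):
    """Sketch of Quantification I as least squares on dummy variables.

    codes : (n, I) integer array; codes[a, i] is the category index
            (0-based) of subject a on attribute item i (hypothetical layout).
    y     : (n,) quantitative external criterion.
    """
    codes = np.asarray(codes)
    y = np.asarray(y, dtype=float)
    n_cats = codes.max(axis=0) + 1
    # Dummy variables x_a(ij) as in Eq. (1), one block of columns per item.
    x = np.hstack([np.eye(c)[codes[:, i]] for i, c in enumerate(n_cats)])
    # Minimizing the mean-square error; lstsq returns the minimum-norm
    # solution, since the dummy columns are linearly dependent.
    s, *_ = np.linalg.lstsq(x, y, rcond=None)
    y_hat = x @ s                              # predicted criterion, Eq. (3)
    r = np.corrcoef(y, y_hat)[0, 1]            # attained correlation
    scores = np.split(s, np.cumsum(n_cats)[:-1])
    ranges = [float(b.max() - b.min()) for b in scores]  # Eq. (8)
    return scores, y_hat, r, ranges
```

The category scores themselves depend on the normalization chosen, but the predicted values and the ranges do not, which is why the ranges are a convenient summary.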
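Second Method of Quantification (Quantification II)
The second method of quantification is a method to predict the qualitative external criterion on the basis of the information concerning the qualitative attributes of each subject and to analyze the influence of each attribute on the discrimination of the external criterion. The data for this method are usually given in the form of Table 3. It is formulated in the following two ways.
Formulation Based on Canonical Analysis. We suppose there exists an external criterion with $r$ categories or groups $\pi_1, \pi_2, \ldots, \pi_r$, and introduce the following dummy variables:
$$x_{ij}(kl) = \begin{cases} 1, & \text{if the $j$-th subject in $\pi_i$ belongs to category $l$ of the $k$-th attribute,} \\ 0, & \text{otherwise,} \end{cases} \qquad k = 1, 2, \ldots, I$$
In order to analyze the relationships between the external criterion and the qualitative attributes we shall assign a numerical score $s_{kl}$ to category $l$ of the $k$-th item as in the case of the first method of quantification, and as a result assign a score
$$W_{ij}(k) = \sum_{l=1}^{c_k} s_{kl}\, x_{ij}(kl) \tag{11}$$
to qualitative attribute $k$ of the $j$-th subject in $\pi_i$, and a score
$$Y_{ij}^{(c)} = W_{ij}(1) + W_{ij}(2) + \cdots + W_{ij}(I) = \sum_{k=1}^{I} \sum_{l=1}^{c_k} s_{kl}\, x_{ij}(kl) \tag{12}$$
to the $j$-th subject in $\pi_i$. The principle of quantification is to maximize the sample correlation ratio, or the between-groups variation relative to the total variation, i.e.,
$$\eta^2 = S_B / S_T \to \max.$$
Written in terms of the score vector $s$ and the matrices $B$ and $T$, this principle is transformed to the eigenvalue problem (16). Since the dummy variables of each item sum to one, there exist linear dependencies among them. Then without any loss of generality we may exclude the dummy variable for one arbitrary category per item and the corresponding rows and columns of the matrices $B$ and $T$. This is the same as assigning zero scores to such categories. After solving
$$(\tilde{B} - \eta^2 \tilde{T})\, \tilde{s} = 0 \tag{19}$$
where the tilde indicates the abbreviated matrices, we may normalize the location to satisfy the relation
$$\sum_{l} n'_{kl}\, s_{kl} = 0, \qquad k = 1, 2, \ldots, I \tag{20}$$
if necessary. The number of nonzero eigenvalues is generally given by $\min\left[r - 1, \sum_i (c_i - 1)\right]$ except for degenerate cases. The optimization problem (16) is interpreted as the application of canonical analysis to the dummy variables $\{x_{ij}(kl)\}$.
Formulation Based on Canonical Correlation Analysis. In this formulation the external criterion is also represented by dummy variables $\{z_a(i)\}$, and $n_i$ denotes the sample size of $\pi_i$. Assigning scores $\{t_i\}$ to the categories of the external criterion and scores $\{s_{kl}\}$ to the categories of the attributes [Eqs. (24), (25)], the principle of quantification is to maximize the squared correlation coefficient between the linear combinations of $\{z_a(i),\ i = 1, 2, \ldots, r\}$ and $\{x_a(kl),\ k = 1, 2, \ldots, I,\ l = 1, 2, \ldots, c_k\}$, i.e.,
$$r^2(W^{(c)}, Y^{(c)}) \to \max. \tag{26}$$
Let us use the matrix notation
$$S = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}$$
where $S$ is the sample variance-covariance matrix of the dummy variables $\{z_a(i)\}$ and $\{x_a(kl)\}$:
$S_{11}$ = the $r \times r$ matrix with $[n_i \delta_{ij} - (n_i n_j / n)]/n$ as its $(i, j)$ element,
$S_{12}$ = the $r \times \sum_k c_k$ matrix with $[g_i(kl) - (n_i n'_{kl})/n]/n$ as its $(i, kl)$ element,
$S_{22}$ = the $\sum_k c_k \times \sum_k c_k$ matrix of sample variances and covariances of $\{x_a(kl)\}$.
Here $g_i(kl)$ is the number of subjects in $\pi_i$ belonging to category $l$ of item $k$, and $n'_{kl}$ is the total number of subjects in that category. Then the above principle is expressed as
$$\rho^2 = \frac{(t' S_{12}\, s)^2}{(t' S_{11}\, t)(s' S_{22}\, s)} \to \max. \tag{27}$$
As noted previously, there exist linear dependencies among the dummy variables, so we may exclude the dummy variable for one arbitrary category per item and the corresponding rows and columns of the matrices $S_{ij}$, $i, j = 1, 2$. Denoting such abbreviated matrices and vectors with a superimposed tilde, we obtain
$$\rho^2 = \frac{(\tilde{t}' \tilde{S}_{12}\, \tilde{s})^2}{(\tilde{t}' \tilde{S}_{11}\, \tilde{t})(\tilde{s}' \tilde{S}_{22}\, \tilde{s})} \to \max. \tag{28}$$
Hence, due to the theory of canonical correlation analysis, the optimal scores satisfying (28) are given by solving the simultaneous equations
$$-\rho\, \tilde{S}_{11}\, \tilde{t} + \tilde{S}_{12}\, \tilde{s} = 0, \qquad \tilde{S}_{21}\, \tilde{t} - \rho\, \tilde{S}_{22}\, \tilde{s} = 0 \tag{29}$$
which are transformed into the following two types of eigenvalue problems with common eigenvalues:
$$(\tilde{S}_{12} \tilde{S}_{22}^{-1} \tilde{S}_{21} - \rho^2 \tilde{S}_{11})\, \tilde{t} = 0 \tag{30}$$
$$(\tilde{S}_{21} \tilde{S}_{11}^{-1} \tilde{S}_{12} - \rho^2 \tilde{S}_{22})\, \tilde{s} = 0 \tag{31}$$
The optimal scores for the categories of the external criterion and the attributes are given by the eigenvectors corresponding to the maximum eigenvalue. Concerning the relationship between the results of the two formulations, the following are derived.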
A corresponding relation [Eq. (37)] holds for the scores for the categories of the external criterion, and the two formulations are equivalent to each other except for the normalization of location and scale. However, the formulation based on canonical correlation analysis is more appropriate in view of the quantification of the external criterion, and more convenient for treating ordered categories of the external criterion or for deriving the asymptotic properties of the sample optimal scores. Using the optimal scores $\{s_{kl}\}$ and $\{t_j\}$, the qualitative attributes and the qualitative external criterion are quantified by Eqs. (25) and (24). It may be said, as in the case of the first method of quantification, that the efficiency of quantification is high when the correlation ratio $\eta^2$ (or squared correlation coefficient $\rho^2$) is large, and low when $\eta^2$ is small. The contribution of the $i$-th attribute to the external criterion is measured by the partial correlation coefficient
$$r[W^{(c)}, W(i);\ W(1), \ldots, W(i-1), W(i+1), \ldots, W(I)] \tag{38}$$
or approximately by the range of the assigned scores
$$R_i = \max_j s_{ij} - \min_j s_{ij} \tag{39}$$
When the discrimination among the categories of the external criterion is not satisfactory with unidimensional scores, we may use multidimensional scores. In such cases the eigenvectors corresponding to the eigenvalues smaller than the largest should also be used. The principle becomes the maximization of $\prod_m \eta_m^2$ instead of $\eta^2$ under the orthogonality constraints
$$s_i' T s_j = 0 \quad \text{for } i \neq j \tag{40}$$
Hayashi (6) discussed the multidimensional case in detail.
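The canonical correlation formulation can be sketched numerically as follows. This is only an illustration under stated assumptions: the data layout (integer group and category codes), the function names `one_hot` and `quantification_2`, and the choice to drop the first category of every item (which corresponds to fixing its score at zero) are all hypothetical conveniences, not taken from the original paper.

```python
import numpy as np

def one_hot(codes, n_cats):
    """Dummy variables for one categorical item (0-based integer codes)."""
    return np.eye(n_cats)[np.asarray(codes)]

def quantification_2(group, codes):
    """Sketch of Quantification II via canonical correlation on dummies.

    group : (n,) integer codes of the external criterion.
    codes : (n, I) integer category codes of the attributes.
    """
    group = np.asarray(group)
    codes = np.asarray(codes)
    n = len(group)
    # Abbreviated dummy matrices: the first category of each item is dropped.
    z = one_hot(group, group.max() + 1)[:, 1:]
    x = np.hstack([one_hot(codes[:, k], codes[:, k].max() + 1)[:, 1:]
                   for k in range(codes.shape[1])])
    zc, xc = z - z.mean(axis=0), x - x.mean(axis=0)
    s11, s12, s22 = zc.T @ zc / n, zc.T @ xc / n, xc.T @ xc / n
    # Eigenproblem of the second type: (S21 S11^-1 S12 - rho^2 S22) s = 0.
    m = np.linalg.solve(s22, s12.T @ np.linalg.solve(s11, s12))
    vals, vecs = np.linalg.eig(m)
    k = int(np.argmax(vals.real))
    rho2, s = float(vals.real[k]), vecs[:, k].real
    # Scores for the external criterion, proportional to S11^-1 S12 s.
    t = np.linalg.solve(s11, s12 @ s)
    return rho2, s, t, zc @ t, xc @ s
```

The last two return values are the centered subject scores on the criterion side and on the attribute side; their squared correlation reproduces the largest eigenvalue.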
Fisher (1) proposed a method to quantify the response categories by the principle of maximizing the variation due to the effects of factors relative to the total variation in a two-way analysis of variance. It gives the same result as Hayashi's second method when the response is chosen as the external criterion. A similar principle was also applied by Johnson (11).
For the investigation of a factor-response relationship, Hayashi's second method may be applied in the following two different manners. One is the case where a response item is chosen as the external criterion and the problem is to predict the response from the qualitative factors in a manner similar to regression analysis. This corresponds exactly to Fisher's method. The other is the case where a factor is chosen as the external criterion and the problem is to discriminate between the groups corresponding to the categories of the chosen factor in a manner similar to canonical analysis. In relation to these two situations, several generalized principles were proposed by Tanaka and Asano (12,13) and Tanaka (14) to quantify single or multiple responses on the basis of a univariate or multivariate linear model.

Third Method of Quantification (Quantification III)
Suppose that response patterns to categories of qualitative attributes are given in the form of Table 4. In this table the subjects showing the same response pattern are pooled into one row, and the frequency of each response pattern is denoted by $s_i$, $i = 1, 2, \ldots, Q$. The basic idea of the third method of quantification is to assign scores to the subjects and to the categories so as to maximize the correlation coefficient between subjects and categories (Table 1), with $l_i$ and $\bar{l}$ denoting the number of categories to which subject $i$ responds and its average over $i = 1, 2, \ldots, Q$.
Using a procedure similar to the formulation based on canonical correlation analysis in the case of the second method, we can easily derive an eigenvalue problem for the optimal scores. If the information obtained by assigning unidimensional scores is poor, we may use multidimensional scores. In such cases the eigenvectors corresponding to the eigenvalues smaller than the largest should also be used. The principle becomes to maximize $\prod_i \rho_i$ by assigning multidimensional scores $[x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(t)}]$ and $[y_j^{(1)}, y_j^{(2)}, \ldots, y_j^{(t)}]$ under orthogonality conditions.
As methods similar to or extended from Hayashi's third method, there exist the scalogram analysis of Guttman (2), the categorical canonical correlation analysis of Okamoto and Endo (15,16), and so on.
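Because the eigenvalue problem itself is not legible in this text, the following sketch swaps in the standard correspondence-analysis computation (a singular value decomposition of the standardized residuals of the response table), which is assumed here to maximize the same subject-category correlation. The function name `quantification_3`, the 0/1 pattern matrix layout, and the optional `freq` argument for the pattern frequencies are hypothetical.

```python
import numpy as np

def quantification_3(f, freq=None):
    """Sketch: scores for response patterns (rows) and categories (columns).

    f    : (Q, J) 0/1 matrix; f[i, j] = 1 if pattern i responds to category j.
    freq : optional (Q,) frequencies of the response patterns.
    """
    f = np.asarray(f, dtype=float)
    if freq is None:
        freq = np.ones(f.shape[0])
    p = f * np.asarray(freq, dtype=float)[:, None]
    p = p / p.sum()                       # relative frequencies
    r, c = p.sum(axis=1), p.sum(axis=0)   # row and column masses
    # Standardized residuals; the trivial constant solution is removed.
    resid = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, sing, vt = np.linalg.svd(resid, full_matrices=False)
    x = u[:, 0] / np.sqrt(r)              # pattern (subject) scores
    y = vt[0] / np.sqrt(c)                # category scores
    return float(sing[0]), x, y           # sing[0] is the maximal correlation
```

Further singular vectors give the multidimensional scores, and they are automatically orthogonal in the weighted sense.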

Fourth Method of Quantification (Quantification IV)
Suppose a similarity index $e_{ij}$ is observed between each pair of subjects in a sample of size $n$, where the similarity index indicates that a pair $(i, j)$ with a large $e_{ij}$ is more similar to each other than a pair $(i', j')$ with a small $e_{i'j'}$. The fourth method of quantification is a method to quantify the subjects on the basis of these similarity indexes, to represent them in a Euclidean space of appropriate dimension and, if required, to classify them.
When we assign a numerical score $x_i$ to subject $i$, the principle for quantification is expressed as the maximization of the objective function of Eq. (53).
According to the above explanation it may be clear that the fourth method of quantification is a kind of multidimensional scaling (MDS), or more precisely, a kind of metric MDS in the sense that the result depends on the value of $e_{ij}$ itself instead of the rank order of $e_{ij}$.
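Equation (53) is not reproduced legibly in this text, so the sketch below rests on an assumption: it uses the objective commonly attributed to Hayashi's fourth method, $Q = -\sum_{i \neq j} e_{ij}(x_i - x_j)^2$, maximized under $\sum_i x_i = 0$ and $\sum_i x_i^2 = 1$. Under that assumption $Q$ is a quadratic form $x'Hx$, and the solution is an eigenvector of $H$ restricted to vectors orthogonal to the constant vector. The function name `quantification_4` is hypothetical.

```python
import numpy as np

def quantification_4(e):
    """Sketch: one-dimensional scores from a similarity matrix e (n x n).

    Assumes the objective Q = -sum_{i != j} e_ij (x_i - x_j)^2 with
    sum(x) = 0 and sum(x^2) = 1 (an assumption, see the lead-in).
    """
    e = np.asarray(e, dtype=float)
    n = e.shape[0]
    h = e + e.T                           # off-diagonal part of H
    np.fill_diagonal(h, 0.0)
    np.fill_diagonal(h, -h.sum(axis=1))   # rows of H sum to zero
    # Orthonormal basis of the subspace orthogonal to the constant vector.
    q, _ = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]]))
    u = q[:, 1:]
    vals, vecs = np.linalg.eigh(u.T @ h @ u)
    x = u @ vecs[:, -1]                   # maximizer of x'Hx in that subspace
    return x, float(vals[-1])             # scores and attained value of Q
```

Subjects with large mutual similarity receive nearby scores, so the sign pattern of the scores already gives a rough classification.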

Quantification of Ordered Categories
In the methods of quantification described above, no order relation is supposed among the categories of the qualitative external criterion and/or the qualitative attributes. Even if we have prior information about the order relations in actual data analysis, we sometimes obtain a solution inconsistent with the prior information by applying the ordinary methods of quantification. In such cases it may be appropriate to apply the methods of quantification for ordered categories, which we shall discuss in this section.
Case with Some Order Relations among the Categories of Attributes in the First Method of Quantification
Let us introduce inequality constraints corresponding to order relations among the categories of attribute items. Since the first method of quantification is mathematically equivalent to multiple regression analysis on dummy variables, we may formulate the problem of quantification for ordered categories as a problem of regression with inequality constraints. We must then solve the optimization problem (4) under inequality constraints such as, for example,
$$s_{j1} \ge s_{j2} \ge \cdots \ge s_{jc_j}.$$
Since the constraints are generally linear, we can reformulate the problem as in the case of the second method of quantification and solve it iteratively but efficiently by using Wolfe's reduced gradient procedure. Furthermore, if we make use of the property that the mean square error is quadratic with respect to $\{s_{jk}\}$, we can solve the problem more efficiently by the quadratic programming technique.
Although the categories of the external criterion are defined as nominal in the ordinary method, we sometimes meet situations with ordinal external criteria. For example, in medical research we meet situations in which the severity rating, the improvement rating, or sometimes the change in severity rating should be chosen as the external criterion and we wish to analyze the effects of factors on it.
According to the formulation based on canonical correlation analysis, the optimal score vector is obtained as the solution of
$$\rho^2 = \frac{\tilde{t}' \tilde{S}_{12} \tilde{S}_{22}^{-1} \tilde{S}_{21}\, \tilde{t}}{\tilde{t}' \tilde{S}_{11}\, \tilde{t}} \to \max \tag{66}$$
under the order restrictions of Eq. (67). Reference (18) extended an earlier treatment to the case of a special type of partial order restrictions, and we solved the case of arbitrary order restrictions generally (19)(20)(21).
As shown previously (19,21), the optimization problem [Eqs. (66), (67)] can always be transformed to an optimization problem under constraints of nonnegativeness and linear equalities such that
$$\rho^2 = \frac{z' C z}{z' D z} \to \max, \tag{70}$$
where $z' = [z^{(1)}, z^{(2)}, \ldots, z^{(m)}]$. After this transformation the numerical solution can be obtained efficiently by applying Wolfe's reduced gradient method. As a numerical example, Table 5, which shows the data for a five-treatment experiment with a five-point scoring scale, is taken from the study of Bradley, Katti, and Coons (17). Let us suppose the order restrictions $t_1 \ge \{t_2, t_3\} \ge t_4 \ge t_5$ artificially and apply the generalized method, where $a \ge \{b, c\}$ denotes $a \ge b$ and $a \ge c$. These restrictions are expressed by Figure 1.
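For the first method with ordered attribute categories, the constrained least-squares problem can be illustrated in its simplest special case. With a single attribute and a full order $s_1 \ge s_2 \ge \cdots \ge s_c$, minimizing the mean square error reduces to a weighted monotone fit of the category means, which the pool-adjacent-violators algorithm solves exactly. This algorithm is swapped in here for illustration only; it is not Wolfe's reduced gradient procedure or the general quadratic programming approach named in the text, and the function names are hypothetical.

```python
import numpy as np

def pool_adjacent_violators(means, weights):
    """Weighted least-squares fit subject to a nonincreasing order."""
    blocks = []  # each block: [pooled value, total weight, categories pooled]
    for m, w in zip(means, weights):
        blocks.append([float(m), float(w), 1])
        # Pool neighbours while the order s_prev >= s_next is violated.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.concatenate([np.full(c, v) for v, _, c in blocks])

def ordered_scores_single_item(codes, y, n_cats):
    """Scores for one ordered attribute (assumes every category is observed)."""
    codes = np.asarray(codes)
    y = np.asarray(y, dtype=float)
    counts = np.bincount(codes, minlength=n_cats).astype(float)
    sums = np.bincount(codes, weights=y, minlength=n_cats)
    return pool_adjacent_violators(sums / counts, counts)
```

When the unconstrained category means already respect the order, the scores are returned unchanged; otherwise the violating neighbours are pooled to a common score.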
Then the order restrictions take a form such as
$$s_{j1} \ge s_{j2} \ge \cdots \ge s_{jc(j)}, \qquad j \in J \tag{76}$$
where $J$ denotes a set of subscripts for the items with ordered categories. The problem then becomes to maximize the nonlinear objective function (16) under the inequality restrictions (76), and it can be solved generally by the procedure described above.
In the discriminant analysis using the quantified qualitative variables, we sometimes meet situations where each of the order restrictions may be ascending or descending, i.e.,
$$s_{j1} \ge s_{j2} \ge \cdots \ge s_{jc(j)} \quad \text{or} \quad s_{j1} \le s_{j2} \le \cdots \le s_{jc(j)} \tag{77}$$
This type of quantification was discussed by Tanaka, Asano, and Kubota (22). ... Table 6. Normalizing so as to satisfy ... = 0.0, the optimal scores are given as ...
Takane (24) generalized Kruskal's method and proposed the alternating least squares algorithm. Compared with these two methods, our method has the following advantages and disadvantages (25).
Advantages. It is applicable to generalized criteria for optimal scaling such as CS-1-5 and CM-1-7 proposed previously (12)(13)(14). It is also applicable to cases with arbitrary partial order relations. The speed of convergence depends only on the number of ordered categories, say $p$. Thus it can be used efficiently when $p$ is small.
Disadvantages. It does not converge rapidly when p is large.

Statistical Considerations
Few statistical considerations of quantification had been studied until comparatively recently. Okamoto and Endo (16) investigated the asymptotic distribution of the sample optimal scores for their categorical canonical correlation analysis, which was proposed as a generalization of the third method of quantification. Tanaka and Asano (12,13) and Tanaka (14) studied the statistical inference of factor-response relationships as well as the asymptotic distribution of the optimal scores based on their CS-1-5 and CM-1-7 criteria, which were proposed as generalizations of the second method of quantification. Although it should be checked whether the probabilistic models introduced fit the actual data, the methods will be useful when the sample sizes are large enough for asymptotic theory to apply.
Consider the case of the second method of quantification, where there exist a response and several factors, and the response is chosen as the external criterion. For such cases the probabilistic model shown in Figure 2 has been proposed (12)(13)(14). As shown in the preceding section, the optimal scores are determined by an eigenvalue problem such that
$$(A - \lambda B)\, t = 0$$
By means of the $\delta$-method, small deviations of the eigenvalues and eigenvectors can be asymptotically approximated by linear functions of the small deviations of the matrices $A$ and $B$, under the assumption that the eigenvalues are all distinct. Furthermore, small deviations of the elements of the matrices $A$ and $B$ can be expressed by Taylor expansions in the multinomial proportions on the basis of the above probabilistic model. Thus, as a result, the small deviations of the eigenvalues and eigenvectors (optimal scores) are asymptotically approximated by functions of the small deviations of the multinomial proportions. From this, the asymptotic normality of the sample optimal scores is derived.
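The first step of this argument, the linear approximation of an eigenvalue deviation, can be checked numerically. The sketch below assumes symmetric $A$ and symmetric positive definite $B$ with the normalization $t'Bt = 1$, in which case the standard first-order result is $d\lambda \approx t'(dA - \lambda\, dB)\, t$. The function names and the small test matrices are hypothetical, and the sketch covers only the eigenvalue, not the eigenvector or the multinomial step.

```python
import numpy as np

def generalized_eigh(a, b):
    """Solve (A - lambda B) t = 0 for symmetric A and positive definite B.

    Returns eigenvalues in ascending order and eigenvectors normalized so
    that t' B t = 1 (via the Cholesky factor of B).
    """
    l_inv = np.linalg.inv(np.linalg.cholesky(b))
    vals, vecs = np.linalg.eigh(l_inv @ a @ l_inv.T)
    return vals, l_inv.T @ vecs

def first_order_shift(a, b, da, db):
    """Largest eigenvalue and its first-order change under (dA, dB)."""
    vals, t_all = generalized_eigh(a, b)
    lam, t = vals[-1], t_all[:, -1]
    return float(lam), float(t @ (da - lam * db) @ t)

# Small symmetric example (illustrative numbers only).
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([[2.0, 0.0], [0.0, 1.0]])
da = 1e-4 * np.array([[1.0, 2.0], [2.0, -1.0]])
db = 1e-4 * np.array([[1.0, 0.5], [0.5, 2.0]])
lam, dlam = first_order_shift(a, b, da, db)
exact = generalized_eigh(a + da, b + db)[0][-1]
```

Comparing `exact` with `lam + dlam` shows that the remaining discrepancy is of second order in the perturbation, which is the property the $\delta$-method relies on.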

Computer Programs
The use of electronic computers is indispensable in applying the methods of quantification, because the calculations are complex and ordinarily a comparatively large amount of data is analyzed by these methods. One of the reasons that Hayashi's four methods are widely applied in Japan may be that program packages are available for data analysis. They are, for example, Component Analysis 1-4 (IBM-Japan), Quantas 1-4 (FACOM), Firms III (NEAC), Quantification 1-4 (Dentsu MARK III), Hayasi 1-4 (SPSS-Japanese Version), and so on. Furthermore, in the NISAN system (26), now being developed by a group of Japanese statisticians, a variety of methods, including those for ordered categories and those based on the asymptotic theories, will be available for the convenience of senior statisticians. It may be obvious from the derivations in the previous sections that the methods of quantification, especially the first to third methods, are mathematically equivalent to regression analysis, canonical analysis, and canonical correlation analysis applied to dummy variables corresponding to categorical data. Therefore, if we use the programs carefully, we can apply the methods of quantification to data analysis by means of the programs for ordinary multivariate analyses.