NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee to Examine the Methodology for the Assessment of Research-Doctorate Programs; Ostriker JP, Kuh CV, editors. Assessing Research-Doctorate Programs: A Methodology Study. Washington (DC): National Academies Press (US); 2003.


## Alternate Ways to Present Rankings: Random Halves and Bootstrap Methods

Reputational surveys, such as those conducted for earlier research-doctorate program assessments, were not designed to provide accurate rankings of programs. They provided estimates of ratings, and the results could vary depending on the selected set of raters. The confidence interval analysis performed in the last two assessments illustrated this point. However, users of the assessments chose to ignore this and focused instead on the specific scores obtained by averaging questionnaire responses.

A far better method would be to incorporate variability into the reporting of ratings and to display a range of program ratings rather than a single ranking. Random Halves and Bootstrap are two methods that can be used both to assign measures of accuracy to statistical estimates and to present the data. The two methods resample the original data set in slightly different ways and would produce slightly different results.

### Methods

For a particular field, such as English Language and Literature, assume there are M programs and N program raters. Each rater rates only a subset of the M programs; therefore, some programs may be rated more often than others, since the number of ratings for a program depends on which raters responded to the survey and whether they actually rated a program on their questionnaire. A response matrix **R** can be constructed with a reputational rating r_{ij} as an entry for rater *i* rating program *j*, *i*=1,…,N and *j*=1,…,M. Along each of the rows in the matrix there will be blank spaces for programs that the rater was not asked to rate or did not rate. The different ratings for a given program are then aggregated into a single "mean" rating, r̄_{j} (r̄_{j} could also include weighting and trimming, for example, and may not be just the simple mean of all ratings for program *j*).

*Random Halves Method:* The Random Halves method is closely related to
what is known in statistics literature as the “random group method” for assessing
variances of estimates in complex sample surveys. This approach, which has many
variants, has literature that goes back to at least 1939 (see Wolter, 1985). It is closely related to another
method called the “Jackknife” which was introduced in 1949 and popularized in the
1960s. The essence of the random group or the Jackknife method is to calculate a
numerical quantity of interest on a smaller part of the whole data set, and to do
this for several such smaller parts of the original data. The differences among the
results from these smaller parts are then combined to assess the variability of the
quantity computed on the whole data set. The Random Halves method is an example of
this approach in which the smaller parts are random halves of the data.

The Random Halves method is applied as follows: A random sample of N/2 of the rows of
**R** is drawn without replacement, meaning that a row cannot be selected twice. The mean
r̄_{j} for each program is then computed from this random half sample of the full data.
All the programs are then ranked on the basis of these mean ratings. This procedure
could be repeated ten, one hundred, or several hundred times to produce a range of
ratings and rankings for each program in the field. The rankings for each program
could then be summarized by the interquartile range of the ranking distribution:
users of reputational ratings would recognize that raters rate programs differently,
and that half of the resampled rankings of program *j* fall between *a* and
*b*, where *a* is the 25th percentile of its ranking distribution and *b* is the
75th percentile.

*Bootstrap Method:* The Bootstrap method was developed more recently
than the random group method, and its literature only dates back to 1979. It is well
described in Efron (1982) and
Efron and Tibshirani
(1993). Although the Bootstrap method was not created specifically for
assessing variances in complex sample surveys, it has been used for that purpose. It
was created as a general method for assessing the variability of the results of any
type of data analysis, complex or simple, and has become a standard tool. Instead of
sampling N/2 rows of **R** *without* replacement, N rows would be sampled from
**R** *with* replacement, meaning that a row could be selected several times. The
mean ratings and rankings would then be computed just as in the Random Halves
method.

The two methods provide very similar results. The perceived advantage of the Random Halves method lies in the process: a rater pool is selected, half of the raters are sent questionnaires, and this selection is repeated again and again from the original pool. This is not significantly different from what was done in the past, when the selection of raters and the use of a confidence interval showed that a certain percentage of the ratings would fall in a similar interval even if a different set of raters had been selected. The advantage of the Bootstrap method, on the other hand, is that it is an established method with a developed theory for the statistical accuracy of survey measurements.

### A Comparison of the Random Halves and Bootstrap Methods

The differences between the methods can be demonstrated by a simple example in which three raters rate two programs. The raters are labeled 1, 2, and 3, and the two programs are labeled A and B. The Rating Matrix (Table 1) is:

Rater | Program A | Program B |
---|---|---|
1 | 0 | 1 |
2 | 2 | 1 |
3 | 1 | 0 |
Average | 1 | 2/3 |

In this example, all three raters rate the same two programs on a scale of 0 to 2. In turning ratings into rankings, assume that lower ratings correspond to assessments of higher quality. Thus, rater 1 rated A higher than B by giving A a rating of 0 and B a rating of 1. The last row of the Rating Matrix has the average rating for each program. For these ratings, B is ranked higher than A because its average rating is slightly lower than that of A. In the discussion of the example, the rank of A will be denoted by Rank(A). Thus, Rank(A)=2, while Rank(B)=1.

This example may appear to be unrealistic in at least two ways. First, it is very
small, which means that it is only possible to examine the probability that A is
ranked 1^{st} or 2^{nd}. Second, programs are not sampled for raters
to rate; instead, the raters rate *all* of the programs in the
example. However, neither of these simplifications is very important for the things
that will be demonstrated by the example. On the other hand, the example does show some
differences among the ratings of the three raters. Rater 1 ranks A and B differently
from the way Raters 2 and 3 do. Also, the second rater's ratings are higher than
those of the other two.

In applying Random Halves (RH) to this example, there are two variations, since the number of raters is not an even number. In RH(1) the "half-sample" consists of 1 of the 3 raters chosen at random, and in RH(2) it consists of 2 of the 3 raters chosen at random. These are the only possibilities for the RH method in this example.

In the RH(1) case, there are three possible raters to sample, each chosen with probability 1/3, and the average rating for each program is simply that rater's own rating. The table below summarizes the three possible sample results for RH(1).

Sample | Average for A | Average for B | Rank(A) |
---|---|---|---|
{1} | 0 | 1 | 1 |
{2} | 2 | 1 | 2 |
{3} | 1 | 0 | 2 |

Because Rank(A)=2 in two of the three possible half samples, the probability that Rank(A)=2 is 2/3. This should be compared to the finding that in the data (i.e., the Rating Matrix in Table 1) the rank of A is 2; the RH(1) method indicates that it could have been different from 2 about 1/3 of the time.

In the RH(2) case, two raters are sampled, and there are three equally likely possibilities: {1,2}, {1,3}, and {2,3}. For each half sample, the two sampled raters' ratings are averaged. The table below summarizes what occurs for the three possible half samples for RH(2). Note that in the cases where the average ratings for A and B are equal, random tie splitting is used and the rank order is denoted by 1.5.

Sample | Average for A | Average for B | Rank(A) |
---|---|---|---|
{1,2} | 1 | 1 | 1.5 |
{1,3} | 1/2 | 1/2 | 1.5 |
{2,3} | 3/2 | 1/2 | 2 |

In the case of RH(2), there are three ways to get Rank(A)=2. The first is from the sample {2,3}. The other two are to draw either of the other two samples and have the tie split so that Rank(A)=2. Hence, the probability is 1/3+(1/3)(1/2)+(1/3)(1/2)=2/3, where the fraction 1/2 represents the tie splitting. Note that 2/3 is also the probability that Rank(A)=2 in RH(1).

In summary, the RH method calls for repeatedly taking “half-samples” of the rating
matrix, averaging the resulting ratings for A and B, and then ranking A and B based
on these average ratings. In resampling over and over, a distribution is constructed
of how many times A is ranked 1 or 2. For example, in the case of RH(1) or RH(2), A
would be ranked 2 about 2/3^{rds} of the time. Therefore, while the two
versions of the RH method give different data, using random tie splitting gives the
same results for the probability that A is ranked 2.

In applying the Bootstrap (Boot) method to the example, three raters are sampled with replacement, so the same rater can be selected more than once. The three observed raters are regarded as representative of all the possible raters who could have been sampled to rate the programs. Such an assumption clearly varies in plausibility depending on factors such as how many raters are being considered and how they were originally chosen. It is, however, a useful assumption that appears throughout many applications of statistics.

In sampling three rows from the original Rating Matrix, there are 27 equally likely possible samples, so the probability of any one sample is 1/27. They are listed in the following table.

Rank(A)=2 occurs a total of 20 times in the table above, yielding a probability of
20/27=.74. This is different from the result of the RH methods (i.e., .67).
However, it is still plausible: while A was ranked second in this sample of 3
raters, there is still some probability that it would have been ranked 1 in a
different sample of raters. The Boot method produces a somewhat smaller probability
than the RH methods that A could have been ranked 1^{st} (i.e., .26 rather
than .33), but both of these values are less than 1/2, and both are plausible in
such a small example.
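The probabilities above can be checked by exhaustively enumerating every possible resample of the example's Rating Matrix. This sketch (names are illustrative) uses exact fractions, so no simulation error enters:

```python
from fractions import Fraction
from itertools import combinations, product

# Ratings from the example: (rating of A, rating of B) for raters 1, 2, 3.
ratings = [(0, 1), (2, 1), (1, 0)]

def prob_rank_a_2(samples):
    """Probability that A is ranked 2 (higher average = worse rank),
    averaging over equally likely samples; a tie contributes 1/2."""
    total = Fraction(0)
    for s in samples:
        mean_a = Fraction(sum(ratings[i][0] for i in s), len(s))
        mean_b = Fraction(sum(ratings[i][1] for i in s), len(s))
        if mean_a > mean_b:
            total += 1
        elif mean_a == mean_b:      # random tie splitting
            total += Fraction(1, 2)
    return total / len(samples)

rh1  = prob_rank_a_2(list(combinations(range(3), 1)))    # RH(1): 3 samples
rh2  = prob_rank_a_2(list(combinations(range(3), 2)))    # RH(2): 3 samples
boot = prob_rank_a_2(list(product(range(3), repeat=3)))  # Boot: 27 samples
```

Enumerating this way reproduces the hand calculations: 2/3 for both RH variants and 20/27 for the Bootstrap.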

There is no very convincing, intuitive way to favor either one of these two probability estimates, .67 or .74; hence, this example has little to offer in making an intuitive choice between the two approaches. What it does show is that the RH and Boot methods do not give the same results for probabilities of this kind. Thus, any claim that the two methods are "equivalent" is wrong, though they are clearly "similar."

Statisticians who specialize in variance estimation prefer the Bootstrap to ad hoc methods because it is grounded in theory: the Bootstrap estimate is the nonparametric maximum likelihood estimate of the probability that Rank(A)=2, a property the Random Halves method does not enjoy. However, variance estimation is an important subject in statistics, and many methods, in particular the Jackknife, can be tailored to situations where they provide serious competition to the Bootstrap. The next section will illustrate that, when the numbers of raters and programs are both large, there is little difference between the Random Halves and the Bootstrap methods.

### Analysis of the Expected Variance for the Two Methods

A natural question to ask is: What do the Random Halves and Bootstrap methods produce as probability distributions of average ratings for programs? Drawing on some results from probability theory, it can be shown that the two methods give similar results.

Any method of resampling creates random variables with distributions that depend on
the resampling method. In the rating example, let the random variables for the
average ratings that result for A and B for each sample be denoted by R_{A}
and R_{B}, respectively. These are random variables with means and variances
that have well-known values. The average ratings of A and B in the rating matrix are
given in the last row of The Rating Matrix in Table 1, and they are denoted in general as
r_{A} and r_{B}. Thus, in the example, r_{A} =1 and
r_{B}=2/3. In addition to the average ratings, the variance of the
ratings in each column is defined as the average of the squares of the ratings in
each column minus the square of the mean rating for that column. Thus, for program
A, the variance is

V_{A}=(0^{2}+2^{2}+1^{2})/3−1^{2}=5/3−1=2/3,

and, for program B, it is

V_{B}=(1^{2}+1^{2}+0^{2})/3−(2/3)^{2}=2/3−4/9=6/9−4/9=2/9.

Table 5 gives the results for N raters rating Program A and n raters used in the RH(n) method. If N is even, then n=N/2.

Method | E(R_{A}) | Var(R_{A}) |
---|---|---|
RH(n) | r_{A} | (V_{A}/n)((N−n)/(N−1)) |
Boot | r_{A} | V_{A}/N |

In the table, E(R_{A}) denotes the "expected value" or "long-run average value" of the average rating for A, R_{A}. Standard results show that it is the same value, r_{A}, for both the Boot and the RH methods. r_{A} is the average rating for A in the original Rating Matrix and, in general, the average rating given to program A by the raters rating it. Thus, both the RH and Boot methods are unbiased for r_{A}, and any sensible resampling method will share this property.

Where the two methods can differ is in the value of the variance, Var(R_{A}).
This variance is a measure of how much R_{A} deviates on average from the
mean value, r_{A}, from one random resampling to another. Observe that both
formulas for Var(R_{A}) involve V_{A}, the variance of the ratings
in the column of the Rating Matrix for program A. Note that when N is even and
n=N/2, the N−n in the numerator for RH(n) equals n, and it cancels the n in the
denominator, leaving only N−1 in the denominator. This is to be compared to the N in
the denominator for the Bootstrap method. When N, the number of raters, is large,
then N and N−1 are close, and the variances of the average rating, R_{A}, for the
two methods are nearly the same.

The factor on the right side of the formula for the RH(n) variance is known as the finite sampling correction, and it gets smaller as n increases relative to N. In the simple example, here is what these formulas yield.

**RH(1):** In this case, R_{A} takes on these three possible values
with the corresponding probabilities.

Possible average ratings | 0 | 1 | 2 |
---|---|---|---|
Probabilities | 1/3 | 1/3 | 1/3 |

The mean of this distribution is 0(1/3)+1(1/3)+2(1/3)=1=r_{A}.

Its variance is
0^{2}(1/3)+1^{2}(1/3)+2^{2}(1/3)−1^{2} =2/3.

Applying the formula for the variance for RH(1) from Table 5 gives

((2/3)/1)(3−1)/(3−1)=2/3, the same value.

**RH(2):** In this case, R_{A} takes on these three possible values with the
corresponding probabilities.
corresponding probabilities.

Possible average ratings | 1/2 | 1 | 3/2 |
---|---|---|---|
Probabilities | 1/3 | 1/3 | 1/3 |

The mean of this distribution is (1/2)(1/3)+1(1/3)+(3/2)(1/3)=1= r_{A}, as
before.

Its variance is
(1/2)^{2}(1/3)+(1)^{2}(1/3)+(3/2)^{2}(1/3)−1^{2}
=((1/4)+1+(9/4))/3−1=(14/4)/3−1=(14/12)−(12/12)=2/12=1/6.

Applying the formula for the variance for RH(2) from Table 5 gives

((2/3)/2)(3−2)/(3−1)=(1/3)(1/2)=1/6, the same value.

**Boot:** In this case, R_{A} takes on seven possible values with the corresponding probabilities.

Possible average ratings | 0 | 1/3 | 2/3 | 1 | 4/3 | 5/3 | 2 |
---|---|---|---|---|---|---|---|
Probabilities | 1/27 | 3/27 | 6/27 | 7/27 | 6/27 | 3/27 | 1/27 |

These probabilities are found by summing up the Bootstrap samples that yield the given possible value in Table 4. This is a larger set of possible average ratings for A than either one of the RH methods gives. This is due to the richer set of samples available under the Boot method.

The mean of this distribution is (0)(1/27)+(1/3)(3/27)+(2/3)(6/27)+(1)(7/27)+
(4/3)(6/27)+(5/3)(3/27)+(2)(1/27)=1=r_{A}, as it is for the other two
methods.

The variance is
(0)^{2}(1/27)+(1/3)^{2}(3/27)+(2/3)^{2}(6/27)+(1)^{2}(7/27)+(4/3)^{2}(6/27)+
(5/3)^{2}(3/27)+(2)^{2}(1/27)−1^{2}=(1/9)(1/27)(3+24+63+96+75+36)−1=
(297/(9×27))−1=(11/9)−(9/9)=2/9.

Applying the formula for the variance for Boot from Table 5 gives ((2/3)/3)=2/9, the same value.
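These mean and variance calculations can be reproduced by enumerating each resampling distribution exactly (an illustrative sketch; exact fractions avoid any rounding):

```python
from fractions import Fraction
from itertools import combinations, product

ratings_a = [0, 2, 1]   # program A's column of the Rating Matrix

def mean_and_var(samples):
    """Mean and variance of the average rating R_A over equally likely samples."""
    means = [Fraction(sum(ratings_a[i] for i in s), len(s)) for s in samples]
    m = sum(means) / len(means)                          # E(R_A)
    v = sum(x * x for x in means) / len(means) - m * m   # Var(R_A)
    return m, v

m1, v1 = mean_and_var(list(combinations(range(3), 1)))   # RH(1)
m2, v2 = mean_and_var(list(combinations(range(3), 2)))   # RH(2)
mb, vb = mean_and_var(list(product(range(3), repeat=3))) # Boot
```

All three means come out to r_{A}=1, and the variances match the formulas from Table 5: 2/3 for RH(1), 1/6 for RH(2), and 2/9 for Boot.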

### Summary of results

The mean and variance calculations applied to this simple example illustrate the following:

- The RH and Boot methods are only similar when N, the number of raters rating a program, is large enough to make the difference between N and N−1 negligible.
- The set of possible samples from which resampling takes place differs for the two methods, the one for method Boot is much larger in general.
- Both methods are unbiased for the mean rating of a program, but they differ in their variances. When N is even, the variance of Boot is smaller; when N is odd, the variance of Boot lies between those for RH(n) and RH(n+1), where n<N/2<n+1. This can be observed by examining the variance formulas in Table 5.
- The Boot method usually has a much richer set of possible ratings in its resampling distribution, and fewer ties.

### References

- Wolter KM. Introduction to Variance Estimation. New York: Springer-Verlag; 1985.
- Efron B. The Jackknife, the Bootstrap and other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics; 1982.
- Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall; 1993.

## Correlates of Reputation Analysis

The reputational quality of a program is a purely subjective measure; however, it is related to quantitative measures in the sense that a quality judgment could be made on the basis of information about programs, such as the scholarly work of the faculty and the honors awarded to the faculty for that scholarship. Therefore, it may be possible to relate or to predict quality rankings for programs using quantitative measures. It is clear that such predicted quality rankings would also be subjective and that the accuracy of the predictions may change over time.

One way to construct such a relationship is a least-squares multiple linear regression.
The dependent variable in the regression analysis is represented by a set of average
ratings, r_{1}, r_{2}, . . , r_{N} for N programs in a
particular field. The predictors or independent variables would be a set of quantitative
or coded program characteristics that are represented by a vector,
**x**_{n}, for program n. The analysis would construct a function
f(**x**) which provides a *predicted* average rating
f(**x**_{n}) for program n. In this case the relation between
r_{n} and f(**x**_{n}) would be

r_{n}=f(**x**_{n})+e_{n}=a_{1}x_{1,n}+a_{2}x_{2,n}+…+a_{m}x_{m,n}+a_{m+1}+e_{n}

(1)

where x_{1,n}, x_{2,n},…, x_{m,n} represent the m quantitative or coded
characteristics for program n in the field, and e_{n} is the residual, the amount by
which the predicted average rating differs from the actual average rating for that
program. If the prediction is "good," then the residuals are relatively small. The
coefficients a_{j} are determined by minimizing the sum of the squares of the
differences r_{n}−f(**x**_{n}).
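Equation (1) is an ordinary least-squares fit. A minimal sketch with NumPy (array names are illustrative) stacks the m characteristics with a constant column and solves for the coefficients a_{1},…,a_{m+1}:

```python
import numpy as np

def fit_rating_predictor(X, r):
    """Least-squares fit of r_n = a_1*x_{1,n} + ... + a_m*x_{m,n} + a_{m+1} + e_n.
    X: (N, m) array of program characteristics; r: (N,) average ratings.
    Returns (coefficients, including the constant a_{m+1}, and residuals e_n)."""
    A = np.column_stack([X, np.ones(len(r))])   # append the constant term
    a, *_ = np.linalg.lstsq(A, r, rcond=None)   # minimizes sum of squared residuals
    residuals = r - A @ a
    return a, residuals
```

The returned residuals are exactly the e_{n} of equation (1); small residuals indicate a "good" prediction.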

While a single regression equation is generated from the quantitative data and the
reputational scores, the particular raters selected for the survey introduce a certain
amount of variability. This variability can be shown in the following manner: Associated
with each coefficient a_{i} is a 95%-confidence interval [L_{i}, U_{i}], and by
randomly selecting values for the coefficients within their confidence intervals, a
predicted average rating ŕ_{n} can be generated for program n. A measure of how
close the set of ŕ_{n} ratings is to the r_{n} ratings is given by the
requirement

||**ŕ**-**r**||^{2}<ps^{2}F,

(2)

where **ŕ**=(ŕ_{1}, ŕ_{2},…, ŕ_{N}), **r**=(r_{1},
r_{2},…, r_{N}), and || ||^{2} denotes the sum of squares of
the components of the difference vector. The bound in the inequality, p s^{2} F,
is a constant derived from the regression analysis: p=m, the number of nonconstant
terms in the regression equation; s^{2} is the "mean square for error" given in the
output of a regression program; and F is the 95% cutoff point for the F-distribution
with p and n−p degrees of freedom.

By repeating the random selection of coefficients many times, a collection of
coefficient sets can be determined that satisfies inequality (2), and the upper and
lower bounds of this collection define an interval [L'_{i}, U'_{i}] for each
coefficient. For coefficients in these intervals, a range of predicted ratings can be
generated.
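This screening step can be sketched as follows, assuming the coefficients' 95% confidence intervals and the bound p·s²·F are already in hand (all names are illustrative):

```python
import numpy as np

def sample_coefficient_sets(A, r, lo, hi, bound, n_draws=3000, seed=0):
    """Draw coefficient vectors uniformly within their confidence intervals
    [lo_i, hi_i] and keep those whose predicted ratings satisfy
    ||r_hat - r||^2 < bound, i.e., inequality (2) in the text.
    A: (N, m+1) design matrix including a constant column; r: (N,) ratings."""
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(n_draws):
        a = rng.uniform(lo, hi)                 # one random coefficient set
        r_hat = A @ a                           # predicted ratings
        if np.sum((r_hat - r) ** 2) < bound:
            kept.append(a)
    return np.array(kept)

# The per-coefficient max and min over the kept sets give [L'_i, U'_i].
```

Taking the componentwise maximum and minimum of the retained coefficient sets yields the intervals [L'_{i}, U'_{i}] described above.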

From the practical point of view of a program trying to estimate its quality a few
years after a reputational survey was conducted, the program could use a linear
regression equation with coefficients in [L'_{i}, U'_{i}] to generate a new range
of ratings based on current program data; or, if data for all programs in the field were
available, a new interquartile ranking of programs could be obtained.

The following is an example where this method is applied to the 1995 ratings of programs in Mathematics.

### Mathematics

Using the STATA statistical package, a forward stepwise least-squares linear regression was applied to a large number of quantitative variables characterizing publications, citations, faculty size and rank, research grant support, number of doctorates by gender and race/ethnicity, graduate students by gender, graduate student support, and time to degree. The following seven variables were identified as the most significant:

Variable | Description |
---|---|
(ginipub) | Gini Coefficient for Program Publications, 1988–92: an indicator of the concentration of publications among a small number of the program faculty during the period 1988–92 |
(phds) | Total Number of Doctorates, FY 1986–92 |
(perfull) | Percentage of Full Professors Participating in the Program |
(persupp) | Percentage of Program Faculty with Research Support, 1986–92 |
(perfpub) | Percentage of Program Faculty Publishing in the Period 1988–1992 |
(ratiocit) | Ratio of the Total Number of Program Citations in the Period 1988–1992 to the Number of Program Faculty |
(myd) | Median Time Lapse from Entering Graduate School to Receipt of Ph.D., in Years |
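Forward stepwise selection of this kind can be sketched with a generic greedy R² criterion (an illustration of the idea only; STATA's stepwise procedure uses p-value entry and removal rules, and all names below are hypothetical):

```python
import numpy as np

def forward_stepwise(X, y, n_select):
    """Greedy forward selection: at each step, add the predictor that most
    increases the R-squared of a least-squares fit with an intercept."""
    n, p = X.shape
    tss = np.sum((y - y.mean()) ** 2)          # total sum of squares
    selected, remaining = [], list(range(p))
    for _ in range(n_select):
        best_r2, best_j = -np.inf, None
        for j in remaining:                    # try each candidate predictor
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            r2 = 1.0 - resid @ resid / tss
            if r2 > best_r2:
                best_r2, best_j = r2, j
        selected.append(best_j)                # keep the best one found
        remaining.remove(best_j)
    return selected, best_r2
```

Run on a pool of candidate variables, this returns them in the order selected, mirroring how the seven variables above were winnowed from the larger set.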

Results of the regression analysis are shown below. These variables explain about 83% of
the variation in the ratings (R^{2}=0.8304).

Source | SS | df | MS |
---|---|---|---|
Model | 112.36003 | 7 | 16.0514329 |
Residual | 22.954789 | 131 | .175227397 |
Total | 135.314819 | 138 | .98054217 |

Number of obs=139; F(7, 131)=91.60; Prob>F=0.0000; R-squared=0.8304; Adj R-squared=0.8213; Root MSE=.4186

quality | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
---|---|---|---|---|---|---|
phds | .3489197 | .0544665 | 6.41 | 0.000 | .2411721 | .4566674 |
perfull | .008572 | .0027864 | 3.08 | 0.003 | .0030598 | .0140842 |
persupp | .0183162 | .0025146 | 7.28 | 0.000 | .0133418 | .0232906 |
perfpub | −.0150464 | .0035235 | −4.27 | 0.000 | −.0220167 | −.0080762 |
ratiocit | .0258671 | .0077198 | 3.35 | 0.001 | .0105955 | .0411387 |
myd | −.7737551 | .1995707 | −3.88 | 0.000 | −1.168553 | −.3789567 |
ginipub | −.0294944 | .0044222 | −6.67 | 0.000 | −.0382425 | −.0207462 |
_cons | 3.070145 | .3625634 | 8.47 | 0.000 | 2.352908 | 3.787382 |

The resulting predictor equation is:

**f(x)**=3.07+0.349(phds)+0.009(perfull)+0.018(persupp)−0.015(perfpub)+0.026(ratiocit)−0.774(myd)−0.029(ginipub)

It is noted that the Root Mean Square Error (RMSE) from the regression is 0.4186, and the variation in scores from the 1995 confidence interval calculation has an RMSE of 0.2277.
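As a sketch, the fitted equation can be wrapped in a small function. The coefficients are the rounded values quoted above; that the inputs are scaled the same way as in the original data set is an assumption here:

```python
def predicted_quality(phds, perfull, persupp, perfpub, ratiocit, myd, ginipub):
    """Predicted 1995 Mathematics quality score from the fitted regression
    equation (rounded coefficients as quoted in the text; input scaling
    assumed to match the original data)."""
    return (3.07 + 0.349 * phds + 0.009 * perfull + 0.018 * persupp
            - 0.015 * perfpub + 0.026 * ratiocit - 0.774 * myd
            - 0.029 * ginipub)
```

A program could plug its current values of the seven variables into such a function to obtain a predicted score.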

The following is a scatter plot of the actual 1995 ratings and the predicted ratings.

The 95%-confidence interval for each of the variables used in the regression can now
be used to find a new estimate for the quality score. As described above, values for
the coefficients in the regression equation are randomly selected in the intervals
and tested to see if that set of coefficients satisfies the relation
||**ŕ**−**r**||^{2}<p s^{2} F. For
Mathematics data the bound p s^{2} F=(7)(.4186)^{2}(2.12)=2.563556.
For this example, 3,000 random selections were made in the coefficient intervals, and
220 coefficient sets satisfied the inequality. The maximum and minimum values of the
retained coefficients are:

Coefficient | phds | persupp | ginipub | myd | perfpub | ratiocit | perfull | constant |
---|---|---|---|---|---|---|---|---|
Max | 0.35469 | 0.018583 | −0.029026 | −0.7526 | −0.014673 | 0.026686 | 0.0088674 | 3.10858 |
Min | 0.34314 | 0.018049 | −0.029964 | −0.79495 | −0.015421 | 0.025047 | 0.0082761 | 3.03164 |

Using the values in the above table, the maximum and minimum predicted quality scores can be calculated, and the scores for Mathematics programs are displayed in the table below.

As described earlier, these maximum and minimum coefficient values can be used to construct new quality scores by randomly selecting the coefficients in the regression equation between the corresponding maximum and minimum values. If this is done repeatedly, a collection of quality scores is obtained for each program, and the interquartile range of this collection can be generated. This was done 100 times, and the results are given as the Predicted Ranks in the table below, alongside the Bootstrap rankings.

Institution | Quality Score (Max) | Quality Score (Min) | Predicted Rank (1st Quartile) | Predicted Rank (3rd Quartile) | Bootstrap Rank (1st Quartile) | Bootstrap Rank (3rd Quartile) |
---|---|---|---|---|---|---|
Dartmouth College | 2.73 | 2.51 | 73 | 76 | 53 | 62 |
Boston University | 2.70 | 2.42 | 77 | 80 | 48 | 52 |
Brandeis University | 3.17 | 2.88 | 49 | 51 | 32 | 36 |
Harvard University | 4.41 | 4.09 | 8 | 9 | 2 | 4 |
Massachusetts Inst of Technology | 5.27 | 4.93 | 2 | 2 | 3 | 4 |
U of Massachusetts at Amherst | 3.40 | 3.11 | 38 | 40 | 54 | 60 |
Northeastern University | 2.41 | 2.13 | 99 | 103 | 70 | 80 |
Brown University | 4.60 | 4.31 | 5 | 6 | 26 | 29 |
Brown University-Applied Math | 4.59 | 4.26 | 6 | 6 | 14 | 17 |
University of Rhode Island | 1.69 | 1.40 | 128 | 129 | 122 | 125 |
University of Connecticut | 2.66 | 2.39 | 79 | 83 | 98 | 102 |
Wesleyan University | 2.31 | 2.09 | 104 | 107 | 101 | 110 |
Yale University | 3.38 | 3.13 | 38 | 40 | 7 | 8 |
Adelphi University | 1.07 | 0.82 | 138 | 138 | 130 | 133 |
CUNY—Grad Sch & Univ Center | 3.38 | 3.10 | 40 | 41 | 30 | 32 |
Clarkson University | 2.49 | 2.21 | 90 | 94 | 109 | 118 |
Columbia University | 4.32 | 3.99 | 11 | 11 | 10 | 12 |
Cornell University | 4.81 | 4.46 | 3 | 4 | 14 | 16 |
New York University | 4.83 | 4.50 | 3 | 4 | 7 | 8 |
Polytechnic University | 2.15 | 1.88 | 112 | 114 | 98 | 105 |
Rensselaer Polytechnic Inst | 3.64 | 3.36 | 27 | 30 | 48 | 52 |
University of Rochester | 3.10 | 2.83 | 52 | 54 | 56 | 62 |
State Univ of New York-Albany | 2.55 | 2.33 | 85 | 88 | 82 | 90 |
State Univ of New York-Binghamton | 2.55 | 2.33 | 85 | 87 | 65 | 75 |
State Univ of New York-Buffalo | 3.00 | 2.76 | 57 | 59 | 61 | 70 |
State Univ of New York-Stony Brook | 3.60 | 3.31 | 30 | 32 | 19 | 22 |
Syracuse University | 2.42 | 2.18 | 95 | 100 | 76 | 84 |
Princeton University | 4.52 | 4.21 | 7 | 7 | 2 | 3 |
Rutgers State Univ-New Brunswick | 4.06 | 3.77 | 16 | 18 | 17 | 20 |
Stevens Inst of Technology | 1.73 | 1.48 | 127 | 127 | 121 | 128 |
Carnegie Mellon University | 3.63 | 3.33 | 28 | 31 | 34 | 40 |

### English Language and Literature

Applying the same method to the 1995 programs in English Language and Literature yields a slightly different result, since programs in this field do not have the same productivity characteristics as those in Mathematics. Again, a forward stepwise least-squares linear regression was applied to a large number of quantitative variables, and the following were identified as the most significant:

Variable | Description |
---|---|
(nopubs2) | Number of Publications During the Period 1985–1992 |
(perfawd) | Percentage of Program Faculty with at Least One Honor or Award for the Period 1986–1992 |
(acadplan) | Total Number of Doctorates, FY 1986–1992, with Academic Employment Plans at the 4-Year College or University Level |
(ginicit) | Gini Coefficient for Program Citations, 1988–1992: an indicator of the concentration of citations among a small number of the program faculty during the period 1988–1992 |
(nocits1) | Number of Citations During the Period 1981–1992 |
(fullprof) | Percentage of Full Professors Participating in the Program |
(empplan) | Total Number of Doctorates, FY 1986–1992, with Employment Plans |

None of the variables identified in the Mathematics regression are present in this regression analysis.

Results of this regression analysis are shown below. These variables explain about 81%
of the variation in the ratings (R^{2}=0.8106).

Source | SS | df | MS |
---|---|---|---|
Model | 83.985691 | 7 | 11.9979559 |
Residual | 19.6227839 | 109 | .18002554 |
Total | 103.608475 | 116 | .893176507 |

Number of obs=117; F(7, 109)=66.65; Prob>F=0.0000; R-squared=0.8106; Adj R-squared=0.7984; Root MSE=.42429

q93a | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
---|---|---|---|---|---|---|
nopubs2 | .1202936 | .1017753 | 1.18 | 0.240 | −.0814218 | .322009 |
perfawd | .0326877 | .0041423 | 7.89 | 0.000 | .0244777 | .0408977 |
acadplan | .7961931 | .2416467 | 3.29 | 0.001 | .3172573 | 1.275129 |
ginicit | −.0007486 | .0001839 | −4.07 | 0.000 | −.001113 | −.0003842 |
nocits1 | .0827859 | .0234272 | 3.53 | 0.001 | .036354 | .1292178 |
fullprof | .2942413 | .1096454 | 2.68 | 0.008 | .0769276 | .511555 |
empplan | −.599897 | .2698761 | −2.22 | 0.028 | −1.134783 | −.0650113 |
_cons | 1.955276 | .1533968 | 12.75 | 0.000 | 1.651249 | 2.259304 |

The resulting predictor equation is:

**f(x)**=1.955+0.12(nopubs2)+0.033(perfawd)+0.796(acadplan)−0.001(ginicit)+0.083(nocits1)+0.294(fullprof)−0.6(empplan).

The following is a scatter plot of the Random Halves draw from the 1995 rankings and the predicted ranking for that draw.

For programs in English Language and Literature, the Root Mean Square Error (RMSE) from the regression is 0.42429, and the variation in scores from the 1995 confidence interval calculation has an RMSE of 0.2544.

As for Mathematics, the 95%-confidence interval for each of the variables used in the
regression can be used to determine a new estimate for the quality score. In this
case, the bound p s^{2} F=(7)(.42429)^{2}(2.18)=2.747136. For this
example, 3,000 random selections were again made in the coefficient intervals, and 242
coefficient sets satisfied the inequality. The maximum and minimum values of the
retained coefficients are:

Coefficient | nopubs2 | perfawd | acadplan | ginicit | nocits | fullprof | empplan | constant |
---|---|---|---|---|---|---|---|---|
Max | 0.13384 | 0.033239 | 0.82835 | −0.00072 | 0.085903 | 0.30883 | −0.56399 | 1.97569 |
Min | 0.10684 | 0.03214 | 0.76425 | −0.00077 | 0.079689 | 0.27975 | −0.63557 | 1.935 |

As in the Mathematics example, the maximum and minimum values of the coefficients can be used to calculate the maximum and minimum predicted quality scores for the programs in English Language and Literature. These scores are displayed in the table below.

Repeating the exercise described for Mathematics, randomly selecting coefficient values in the maximum–minimum intervals a large number of times, an interquartile range can be generated for programs in English Language and Literature. This was again done 100 times, and the results are given as the Predicted Ranks in the table below, alongside the Random Halves rankings.

Institution | Quality Score (Max) | Quality Score (Min) | Predicted Rank (1st Quartile) | Predicted Rank (3rd Quartile) | Random Halves Rank (1st Quartile) | Random Halves Rank (3rd Quartile) |
---|---|---|---|---|---|---|
University of New Hampshire | 2.74 | 2.56 | 91 | 93 | 70 | 77 |
Boston College | 2.57 | 2.42 | 96 | 98 | 59 | 64 |
Boston University | 3.80 | 3.59 | 20 | 21 | 38 | 42 |
Brandeis University | 3.63 | 3.40 | 19 | 21 | 44 | 55 |
Harvard University | 5.55 | 5.05 | 1 | 1 | 2 | 3 |
U of Massachusetts at Amherst | 3.84 | 3.51 | 30 | 34 | 38 | 43 |
Tufts University | 2.35 | 2.22 | 108 | 110 | 67 | 74 |
Brown University | 4.21 | 3.78 | 15 | 16 | 13 | 15 |
University of Rhode Island | 2.39 | 2.22 | 113 | 115 | 94 | 113 |
University of Connecticut | 3.26 | 3.05 | 53 | 57 | 79 | 87 |
Yale University | 5.07 | 4.52 | 5 | 6 | 2 | 3 |
CUNY—Grad Sch & Univ Center | 3.50 | 3.21 | 42 | 48 | 18 | 19 |
Columbia University | 4.90 | 4.24 | 9 | 10 | 7 | 9 |
Cornell University | 4.71 | 4.16 | 13 | 13 | 6 | 8 |
St John's University | 1.93 | 1.86 | 127 | 127 | 119 | 122 |
Fordham University | 2.38 | 2.23 | 103 | 106 | 104 | 112 |
New York University | 3.59 | 3.25 | 26 | 28 | 18 | 20 |
Drew University | 2.30 | 2.15 | 116 | 119 | 123 | 126 |
University of Rochester | 3.30 | 3.02 | 30 | 33 | 44 | 48 |
State Univ of New York-Binghamton | 3.01 | 2.72 | 62 | 64 | 65 | 69 |
State Univ of New York-Buffalo | 3.65 | 3.16 | 30 | 37 | 25 | 27 |
State U of New York-StonyBrook | 3.17 | 2.77 | 48 | 55 | 46 | 52 |
Syracuse University | 2.53 | 2.38 | 95 | 98 | 71 | 76 |
Indiana Univ of Pennsylvania | 2.19 | 1.93 | 124 | 126 | 122 | 124 |
Princeton University | 4.82 | 4.39 | 5 | 6 | 12 | 14 |
Rutgers State Univ-New Brunswick | 3.96 | 3.62 | 22 | 23 | 16 | 18 |
Carnegie Mellon University | 3.17 | 3.01 | 33 | 35 | 52 | 54 |
