
National Research Council (US) Committee to Examine the Methodology for the Assessment of Research-Doctorate Programs; Ostriker JP, Kuh CV, editors. Assessing Research-Doctorate Programs: A Methodology Study. Washington (DC): National Academies Press (US); 2003.


G. Technical and Statistical Techniques

  1. Alternate Ways to Present Rankings: Random Halves and Bootstrap Methods
  2. Correlates of Reputation Analysis

Alternate Ways to Present Rankings: Random Halves and Bootstrap Methods

Reputational surveys, such as those conducted for earlier research-doctorate program assessments, were not designed to provide accurate rankings of the programs. They produced estimates of ratings whose results could vary depending on the selected set of raters. The confidence interval analysis performed in the last two assessments illustrated this point; however, users of the assessments chose to ignore it and focused instead on the specific scores obtained by averaging questionnaire responses.

A far better approach would be to incorporate this variability into the reporting of ratings and to display a range of program ratings rather than a single ranking. Random Halves and Bootstrap are two methods that could be used both to assign measures of accuracy to statistical estimates and to present the data. The two methods resample the original data set in slightly different ways and would produce slightly different results.

Methods

For a particular field, such as English Language and Literature, assume there are M programs and N program raters. Each rater rates only a subset of the M programs; consequently, some programs may be rated more often than others, since the number of ratings a program receives depends on which raters responded to the survey and whether they actually rated that program on their questionnaire. A response matrix R can be constructed with the reputational rating $r_{ij}$ as the entry for rater i rating program j, i=1,…,N and j=1,…,M. Along each row of the matrix there will be blank entries for programs that the rater was not asked to rate or did not rate. The different ratings for a given program are then aggregated into a single "mean" rating $\bar{r}_j$ (which could also involve weighting and trimming, for example, and need not be the simple mean of all ratings for program j).
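To make the setup concrete, here is a minimal sketch in Python of such a response matrix and its per-program means. The numbers and names are illustrative, not data from the study; NaN marks programs a rater did not rate.

```python
import numpy as np

# N = 4 raters (rows), M = 3 programs (columns); np.nan = no rating
R = np.array([
    [1.0, np.nan, 2.0],
    [0.0, 1.0, np.nan],
    [np.nan, 2.0, 1.0],
    [1.0, 1.0, 0.0],
])

# Simple per-program mean over the raters who actually rated it
# (the report notes the aggregate could also be weighted or trimmed).
program_means = np.nanmean(R, axis=0)
print(program_means)
```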

Random Halves Method: The Random Halves method is closely related to what is known in the statistics literature as the "random group method" for assessing variances of estimates in complex sample surveys. This approach, which has many variants, has a literature that goes back to at least 1939 (see Wolter, 1985). It is closely related to another method, the "Jackknife," which was introduced in 1949 and popularized in the 1960s. The essence of the random group and Jackknife methods is to calculate the numerical quantity of interest on a smaller part of the whole data set, and to do this for several such smaller parts of the original data. The differences between the results from these smaller parts are then combined to assess the variability of the quantity computed on the whole data set. The Random Halves method is an example of this in which the smaller parts of the data are random halves of the data.

The Random Halves method is applied as follows: A random sample of N/2 of the rows of R is drawn without replacement, meaning that a row cannot be selected twice. The mean $\bar{r}_j$ for each program is then computed from this random half of the full data, and all the programs are ranked on the basis of these mean ratings. This procedure could be repeated ten, one hundred, or several hundred times to produce a range of ratings and rankings for each program in the field. The rankings for each program could then be summarized by the interquartile range of their distribution: users of reputational ratings would recognize that raters rate programs differently, and that in half of the resampled halves program j's rank fell between a and b, where a is the 25th percentile of its ranking distribution and b is the 75th percentile.
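A sketch of one way this procedure could be implemented, assuming the NaN-coded matrix R from the sketch above; the function name and ranking convention are illustrative, not from the report.

```python
import numpy as np

def random_halves_ranks(R, n_resamples=100, rng=None):
    """Resample N/2 rows without replacement; return rankings per resample."""
    rng = np.random.default_rng(rng)
    N = R.shape[0]
    ranks = []
    for _ in range(n_resamples):
        rows = rng.choice(N, size=N // 2, replace=False)  # a random half
        means = np.nanmean(R[rows], axis=0)  # mean rating per program
        # Rank programs by mean rating; here rank 1 = lowest mean, matching
        # the worked example below, where a lower rating means higher quality.
        # (A real implementation would guard against a program with no
        # ratings in the sampled half.)
        order = means.argsort()
        rank = np.empty_like(order)
        rank[order] = np.arange(1, len(means) + 1)
        ranks.append(rank)
    return np.array(ranks)

# Interquartile range of each program's ranking distribution:
# np.percentile(random_halves_ranks(R, 100, rng=0), [25, 75], axis=0)
```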

Bootstrap Method: The Bootstrap method was developed more recently than the random group method; its literature dates back only to 1979. It is well described in Efron (1982) and Efron and Tibshirani (1993). Although the Bootstrap method was not created specifically for assessing variances in complex sample surveys, it has been used for that purpose. It was created as a general method for assessing the variability of the results of any type of data analysis, complex or simple, and has become a standard tool. Instead of sampling N/2 rows of R without replacement, N rows are sampled from R with replacement, meaning that a row can be selected several times. The same procedure as in the Random Halves method could then be used for computing the means.
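The Bootstrap version differs only in the resampling line; a sketch under the same assumptions as above:

```python
import numpy as np

def bootstrap_ranks(R, n_resamples=100, rng=None):
    """Same aggregation as Random Halves, but draw N rows with replacement."""
    rng = np.random.default_rng(rng)
    N = R.shape[0]
    ranks = []
    for _ in range(n_resamples):
        rows = rng.choice(N, size=N, replace=True)  # rows may repeat
        means = np.nanmean(R[rows], axis=0)
        order = means.argsort()
        rank = np.empty_like(order)
        rank[order] = np.arange(1, len(means) + 1)
        ranks.append(rank)
    return np.array(ranks)
```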

The two methods provide very similar results. The perceived advantage of the Random Halves method lies in the process: it mimics selecting a rater pool and sending questionnaires to half the raters, repeated again and again from the original pool. This is not significantly different from what was done in the past, when the selection of raters and the use of a confidence interval showed that a certain percentage of the ratings would fall in a similar interval even if a different set of raters were selected. The advantage of the Bootstrap method, on the other hand, is that it is an established method with a developed theory for the statistical accuracy of survey measurements.

A Comparison of the Random Halves and Bootstrap Methods

The differences between the methods can be demonstrated by the following simple example, in which three raters rate two programs. The raters are labeled 1, 2, and 3, and the two programs are labeled A and B. The Rating Matrix is:

Table 1. The Rating Matrix.

Rater     A    B
1         0    1
2         2    1
3         1    0
Average   1    2/3

In this example, all three raters rate the same two programs on a scale of 0 to 2. In turning ratings into rankings, assume that lower ratings correspond to assessments of higher quality. Thus, rater 1 rated A higher than B by giving A a rating of 0 and B a rating of 1. The last row of the Rating Matrix gives the average rating for each program. For these ratings, B is ranked higher than A because its average rating is slightly lower than that of A. In the discussion of the example, the rank of A will be denoted by Rank(A). Therefore, Rank(A)=2, while Rank(B)=1.

This example may appear unrealistic in at least two ways. First, it is very small, which means that it is only possible to examine the probability that A is ranked 1st or 2nd. Second, programs are not sampled for raters to rate; instead, the raters rate all of the programs in the example. However, neither of these simplifications matters much for what the example is meant to demonstrate. On the other hand, the example does show some differences among the ratings of the three raters. Rater 1 ranks A and B differently from the way Raters 2 and 3 do. Also, the second rater's rating numbers are higher than those of the other two.

In applying Random Halves (RH) to this example there are two variations, since the number of raters is not an even number. Denote by RH(1) the version in which the "half-sample" consists of 1 of the 3 raters chosen at random, and by RH(2) the version in which the "half-sample" consists of 2 of the 3 raters chosen at random. These are the only possibilities for the RH method in this example.

In the RH(1) case, there are three possible raters to sample, each chosen with probability 1/3, and the sampled rater's ratings serve as the averages. The table below summarizes the three possible sample results for RH(1).

Table 2. Summary of RH(1).

Sampled rater   Average for A   Average for B   Rank(A)
{1}             0               1               1
{2}             2               1               2
{3}             1               0               2

Because Rank(A)=2 in two of the three possible half-samples, the probability that Rank(A)=2 is 2/3. This should be compared with the finding that in the full data (i.e., the Rating Matrix in Table 1) the rank of A is 2; the RH(1) method indicates that it could have been different from 2 about 1/3 of the time.

In the RH(2) case, two raters are sampled, and there are three possible samples: {1,2}, {1,3}, and {2,3}. The table below summarizes what occurs for the three possible half-samples of RH(2). Note that in the cases where the average ratings are the same, random tie-splitting is used and the rank is denoted by 1.5.

Table 3. Summary of RH(2).

Sampled raters   Average for A   Average for B   Rank(A)
{1,2}            1               1               1.5
{1,3}            1/2             1/2             1.5
{2,3}            3/2             1/2             2

In the case of RH(2), there are three ways to get Rank(A)=2. The first is the sample {2,3}. The other two are either of the remaining samples with the tie split so that Rank(A)=2. Hence, the probability is 1/3+(1/3)(1/2)+(1/3)(1/2)=2/3, where the factor 1/2 represents the tie-splitting. Note that 2/3 is also the probability that Rank(A)=2 under RH(1).

In summary, the RH method calls for repeatedly taking "half-samples" of the rating matrix, averaging the resulting ratings for A and B, and then ranking A and B based on these average ratings. In resampling over and over, a distribution is built up of how often A is ranked 1 or 2. For example, under either RH(1) or RH(2), A would be ranked 2 about 2/3 of the time. Therefore, while the two versions of the RH method use different samples, with random tie-splitting they give the same probability that A is ranked 2; this can be checked by exhaustive enumeration, as in the sketch below.
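The following is a minimal sketch of that enumeration in Python (not code from the report). The `ratings` dictionary encodes Table 1, and a tie contributes probability 1/2, matching the random tie-splitting described above.

```python
from itertools import combinations

ratings = {1: (0, 1), 2: (2, 1), 3: (1, 0)}  # rater -> (rating of A, rating of B)

def prob_rank_A_is_2(half_size):
    total = 0.0
    samples = list(combinations(ratings, half_size))
    for s in samples:
        mean_A = sum(ratings[i][0] for i in s) / half_size
        mean_B = sum(ratings[i][1] for i in s) / half_size
        if mean_A > mean_B:       # A has the worse (higher) average rating
            total += 1
        elif mean_A == mean_B:    # tie: Rank(A)=2 with probability 1/2
            total += 0.5
    return total / len(samples)

print(prob_rank_A_is_2(1), prob_rank_A_is_2(2))  # both print 2/3 ~ 0.667
```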

In applying the Bootstrap (Boot) method to the example, three raters are sampled, and the same rater can be selected more than once. The three raters are regarded as representative of all the possible raters who could have been sampled to rate the programs. Such an assumption clearly varies in plausibility with various factors, such as how many raters are being considered and how they were originally chosen, but it is a useful assumption that appears throughout many applications of statistics.

In sampling three rows from the original Rating Matrix there are 27 possible combinations, so the probability of any given sample is 1/27. They are listed in the following table.

Table 4. Bootstrap samples, their average ratings for A and B, and the Rank of A.

Rank(A)=2 occurs in 20 of the 27 samples in the table above, yielding a probability of 20/27=.74. This differs from the RH result (.67), but it is still plausible: while A was ranked second by the actual sample of 3 raters, there is some probability that it would have been ranked 1st by a different set of raters. The Boot method gives a somewhat smaller estimate of the probability that A could have been ranked 1st, .26 rather than .33, but both values are less than 1/2, and both are plausible in such a small example.
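The same check for the Boot method, enumerating all 27 equally likely ordered samples of 3 raters drawn with replacement (a sketch, with tie-splitting as before):

```python
from itertools import product

ratings = {1: (0, 1), 2: (2, 1), 3: (1, 0)}

total = 0.0
for s in product(ratings, repeat=3):  # 3**3 = 27 ordered samples
    mean_A = sum(ratings[i][0] for i in s) / 3
    mean_B = sum(ratings[i][1] for i in s) / 3
    if mean_A > mean_B:
        total += 1
    elif mean_A == mean_B:
        total += 0.5
print(total / 27)  # 20/27 ~ 0.74
```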

There is no very convincing, intuitive way to favor one of these two probability estimates, .67 or .74, so the example offers little help in making an intuitive choice between the two approaches. What it does show is that the RH and Boot methods do not give the same results for probabilities of this type. Thus, any claim that the two methods are "equivalent" is wrong, though they are clearly "similar."

Statisticians who specialize in variance estimation prefer the Bootstrap to ad hoc methods because it is grounded in theory: the Bootstrap estimate is the nonparametric maximum likelihood estimate of the probability that Rank(A)=2, a property the Random Halves method does not enjoy. However, variance estimation is an important subject in statistics, and many methods, in particular the Jackknife, can be tailored to situations where they provide serious competition to the Bootstrap. The next section illustrates that when the numbers of raters and programs are both large, there is little difference between the Random Halves and Bootstrap methods.

Analysis of the Expected Variance for the Two Methods

A natural question to ask is: What probability distributions of average program ratings do the Random Halves and Boot methods produce? Drawing on some results from probability theory, it can be shown that the two methods give similar results.

Any method of resampling creates random variables with distributions that depend on the resampling method. In the rating example, let the average ratings for A and B in a single resample be denoted by the random variables $R_A$ and $R_B$, respectively. These random variables have means and variances with well-known values. The average ratings of A and B in the Rating Matrix are given in its last row and are denoted in general by $r_A$ and $r_B$; in the example, $r_A=1$ and $r_B=2/3$. In addition to the average ratings, the variance of the ratings in each column is defined as the average of the squares of the ratings in that column minus the square of the column's mean rating. Thus, for program A, the variance is

$$V_A=(0^2+2^2+1^2)/3-1^2=5/3-1=2/3,$$

and, for program B, it is

$$V_B=(1^2+1^2+0^2)/3-(2/3)^2=2/3-4/9=6/9-4/9=2/9.$$

Table 5 gives the results for N raters rating Program A, with n raters used in the RH(n) method; if N is even, then n=N/2. In the table, $E(R_A)$ denotes the "expected value" or "long-run average value" of $R_A$, the average rating for A. It is the same value, $r_A$, for both the Boot and RH methods; $r_A$ is the average rating for A in the original Rating Matrix and, in general, the average rating given to program A by the raters rating it. Thus, both the RH and Boot methods are unbiased for $r_A$, and any sensible resampling method will share this property.

Table 5. The mean and variance of the average rating for A in a single resample.

Method   $E(R_A)$   $\mathrm{Var}(R_A)$
RH(n)    $r_A$      $(V_A/n)\,(N-n)/(N-1)$
Boot     $r_A$      $V_A/N$

Where the two methods can differ is in the variance, $\mathrm{Var}(R_A)$, a measure of how much $R_A$ deviates on average from its mean value, $r_A$, from one random resample to another. Both formulas for $\mathrm{Var}(R_A)$ involve $V_A$, the variance of the ratings in the column of the Rating Matrix for program A. Note that when N is even and n=N/2, the N−n in the numerator for RH(n) equals n and cancels the n in the denominator, leaving only N−1 in the denominator. This is to be compared with the N in the denominator for the Bootstrap. When N, the number of raters, is large, N and N−1 are close, and the variances of the average rating $R_A$ for the two methods are nearly the same.
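Written out, the comparison is:

$$\mathrm{Var}_{RH}(R_A)=\frac{V_A}{n}\cdot\frac{N-n}{N-1}\;\Big|_{\,n=N/2}=\frac{V_A}{N-1},
\qquad
\mathrm{Var}_{Boot}(R_A)=\frac{V_A}{N}.$$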

The factor on the right side of the formula for the RH(n) variance is known as the finite sampling correction; it gets smaller as n increases relative to N. In the simple example, these formulas yield the following.

RH(1): In this case, $R_A$ takes on these three possible values with the corresponding probabilities.

Possible average ratings   0     1     2
Probabilities              1/3   1/3   1/3

The mean of this distribution is $0(1/3)+1(1/3)+2(1/3)=1=r_A$.

Its variance is $0^2(1/3)+1^2(1/3)+2^2(1/3)-1^2=2/3$.

Applying the formula for the variance for RH(1) from Table 5 gives $((2/3)/1)\,(3-1)/(3-1)=2/3$, the same value.

RH(2): In this case, $R_A$ takes on these three possible values with the corresponding probabilities.

Possible average ratings   1/2   1     3/2
Probabilities              1/3   1/3   1/3

The mean of this distribution is $(1/2)(1/3)+1(1/3)+(3/2)(1/3)=1=r_A$, as before.

Its variance is $(1/2)^2(1/3)+1^2(1/3)+(3/2)^2(1/3)-1^2=((1/4)+1+(9/4))/3-1=(14/4)/3-1=14/12-12/12=2/12=1/6$.

Applying the formula for the variance for RH(2) from Table 5 gives $((2/3)/2)\,(3-2)/(3-1)=(1/3)(1/2)=1/6$, the same value.

Boot: In this case, $R_A$ takes on seven possible values with the corresponding probabilities.

Possible average ratings   0      1/3    2/3    1      4/3    5/3    2
Probabilities              1/27   3/27   6/27   7/27   6/27   3/27   1/27

These probabilities are found by summing over the Bootstrap samples in Table 4 that yield each possible value. This is a larger set of possible average ratings for A than either of the RH methods gives, owing to the richer set of samples available under the Boot method.

The mean of this distribution is $0(1/27)+(1/3)(3/27)+(2/3)(6/27)+1(7/27)+(4/3)(6/27)+(5/3)(3/27)+2(1/27)=1=r_A$, as it is for the other two methods.

The variance is $0^2(1/27)+(1/3)^2(3/27)+(2/3)^2(6/27)+1^2(7/27)+(4/3)^2(6/27)+(5/3)^2(3/27)+2^2(1/27)-1^2=(3+24+63+96+75+36)/(9\times 27)-1=297/243-1=11/9-9/9=2/9$.

Applying the formula for the variance for Boot from Table 5 gives $(2/3)/3=2/9$, the same value.

Summary of Results

The mean and variance calculations applied to this simple example illustrate the following:

  1. The RH and Boot methods are only similar when N, the number of raters rating a program, is large enough to make the difference between N and N−1 negligible.
  2. The set of possible samples from which resampling takes place differs for the two methods; the one for the Boot method is much larger in general.
  3. Both methods are unbiased for the mean rating of a program, but they differ in their variances. When N is even, the variance of Boot is smaller; when N is odd, the variance of Boot lies between those for RH(n) and RH(n+1), where n<N/2<n+1. This can be read off the formulas in Table 5 and is verified numerically in the sketch after this list.
  4. The Boot method usually has a much richer set of possible ratings in its resampling distribution, and fewer ties.
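As a numerical check on these points, the following sketch (illustrative, not code from the report) simulates both resampling schemes on column A of the Rating Matrix and compares the empirical variances with the Table 5 values of 2/3 for RH(1), 1/6 for RH(2), and 2/9 for Boot.

```python
import numpy as np

rng = np.random.default_rng(0)
a_ratings = np.array([0.0, 2.0, 1.0])  # column A of the Rating Matrix
N = len(a_ratings)

def resample_var(n, replace, trials=100_000):
    """Variance of the average rating for A over repeated resamples."""
    means = [a_ratings[rng.choice(N, size=n, replace=replace)].mean()
             for _ in range(trials)]
    return np.var(means)

print(resample_var(1, False))  # ~0.667 = 2/3  (RH(1))
print(resample_var(2, False))  # ~0.167 = 1/6  (RH(2))
print(resample_var(3, True))   # ~0.222 = 2/9  (Boot)
```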

References

  1. Wolter KM. Introduction to Variance Estimation. New York: Springer-Verlag; 1985.
  2. Efron B. The Jackknife, the Bootstrap and other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics; 1982.
  3. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall; 1993.

Correlates of Reputation Analysis

The reputational quality of a program is a purely subjective measure; however, it is related to quantitative measures in the sense that quality judgments can be made on the basis of information about programs, such as the scholarly work of the faculty and the honors awarded to the faculty for that scholarship. It therefore may be possible to relate, or predict, quality rankings for programs using quantitative measures. Clearly, such predicted quality rankings would also be subjective, and the accuracy of the predictions may change over time.

One way to construct such a relationship is a least-squares multiple linear regression. The dependent variable in the regression analysis is the set of average ratings $r_1, r_2, \ldots, r_N$ for the N programs in a particular field. The predictors, or independent variables, are a set of quantitative or coded program characteristics represented by a vector $x_n$ for program n. The analysis constructs a function f(x) that provides a predicted average rating $f(x_n)$ for program n. The relation between $r_n$ and $f(x_n)$ is

$$r_n=f(x_n)+e_n=a_1x_{1,n}+a_2x_{2,n}+\cdots+a_mx_{m,n}+a_{m+1}+e_n \tag{1}$$

where $x_{1,n}, x_{2,n},\ldots, x_{m,n}$ are the m quantitative or coded characteristics of program n in the field, and $e_n$ is the residual, the amount by which the actual average rating differs from the predicted average rating for that program. If the prediction is "good," the residuals are relatively small. The coefficients $a_j$ are determined by minimizing the sum of the squares of the differences $r_n-f(x_n)$.
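For concreteness, a minimal sketch of fitting equation (1) by least squares with NumPy; the data here are synthetic placeholders, not the study's, and the constant term $a_{m+1}$ enters as a column of ones.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 50, 3                  # 50 programs, 3 program characteristics
X = rng.normal(size=(N, m))   # x_{1,n}, ..., x_{m,n} for each program n
true_a = np.array([0.5, -0.2, 0.8])
ratings = X @ true_a + 3.0 + rng.normal(scale=0.3, size=N)  # r_n with noise

design = np.column_stack([X, np.ones(N)])  # append the constant term
coef, *_ = np.linalg.lstsq(design, ratings, rcond=None)
residuals = ratings - design @ coef        # e_n = r_n - f(x_n)
print(coef, (residuals**2).sum())
```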

While a single regression equation is generated using the quantitative data and the reputational scores, the particular raters selected for the survey introduce a certain amount of variability. This variability can be represented in the following manner: associated with each coefficient $a_i$ is a 95%-confidence interval $[L_i, U_i]$, and by randomly selecting values for the coefficients within their confidence intervals, a predicted average rating $\hat r_n$ can be generated for program n. A measure of how close the set of $\hat r_n$ ratings is to the $r_n$ ratings is given by

$$\|\hat r-r\|^2<p\,s^2F, \tag{2}$$

where $\hat r=(\hat r_1, \hat r_2,\ldots, \hat r_N)$, $r=(r_1, r_2,\ldots, r_N)$, and $\|\cdot\|^2$ denotes the sum of squares of the components of the difference vector. The bound in the inequality, $p\,s^2F$, is a constant derived from the regression analysis:

$p=m$, the number of nonconstant terms in the regression equation; $s^2$ is the "mean square for error" given in the output of a regression program; and F is the 95% cutoff point for the F-distribution with p and N−p degrees of freedom.

By repeating the random selection of coefficients many times, a collection of coefficient sets satisfying inequality (2) can be determined; the upper and lower bounds of this collection define an interval $[L'_i, U'_i]$ for each coefficient. For coefficients in these intervals, a range of predicted ratings can be generated.
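A sketch of this coefficient-resampling procedure, reusing `design`, `ratings`, and `coef` from the regression sketch above. The confidence intervals and F cutoff are computed in a standard way here, whereas in practice they would be taken from the regression output; also, the sketch reads inequality (2) as comparing the trial predictions with the fitted predictions $f(x_n)$, the standard confidence-region condition, which is one possible reading of the report's description.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
N, p = design.shape[0], design.shape[1] - 1     # p nonconstant terms
fitted = design @ coef                          # f(x_n) at the fitted coefficients
s2 = ((ratings - fitted)**2).sum() / (N - p - 1)  # mean square for error
F = stats.f.ppf(0.95, p, N - p)                 # 95% cutoff of the F-distribution
bound = p * s2 * F

# 95% confidence interval for each coefficient (standard lstsq covariance)
cov = s2 * np.linalg.inv(design.T @ design)
half = 1.96 * np.sqrt(np.diag(cov))
lo, hi = coef - half, coef + half

kept = []
for _ in range(3000):
    trial = rng.uniform(lo, hi)  # random coefficients within the box of CIs
    if ((design @ trial - fitted)**2).sum() < bound:  # inequality (2)
        kept.append(trial)
kept = np.array(kept)
print(kept.min(axis=0), kept.max(axis=0))  # the intervals [L'_i, U'_i]
```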

From a practical point of view, a program trying to estimate its quality a few years after a reputational survey was conducted could use the linear regression equation with coefficients in $[L'_i, U'_i]$ to generate a new range of ratings based on current program data; if data for all programs in the field were available, a new interquartile ranking of programs could be obtained.

The following is an example where this method is applied to the 1995 ratings of programs in Mathematics.

Mathematics

Using the STATA statistical package, a forward stepwise least-squares linear regression was applied to a large number of quantitative variables characterizing publications, citations, faculty size and rank, research grant support, number of doctorates by gender and race/ethnicity, graduate students by gender, graduate student support, and time to degree. The following seven variables were identified as the most significant:

(ginipub) Gini Coefficient for Program Publications, 1988–92: an indicator of the concentration of publications among a small number of the program faculty during the period 1988–92
(phds) Total Number of Doctorates, FY 1986–92
(perfull) Percentage of Full Professors Participating in the Program
(persupp) Percentage of Program Faculty with Research Support, 1986–92
(perfpub) Percentage of Program Faculty Publishing in the Period 1988–92
(ratiocit) Ratio of the Total Number of Program Citations in the Period 1988–92 to the Number of Program Faculty
(myd) Median Time Lapse from Entering Graduate School to Receipt of Ph.D., in Years

Results of the regression analysis are shown below. About 83% of the variation is explained by these variables (R²=0.8304).

Source     SS           df    MS           Number of obs = 139
Model      112.360037    7    16.0514329   F(7, 131)     = 91.60
Residual    22.954789  131    .175227397   Prob > F      = 0.0000
Total      135.314819  138    .98054217    R-squared     = 0.8304
                                           Adj R-squared = 0.8213
                                           Root MSE      = .4186

quality     Coef.       Std. Err.    t      P>|t|   [95% Conf. Interval]
phds         .3489197   .0544665    6.41    0.000    .2411721    .4566674
perfull      .008572    .0027864    3.08    0.003    .0030598    .0140842
persupp      .0183162   .0025146    7.28    0.000    .0133418    .0232906
perfpub     −.0150464   .0035235   −4.27    0.000   −.0220167   −.0080762
ratiocit     .0258671   .0077198    3.35    0.001    .0105955    .0411387
myd         −.7737551   .1995707   −3.88    0.000   −1.168553   −.3789567
ginipub     −.0294944   .0044222   −6.67    0.000   −.0382425   −.0207462
_cons       3.070145    .3625634    8.47    0.000   2.352908    3.787382

The resulting predictor equation is:

f(x) = 3.07 + 0.349(phds) + 0.009(perfull) + 0.018(persupp) − 0.015(perfpub) + 0.026(ratiocit) − 0.774(myd) − 0.029(ginipub)

It is noted that the Root Mean Square Error (RMSE) from the regression is 0.4186, and the variation in scores from the 1995 confidence interval calculation has an RMSE of 0.2277.

The following is a scatter plot of the actual 1995 ratings against the predicted ratings.


Plot of the Predicted Faculty Quality Score Against the Actual 1995 Score for Programs in Mathematics

The 95%-confidence intervals for the variables used in the regression can now be used to find a new estimate for the quality score. As described above, values for the coefficients in the regression equation are randomly selected within the intervals and tested to see whether the resulting set of coefficients satisfies the relation $\|\hat r-r\|^2<p\,s^2F$. For the Mathematics data the bound is $p\,s^2F=(7)(.4186)^2(2.12)=2.563556$. In this example 3,000 random selections were made in the coefficient intervals, and 220 coefficient sets satisfied the inequality. The corresponding maximum and minimum values are:

Coefficient   phds      persupp    ginipub     myd        perfpub     ratiocit   perfull     constant
Max           0.35469   0.018583   −0.029026   −0.7526    −0.014673   0.026686   0.0088674   3.10858
Min           0.34314   0.018049   −0.029964   −0.79495   −0.015421   0.025047   0.0082761   3.03164

Using the values in the above table, the maximum and minimum predicted quality scores can be calculated, and the scores for Mathematics programs are displayed in the table below.

As described earlier, these maximum and minimum coefficient values can be used to construct new quality scores by randomly selecting the coefficients in the regression equation between the corresponding maximum and minimum values. If this is done repeatedly, a collection of quality scores is obtained for each program, and the interquartile range of this collection can be generated. This was done 100 times, and the results are given as the Predicted Ranks in the table below, alongside the Bootstrap rankings.
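A sketch of that repeated-drawing step, reusing `design` and `kept` from the earlier sketch; rank 1 is assigned to the highest predicted score, matching the tables that follow, and the draw count of 100 matches the text.

```python
import numpy as np

rng = np.random.default_rng(3)
lo2, hi2 = kept.min(axis=0), kept.max(axis=0)  # the [L'_i, U'_i] intervals

rank_draws = []
for _ in range(100):
    scores = design @ rng.uniform(lo2, hi2)  # one random quality score per program
    order = (-scores).argsort()              # rank 1 = highest predicted score
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)
    rank_draws.append(ranks)

# Interquartile range of each program's predicted ranks
q1, q3 = np.percentile(rank_draws, [25, 75], axis=0)
```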

                                     Quality Score         Predicted Ranks               Bootstrap Ranks
Institution                          Maximum   Minimum    1st Quartile   3rd Quartile   1st Quartile   3rd Quartile
Dartmouth College                    2.73      2.51        73             76             53             62
Boston University                    2.70      2.42        77             80             48             52
Brandeis University                  3.17      2.88        49             51             32             36
Harvard University                   4.41      4.09         8              9              2              4
Massachusetts Inst of Technology     5.27      4.93         2              2              3              4
U of Massachusetts at Amherst        3.40      3.11        38             40             54             60
Northeastern University              2.41      2.13        99            103             70             80
Brown University                     4.60      4.31         5              6             26             29
Brown University-Applied Math        4.59      4.26         6              6             14             17
University of Rhode Island           1.69      1.40       128            129            122            125
University of Connecticut            2.66      2.39        79             83             98            102
Wesleyan University                  2.31      2.09       104            107            101            110
Yale University                      3.38      3.13        38             40              7              8
Adelphi University                   1.07      0.82       138            138            130            133
CUNY—Grad Sch & Univ Center          3.38      3.10        40             41             30             32
Clarkson University                  2.49      2.21        90             94            109            118
Columbia University                  4.32      3.99        11             11             10             12
Cornell University                   4.81      4.46         3              4             14             16
New York University                  4.83      4.50         3              4              7              8
Polytechnic University               2.15      1.88       112            114             98            105
Rensselaer Polytechnic Inst          3.64      3.36        27             30             48             52
University of Rochester              3.10      2.83        52             54             56             62
State Univ of New York-Albany        2.55      2.33        85             88             82             90
State Univ of New York-Binghamton    2.55      2.33        85             87             65             75
State Univ of New York-Buffalo       3.00      2.76        57             59             61             70
State Univ of New York-Stony Brook   3.60      3.31        30             32             19             22
Syracuse University                  2.42      2.18        95            100             76             84
Princeton University                 4.52      4.21         7              7              2              3
Rutgers State Univ-New Brunswick     4.06      3.77        16             18             17             20
Stevens Inst of Technology           1.73      1.48       127            127            121            128
Carnegie Mellon University           3.63      3.33        28             31             34             40

English Language and Literature

Applying the same method to the 1995 programs in English Language and Literature yields a slightly different result, since programs in this field do not have the same productivity characteristics as those in Mathematics. Again, a forward stepwise least-squares linear regression was applied to a large number of quantitative variables, and the following were identified as the most significant:

(nopubs2) Number of Publications During the Period 1985–1992
(perfawd) Percentage of Program Faculty with at Least One Honor or Award for the Period 1986–1992
(acadplan) Total Number of Doctorates, FY 1986–1992, with Academic Employment Plans at the 4-Year College or University Level
(ginicit) Gini Coefficient for Program Citations, 1988–1992: an indicator of the concentration of citations among a small number of the program faculty during the period 1988–1992
(nocits1) Number of Citations During the Period 1981–1992
(fullprof) Percentage of Full Professors Participating in the Program
(empplan) Total Number of Doctorates, FY 1986–1992, with Employment Plans

None of the variables identified in the Mathematics regression are present in this regression analysis.

Results of this regression analysis are shown below. About 81% of the variation is explained by these variables (R²=0.8106).

Source     SS           df    MS           Number of obs = 117
Model       83.985691    7    11.9979559   F(7, 109)     = 66.65
Residual    19.6227839  109   .18002554    Prob > F      = 0.0000
Total      103.608475   116   .893176507   R-squared     = 0.8106
                                           Adj R-squared = 0.7984
                                           Root MSE      = .42429

q93a        Coef.       Std. Err.    t      P>|t|   [95% Conf. Interval]
nopubs2      .1202936   .1017753    1.18    0.240   −.0814218    .322009
perfawd      .0326877   .0041423    7.89    0.000    .0244777    .0408977
acadplan     .7961931   .2416467    3.29    0.001    .3172573    1.275129
ginicit     −.0007486   .0001839   −4.07    0.000   −.001113    −.0003842
nocits1      .0827859   .0234272    3.53    0.001    .036354     .1292178
fullprof     .2942413   .1096454    2.68    0.008    .0769276    .511555
empplan     −.599897    .2698761   −2.22    0.028   −1.134783   −.0650113
_cons       1.955276    .1533968   12.75    0.000   1.651249    2.259304

The resulting predictor equation is:

f(x) = 1.955 + 0.12(nopubs2) + 0.033(perfawd) + 0.796(acadplan) − 0.001(ginicit) + 0.083(nocits1) + 0.294(fullprof) − 0.6(empplan).

The following is a scatter plot of a Random Halves draw from the 1995 ratings against the predicted ratings for that draw.

For programs in English Language and Literature, the Root Mean Square Error (RMSE) from the regression is 0.42429, and the variation in scores from the 1995 confidence interval calculation has an RMSE of 0.2544.


Plot of the Predicted Faculty Quality Score Against the Actual 1995 Score for Programs in English Language and Literature

As for Mathematics, the 95%-confidence intervals for the variables used in the regression can be used to determine a new estimate for the quality score. In this case, the bound is $p\,s^2F=(7)(.42429)^2(2.18)=2.747136$. Again 3,000 random selections were made in the coefficient intervals, and 242 coefficient sets satisfied the inequality. The corresponding maximum and minimum values are:

Coefficient   nopubs2   perfawd    acadplan   ginicit    nocits     fullprof   empplan    constant
Max           0.13384   0.033239   0.82835    −0.00072   0.085903   0.30883    −0.56399   1.97569
Min           0.10684   0.03214    0.76425    −0.00077   0.079689   0.27975    −0.63557   1.935

As was done for the Mathematics programs, the maximum and minimum values for the coefficients can be used to calculate the maximum and minimum predicted quality scores for the programs in English Language and Literature. These scores are displayed in the table below.

Repeating the exercise described for Mathematics, randomly selecting coefficient values in the maximum-minimum intervals a large number of times, an interquartile range can be generated for programs in English Language and Literature. This was again done 100 times, and the results are given as the Predicted Ranks in the table below, alongside the Random Halves rankings.

                                     Quality Score         Predicted Ranks               Random Halves Ranks
Institution                          Maximum   Minimum    1st Quartile   3rd Quartile   1st Quartile   3rd Quartile
University of New Hampshire          2.74      2.56        91             93             70             77
Boston College                       2.57      2.42        96             98             59             64
Boston University                    3.80      3.59        20             21             38             42
Brandeis University                  3.63      3.40        19             21             44             55
Harvard University                   5.55      5.05         1              1              2              3
U of Massachusetts at Amherst        3.84      3.51        30             34             38             43
Tufts University                     2.35      2.22       108            110             67             74
Brown University                     4.21      3.78        15             16             13             15
University of Rhode Island           2.39      2.22       113            115             94            113
University of Connecticut            3.26      3.05        53             57             79             87
Yale University                      5.07      4.52         5              6              2              3
CUNY—Grad Sch & Univ Center          3.50      3.21        42             48             18             19
Columbia University                  4.90      4.24         9             10              7              9
Cornell University                   4.71      4.16        13             13              6              8
St John's University                 1.93      1.86       127            127            119            122
Fordham University                   2.38      2.23       103            106            104            112
New York University                  3.59      3.25        26             28             18             20
Drew University                      2.30      2.15       116            119            123            126
University of Rochester              3.30      3.02        30             33             44             48
State Univ of New York-Binghamton    3.01      2.72        62             64             65             69
State Univ of New York-Buffalo       3.65      3.16        30             37             25             27
State U of New York-StonyBrook       3.17      2.77        48             55             46             52
Syracuse University                  2.53      2.38        95             98             71             76
Indiana Univ of Pennsylvania         2.19      1.93       124            126            122            124
Princeton University                 4.82      4.39         5              6             12             14
Rutgers State Univ-New Brunswick     3.96      3.62        22             23             16             18
Carnegie Mellon University           3.17      3.01        33             35             52             54
Copyright © 2003, National Academy of Sciences.