The Committee to Examine the Methodology to Assess Research-Doctorate Programs was charged with examining the methodology used in the 1995 National Research Council study, Research-Doctorate Programs in the United States: Continuity and Change (referred to throughout this report as the “1995 Study”), and with determining whether significant improvements are feasible. The previous chapters have made specific recommendations on how to conduct an assessment of research-doctorate programs under the assumption that one will be done. The more fundamental question remains: Should another study be carried out at all? This chapter presents the Committee's conclusions on this and other general issues, along with the reasons supporting them.
SHOULD ANOTHER ASSESSMENT OF RESEARCH-DOCTORATE PROGRAMS BE UNDERTAKEN BY THE NATIONAL RESEARCH COUNCIL?
The Committee was asked to examine the methodology of the 1995 Study and to identify both its strengths and its weaknesses. Where weaknesses were found, it was asked to suggest methods to remedy them.
The strengths of the 1995 Study identified by the Committee were:
- Wide acceptance. It was widely accepted, quoted, and utilized as an authoritative source of information on the quality of doctoral programs.
- Comprehensiveness. It covered 41 of the largest fields of doctoral study.
- Transparency. Its methodology was clearly stated.
- Temporal continuity. For most programs, it maintained continuity with the NRC study carried out 10 years earlier.
Finally, it should be noted that the study was a useful tool for doctoral programs to improve themselves and hence to improve doctoral education. A frequent use of the study by administrators is to examine the characteristics of programs that are rated more highly than their own. If the study is carried out again, it would provide the quantitative basis for such analyses.
The weaknesses were:
- Data presentation. The emphasis on exact numerical rankings encouraged users of the study to draw spurious inferences of precision.
- Flawed measurement of educational quality. The reputational measure of program effectiveness in graduate education, derived from a question asked of faculty raters, confounded research reputation and educational quality.
- Emphasis on the reputational measure of scholarly quality. This emphasis gave users the impression that a “soft” criterion, subject to “halo” and “size effects,” was being relied on for the assessment of programs.
- Obsolescence of data. The period of 10 years between studies was viewed as too long.
- Poor dissemination of results. The study data were presented in a form that potential students found hard to use: the volume was not readily accessible to them, and the tables were difficult to interpret.
- Use of an outdated or inappropriate taxonomy of fields. Particularly for the biological sciences, the taxonomy did not reflect the current organization of graduate programs in many institutions.
- Inadequate validation of data. Data were not sent back to providers for a check on accuracy, and some unnecessary errors were propagated.
The weaknesses listed above were addressed in earlier chapters. Beyond these difficulties, however, it must be noted that assessments of research-doctorate programs are costly. The direct costs of staff and committee time are substantial, and far greater, largely invisible, costs are incurred by university faculty and administrative personnel in amassing data for inclusion in the study. The benefits of an NRC study must outweigh these costs if it is to be undertaken.
One other issue to be addressed is the possibility of duplicative studies. Broad rankings of doctoral programs in some fields are conducted periodically by U.S. News & World Report (USN&WR). Unless the NRC study differs in important respects from those rankings, there seems little reason to incur the known costs.
Both USN&WR and the NRC publish reputational rankings, but the resemblance ends there. USN&WR rankings appear with somewhat greater frequency, but they cover a more limited set of fields outside of professional schools. With the exception of engineering, USN&WR publishes only reputational rankings (as of its 2004 edition). For engineering, quantitative data are collected, and USN&WR employs a weighted average of the quantitative data and reputational ratings to arrive at a composite ranking. The problem with this approach is that any ranking based on a weighted average of quantitative indicators is necessarily subjective: it implements someone's prejudices about the relative importance of the various indicators.
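To make the point concrete, the following sketch (in Python, with entirely hypothetical programs and indicator scores) shows how shifting the weights alone, with the underlying data unchanged, can reverse a composite ranking.

```python
# Hypothetical programs scored 0-100 on two illustrative indicators.
programs = {
    "Program A": {"reputation": 90, "citations_per_faculty": 60},
    "Program B": {"reputation": 70, "citations_per_faculty": 95},
}

def composite(scores, weights):
    """Weighted average of indicator scores."""
    return sum(weights[k] * v for k, v in scores.items())

# Two defensible-looking but different weightings.
w1 = {"reputation": 0.7, "citations_per_faculty": 0.3}  # reputation-heavy
w2 = {"reputation": 0.3, "citations_per_faculty": 0.7}  # productivity-heavy

for w in (w1, w2):
    ranked = sorted(programs, key=lambda p: composite(programs[p], w), reverse=True)
    print(w, "->", ranked)
# Under w1 Program A ranks first; under w2 Program B does, although
# nothing about either program has changed -- only the weights.
```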
There are additional technical objections to the USN&WR rankings. For the fields that overlap with the NRC fields, USN&WR response rates were 10–20 percentage points below those obtained for the 1995 NRC Study. More important, USN&WR targets administrators as respondents and asks their views of programs in fields outside their areas of expertise, whereas the NRC makes every effort to obtain ratings from within-field peers, primarily faculty.
The differences between the two studies also reflect a difference in audience. USN&WR is aimed directly at the potential student and purports to contain material that would be helpful to students applying to graduate school. The 1995 NRC Study was primarily directed to faculty, administrators, and scholars of higher education. It was not especially user-friendly. In fact, Brendan Maher, a co-author of the 1995 Study, subsequently wrote a guide for students and others.1
Because NRC studies present their data transparently, cover fields outside of professional schools more extensively, focus on peer ratings, and obtain relatively high response rates, there is clear value added in having the NRC conduct the assessment once again. Two questions remain, however: Do reputational ratings do more harm than good to the enterprise they seek to assess? And does publication by a prestigious organization such as the NRC lend rankings more credence than they deserve?
Ratings would be harmful if they gave a distorted view of the graduate enterprise or if they encouraged behavior inimical to improving its quality. The Committee believes that a number of steps recommended in previous chapters would minimize these risks. Presenting ratings as ranges would diminish the focus of some administrators on hiring decisions designed purely to “move up in the rankings.” Ascertaining whether programs track student outcomes would encourage programs to pay more attention to improving student outcomes. Asking students about the education they have received would encourage programs to focus on graduate education as well as on research. Expanding the set of quantitative measures would permit deeper investigations into components of a program that contribute to a reputation for quality. More frequent updating of these data would provide more timely and objective assessments. A careful analysis of the correlates of reputation would improve public understanding of the factors that contribute to a highly reputed graduate program.
Recommendation 1: The assessment of both the scholarly quality of doctoral programs and the educational practices of these programs is important to higher education, its funders, its students, and to society. The National Research Council should continue to conduct such assessments on a regular basis.
One of the major objections to previous NRC studies is that they are performed only every 10 years. The reason is a practical one: a national survey of graduate faculty is an enormous undertaking, and changes in scholarly quality occur slowly. Little new information would be gained, at high cost, if faculty were questioned frequently about a slowly changing phenomenon. The ability to gather quantitative data electronically at little cost, however, makes more frequent reporting of those data possible. The proposed study should therefore aim to produce periodically, and ideally annually, updatable proxy assessments based on quantitative information. The Committee believes that Web-based data gathering should be a part of the next study and suggests the establishment of an updatable database on graduate programs. Further, once a statistical analysis of the relationship between the quantitative measures and the reputational measure has been conducted for each field, it will be possible to construct a “synthetic reputational measure” under the assumption that the parameters relating the quantitative measures to reputation have held steady over time while the values of the measures themselves have changed. Although this measure is a weighted combination, the weights are not subjective: they are statistically determined, and the combination of measures that best fits the observed reputational ratings is used to construct the indicator for subsequent years. The measures and their estimated parameters are then frozen in time, although the values of the measures may change.
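As an illustration of how such a synthetic measure might be computed, the sketch below (Python, with entirely hypothetical data) determines the weights by ordinary least squares in the base year, freezes the fitted coefficients, and applies them to updated measure values in a later year. The measure names, data, and choice of ordinary least squares are assumptions for illustration, not specifications of the study's actual indicators or estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_programs = 50

# Hypothetical base-year quantitative measures (columns might be, e.g.,
# publications, citations, and grants per faculty member).
X_base = rng.normal(size=(n_programs, 3))
# Hypothetical base-year reputational ratings from the faculty survey.
reputation_base = X_base @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.2, size=n_programs)

# Statistically determine the weights: least squares with an intercept.
A = np.column_stack([np.ones(n_programs), X_base])
coef, *_ = np.linalg.lstsq(A, reputation_base, rcond=None)

# In a later year the parameters stay frozen; only the measures change.
X_later = X_base + rng.normal(scale=0.1, size=X_base.shape)
synthetic_reputation = np.column_stack([np.ones(n_programs), X_later]) @ coef
```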
Recommendation 2: Although scholarly reputation and the composition of program faculty change slowly and can be assessed over a decade, quantitative indicators that are related to quality may change more rapidly and should be updated on a regular and more frequent basis than scholarly reputation. The Committee recommends investigation of the construction of a synthetic measure of reputation for each field, based on statistically derived combinations of quantitative measures. This synthetic measure could be recalculated periodically and, if possible, annually.
As described in Chapter 6, reputational rankings depend on the dispersion of the aggregated ratings of many raters. This dispersion is relatively narrow for the very best programs but increases for other programs simply because information about such programs is not as widely known. A number of factors may contribute to this phenomenon—lack of rater knowledge about the program, the likelihood that smaller programs may specialize in some subfields but not others, and the fact that different raters value different dimensions of program quality when they assign ratings.
Although it may greatly disappoint those programs that would like to boast about their place in the ratings, the Committee believes that presenting ratings in a way that portrays their dispersion (that is, the lack of rater agreement about an exact ranking) would improve the usefulness of the ratings.
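One way to portray such dispersion, sketched below in Python with hypothetical ratings, is to resample raters and report each program's interquartile range of ranks rather than a single rank. The rating scale, rater counts, and bootstrap resampling scheme are illustrative assumptions, not the procedure adopted for the 1995 Study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_raters, n_programs = 200, 10

# ratings[i, j]: hypothetical rater i's 1-5 rating of program j.
ratings = rng.integers(1, 6, size=(n_raters, n_programs)).astype(float)

rank_samples = []
for _ in range(1000):
    # Resample raters with replacement and re-rank the programs.
    sample = ratings[rng.integers(0, n_raters, size=n_raters)]
    order = (-sample.mean(axis=0)).argsort()  # rank 1 = highest mean
    ranks = np.empty(n_programs, dtype=int)
    ranks[order] = np.arange(1, n_programs + 1)
    rank_samples.append(ranks)

lo, hi = np.percentile(np.array(rank_samples), [25, 75], axis=0)
for j in range(n_programs):
    print(f"Program {j + 1}: rank range {int(lo[j])}-{int(hi[j])}")
```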
Recommendation 3: The presentation of reputational ratings should be modified so as to minimize the drawing of a spurious inference of precision in program ranking.
In addition to the quantitative measures collected for the 1995 Study, further measures would improve the ability of study users to analyze the correlates of reputation. These are discussed in detail in Chapter 4; they include data on electronic acquisitions by libraries as well as field-specific measures, such as laboratory space in the sciences and the number of books in the humanities.
Recommendation 4: Data for quantitative measures should be collected regularly and made accessible in a Web-readable format. These data should be reported whenever significantly updated data are available.
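As one illustration of what a Web-readable format might look like in practice, a program's quantitative record could be exposed as a simple structured document, as in the Python sketch below; the field names and values are hypothetical, not a proposed NRC schema.

```python
import json

# A hypothetical program record serialized to JSON, a widely
# machine-readable format suitable for Web dissemination.
record = {
    "institution": "Example University",
    "program": "Chemistry",
    "year": 2003,
    "measures": {
        "publications_per_faculty": 4.2,
        "citations_per_faculty": 31.5,
        "library_electronic_acquisitions": 1250,
    },
}
print(json.dumps(record, indent=2))
```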
The education of doctoral students for a wide range of employment beyond academia has become an object of growing attention in the educational policy community and among students themselves. In addition to collecting data on educational practices and resources, the Committee proposes that the next NRC study collect data from advanced-to-candidacy students in a small number of fields in order to assess their educational experiences, their research productivity, program practices, and institutional and program environments. Further, although the Committee realizes that it would not be feasible to conduct a large study of outcomes, it believes that information on whether programs collect and publish such information would be valuable to potential students.
Recommendation 5: Comparable information on educational processes should be collected directly from advanced-to-candidacy students in selected programs and reported. Whether or not individual programs monitor outcomes for their graduates should be reported.
The Committee constructed a taxonomy of fields for the proposed study that reflects changes that have taken place in the past decade, especially in the biological sciences. Although the Committee was not able to identify many interdisciplinary fields that offer doctoral programs, it recommended a new category that would present data on such fields as they emerge; many such fields may still be housed within more traditional programs. The exact details of the taxonomy remain an open question and should be settled by the committee appointed to conduct the proposed study.
Recommendation 6: The taxonomy of fields should be changed from that used in the 1995 Study to incorporate additional fields with large Ph.D. production. The agricultural sciences should be added to the taxonomy and efforts should be made to include basic biomedical fields in medical schools. A new category, “emerging fields,” should be included.
In the 1995 Study, data were not sent back to the providers for validation, and a number of errors resulted: for multicampus institutions, whole programs were omitted, and a number of faculty lists were inaccurate. The next study should ensure that this does not happen, a task made much more feasible by the availability of modern information technology.
Recommendation 7: All data that are collected should be validated by the providers.
There is an increasing trans-border flow of doctoral students between Canadian and U.S. doctoral programs. Although there are differences between the national systems, there are many similarities as well. The Committee believes that the inclusion of Canadian research-doctorate programs would be useful to programs in both countries.
Recommendation 8: If the recommendation of the Canadian Research-Doctorate Quality Assessment Study, which is currently underway, is to participate in the proposed NRC study, Canadian doctoral programs should be included in the next NRC assessment.
The past decade has seen enormous strides in information technology. It is now feasible, as demonstrated by the pilot trials, to collect data using Web questionnaires. This is a cost-effective technology, saving not only postage but also the time of coders and permitting rapid validation of data. Electronic technology can and should also play an important role in the dissemination of the report. Databases can be made available on-line, as can simple analytic software that would enable users to select peer institutions as well as conduct comparative analyses, while maintaining rater confidentiality. The database for the proposed study should be designed with this sort of dissemination in mind.
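As one illustration of how an on-line database might maintain rater confidentiality while supporting comparative analyses, aggregated ratings could be released only when they rest on enough raters, as in the Python sketch below; the suppression threshold of five is an assumption for illustration, not an NRC rule.

```python
MIN_RATERS = 5  # illustrative suppression threshold, not an NRC rule

def publishable_mean(ratings):
    """Return the mean rating, or None if the cell is too small to release."""
    if len(ratings) < MIN_RATERS:
        return None  # suppress: releasing it could help identify raters
    return sum(ratings) / len(ratings)

print(publishable_mean([4.0, 3.5, 4.5, 4.0, 3.0]))  # 3.8
print(publishable_mean([4.0, 3.5]))                  # None (suppressed)
```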
Recommendation 9: Electronic, Web-based means of dissemination should be used extensively for both the initial report and periodic updates (cf. Recommendations 2 and 4).
THE FORM OF THE PROPOSED STUDY
The 1995 Study was disseminated as a book of 740 pages, only 64 of which comprised the text; the remaining pages contained tables of data and rankings. The bulky study was also made available on the Web. Two years later, a CD was published with these data and supplemental data on the ratings of raters. Electronic technology now makes it possible to publish all the data on the Web immediately, aggregated to preserve rater confidentiality. The same technology makes it possible for data from the next study to be pre-released to designated researchers for analytic studies and for those studies to be published as the print “report” of the study. Furthermore, a Web-based release makes it possible to provide users with analytical tools so that they can compose and rate programs using à la carte quantitative weights of their own choosing. The Committee believes strongly that publication of the data alone, without an exploration of their strengths and limitations, should not happen again. The funding of analytic work should be built into the study and appear as a prominent part of the report.
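A sketch of such an à la carte tool appears below, with hypothetical programs and measures: the user supplies the weights, and the tool standardizes each measure across programs before combining them. Standardization matters because raw measures sit on incommensurable scales; without it, a user's weights would be silently dominated by whichever measure happens to have the largest numbers.

```python
from statistics import mean, stdev

# Hypothetical per-faculty measures for three programs.
programs = {
    "Program A": {"pubs": 3.1, "cites": 40.0, "grants": 0.8},
    "Program B": {"pubs": 4.5, "cites": 22.0, "grants": 1.2},
    "Program C": {"pubs": 2.2, "cites": 55.0, "grants": 0.5},
}

def custom_ranking(programs, weights):
    """Order programs by a user-weighted sum of standardized measures."""
    measures = next(iter(programs.values())).keys()
    z = {}
    for m in measures:
        vals = [p[m] for p in programs.values()]
        mu, sd = mean(vals), stdev(vals)
        z[m] = {name: (p[m] - mu) / sd for name, p in programs.items()}
    score = {name: sum(weights[m] * z[m][name] for m in measures)
             for name in programs}
    return sorted(score, key=score.get, reverse=True)

# The user chooses the weights "à la carte."
print(custom_ranking(programs, {"pubs": 0.5, "cites": 0.3, "grants": 0.2}))
```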
Finally, since the report will contain considerably more information of interest to students, it would be very helpful to include, as an integral part of the report, a section entitled “How to Read This Report,” similar to the guide written by Brendan Maher in 1996.
1. Maher, B. A. 1996. How to Read the 1995 National Research Council Report Research-Doctorate Programs in the United States. Washington, DC: National Academies Press.