NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee to Examine the Methodology for the Assessment of Research-Doctorate Programs; Ostriker JP, Kuh CV, editors. Assessing Research-Doctorate Programs: A Methodology Study. Washington (DC): National Academies Press (US); 2003.


2. How the Study Was Conducted


In many ways, the completion of the 1995 Study led directly to the study of the methodology for the next one. In the period between October of 1995, when the 1995 assessment was released, and 1999, when a planning meeting for the current study was held, Change magazine published an issue containing two articles on the NRC rankings—one by Webster and Skinner (1996) and another by Ehrenberg and Hurst (1996). In 1997, Hugh Graham and Nancy Diamond argued in their book, The Rise of American Research Universities, that standard methods of assessing institutional performance, including the NRC assessments, obscured the dynamics of institutional improvement because of the importance of size in determining reputation. In the June 1999 Chronicle of Higher Education,1 Graham and Diamond expanded this criticism, questioning the ability of raters to perform their task in a scholarly world that is increasingly specialized and often interdisciplinary. They recommended that in its next study the NRC list program ratings alphabetically and give key quantitative indicators equal prominence alongside the reputational indicators.

The taxonomy of the study was also immediately controversial. The study itself mentioned the difficulty of defining fields for the biological sciences and the problems that some institutions had with the final taxonomy. The 1995 taxonomy left out research programs in schools of agriculture altogether. The coverage of programs in the basic biomedical sciences that were housed in medical schools was also spotty. A planning meeting to consider a separate study for the agricultural sciences was held in 1996, but when funding could not be found, it was decided to wait until the next large assessment to include these fields.

Analytical studies were also conducted by a number of scholars to examine the relationship between quantitative and qualitative reputational measures.2 These studies found a strong statistical correlation between the reputational measures of scholarly quality of faculty and many of the quantitative measures for all the selected programs.

The Planning Meeting for the next study was held in June of 1999. Its agenda and participants are shown in Appendix C. As part of the background for that meeting, all the institutions that participated in the 1995 Study were invited to comment and suggest ways to improve the NRC assessment. There was general agreement among meeting participants and institutional commentators that a statement of purpose was needed for the next study that would identify both the intended users and the uses of the study. Other suggested changes were to:

  • Address the challenge of identifying interdisciplinary and emerging fields and revisit the taxonomy for the biological sciences,
  • Make an effort to measure educational process and outcomes directly,
  • Recognize that the mission of many programs went beyond training Ph.D.s to take up academic positions,
  • Provide quantitative measures that recognize differences by field in measures of merit,
  • Analyze how program size influences reputation,
  • Emphasize a rating scheme rather than numerical rankings, and
  • Validate the collected data.

In the summer following the Planning Meeting, the presidents of the Conference Board of Associated Research Councils and the presidents of three organizations representing graduate schools and research universities3 met to discuss whether another assessment of research-doctorate programs should be conducted. Objections to conducting a study arose from the view that graduate education is a highly complex enterprise and that rankings could only oversimplify that complexity; there was general agreement, however, that if the study were to be conducted again, a careful examination of the methodology should be undertaken first. The following statement of purpose for an assessment study was drafted:

The purpose of an assessment is to provide common data, collected under common definitions, which permit comparisons among doctoral programs. Such comparisons assist funders and university administrators in program evaluation and are useful to students in graduate program selection. They also provide evidence to external constituencies that graduate programs value excellence and assist in efforts to assess it. More fundamentally, the study provides an opportunity to document how doctoral education has changed and how important it remains to our society and economy.

The next 2 years were spent discussing the value of the methodology study with potential funders and refining its aims through interactions with foundations, university administrators and faculty, and government agencies. A list of those consulted is provided in Appendix B. A teleconference about statistical issues was held in September 2000,4 and it concluded with a recommendation that the next assessment study include careful work on the analytic issues that had not been addressed in the 1995 Study. These issues included:

  • Investigating ways of data presentation that would not overemphasize small differences in average ratings.
  • Gaining better understanding of the correlates of reputation.
  • Exploring the effect of providing additional information to raters.
  • Increasing the amount of quantitative data included in the study so as to make it more useful to researchers.

A useful study had been prepared for the 2000 teleconference by Jane Junn and Rachelle Brooks, who were assisting the Association of American Universities' (AAU) project on Assessing Quality of University Education and Research. The study analyzed a number of quantitative measures related to reputational measures. Junn and Brooks made recommendations for methodological explorations in the next NRC study with suggestions for secondary analysis of data from the 1995 Study, including the following:

  • Faculty should be asked about a smaller number of programs (fewer than 50).
  • Respondents should rate each department twice: first in the area or subfield they consider to be their own specialization and then separately for the department as a whole.
  • The study should consider using an electronic method of administration rather than a paper-and-pencil survey.5

Another useful critique was provided in a position paper for the National Association of State Universities and Land Grant Colleges by Joan Lorden and Lawrence Martin6 that resulted from the summer 1999 meeting of the Council on Research Policy and Graduate Education. This paper recommended that:

  • Rating be emphasized, not reputational ranking,
  • Broad categories be used in ratings,
  • Per capita measures of faculty productivity be given more prominence and the number of measures be expanded, and
  • Educational effectiveness be measured directly by data on the placement of program graduates and a “graduate's own assessment of their educational experiences five years out.”


The Committee to Examine the Methodology for the Assessment of Research-Doctorate Programs of the NRC held its first meeting in April 2002. Chaired by Professor Jeremiah Ostriker, the committee decided to conduct its work through four panels, each comprising both committee members and outside experts who could supplement the committee's expertise.7 The panels and their tasks were the following:

Panel on Taxonomy and Interdisciplinarity

This panel was given the task of examining the taxonomies that have been used in past studies, identifying fields that should be incorporated into the study, and determining ways to describe programs across the spectrum of academic institutions. It attempted to incorporate interdisciplinary programs and emerging fields into the study. Its specific tasks were to:

  • Develop criteria to include/exclude fields.
  • Determine ways to recognize subfields within major fields.
  • Identify faculty associated with a program.
  • Determine issues that are specific to broad fields: agricultural sciences; biological sciences; arts and humanities; social and behavioral sciences; physical sciences, mathematics, and engineering.
  • Identify interdisciplinary fields.
  • Identify emerging fields and determine how much information should be included.
  • Decide on how fields with a small number of degrees and programs could be aggregated.

Panel on the Review of Quantitative Measures

The task of this panel was to identify measures of scholarly productivity, educational environment, and characteristics of students and faculty. In addition, it explored effective methods for data collection. The following issues were also addressed:

  • Identification of scholarly productivity measures using publication and citation data, and the fields for which the measures are appropriate.
  • Identification of measures that relate scholarly productivity to research funding data, and the investigation of sources for these data.
  • Appropriate use of data on fellowships, awards, and honors.
  • Appropriate measures of research infrastructure, such as space, library facilities, and computing facilities.
  • Collection and uses of demographic data on faculty and students.
  • Characteristics of the graduate educational environment, such as graduate student support, completion rates, time to degree, and attrition.
  • Measures of scholarly productivity in the arts and humanities.
  • Other quantitative measures and new data sources.

Panel on Student Processes and Outcomes

This panel investigated possible measures of student outcomes and the environment of graduate education. Questions addressed were:

  • What quantitative data can be collected or are already available on student outcomes?
  • What cohorts should be surveyed for information on student outcomes?
  • What kinds of qualitative data can be collected from students currently in doctoral programs?
  • Can currently used surveys on educational process and environment be adapted to this study?
  • What privacy issues might affect data gathering? Could institutions legally provide information on recent graduates?
  • How should a sample population for a survey be identified?
  • What measures might be developed to characterize participation in postdoctoral research programs?

Panel on Reputational Measures and Data Presentation

This panel focused on:

  • A critique of the method for measuring reputation used in the past study.
  • An examination of alternative ways for measuring scholarly reputation.
  • The type of preliminary data that should be collected from institutions and programs that would be the most helpful for linking with other data sources (e.g., citation data) in the compilation of the quantitative measures.
  • The possible incorporation of industrial, governmental, and international respondents into a reputational assessment measure.

In the process of its investigation the panel was to address issues such as:

  • The halo effect.
  • The advantage of large programs and the more prominent use of per capita measures.
  • The extent of rater knowledge about programs.
  • Alternative ways to obtain reputational measures.
  • Accounting for institutional mission.

All panels met twice. At their first meetings, they addressed their charge and developed tentative recommendations for consideration by the full committee. Following committee discussion, the recommendations were revised. The Panel on Quantitative Measures and the Panel on Student Processes and Outcomes developed questionnaires that were fielded in pilot trials. The Panel on Reputational Measures and Data Presentation developed new statistical techniques for presenting data and made suggestions to conduct matrix sampling on reputational measures, in which different raters would receive different amounts of information about the programs they were rating. The Panel on Taxonomy developed a list of fields and subfields and reviewed input from scholarly societies and from those who responded to several versions of a draft taxonomy that were posted on the Web.

Pilot Testing

Eight institutions volunteered to serve as pilot sites for experimental data collection. Since the purpose of the pilot trials was to test the feasibility of obtaining answers to draft questionnaires, the pilot sites were chosen to be as different as possible with respect to size, control (public or private), regional location, and whether they were specialized in particular areas of study (engineering in the case of RPI, biosciences in the case of UCSF). The sites and their major characteristics are shown in Table 2–1.

TABLE 2–1. Characteristics for Selected Universities.



Coordinators at the pilot sites then worked with their offices of institutional research and their department chairs to review the questionnaires and provide feedback to the NRC staff, who, in turn, revised the questionnaires. The pilot sites then administered them.8

Questionnaires for faculty and students were placed on the Web. Respondents were contacted by e-mail and given individual passwords to access their questionnaires. Institutional and program questionnaires were also available on the Web. Answers to the questionnaires were downloaded immediately into a database. Although there were glitches in the process (e.g., we learned that whenever the e-mail subject line was blank, our messages were discarded as spam), generally speaking it worked well. The pilot trials showed that Web-administered questionnaires can work, but that special follow-up attention9 is critical to ensure adequate response rates (over 70 percent).

Data and observations from the pilot sites were shared with the committee and used to inform its recommendations, which are reported in the following four chapters. Relevant findings from the pilot trials are reported in the appropriate chapters.



Footnotes

2. Two examples of these studies were Ehrenberg and Hurst (1998) and Junn and Brooks (2000).

3. These were: John D'Arms, president, American Council of Learned Societies; Stanley Ikenberry, president, American Council on Education; Craig Calhoun, president, Social Science Research Council; and William Wulf, vice-president, National Research Council. They were joined by: Jules LaPidus, president, Council of Graduate Schools; Nils Hasselmo, president, Association of American Universities; and Peter McGrath, president, National Association of State Universities and Land Grant Colleges.

4. Participants were: Jonathan Cole, Columbia University; Steven Fienberg, Carnegie-Mellon University; Jane Junn, Rutgers University; Donald Rubin, Harvard University; Robert Solow, Massachusetts Institute of Technology; Rachelle Brooks and John Vaughn, Association of American Universities; Harriet Zuckerman, Mellon Foundation; and NRC staff.

5. Op. cit., p. 5.

6. Lorden and Martin (n.d.).

7. Committee and panel membership is shown in Appendix A.

8. Two of the pilot sites, Yale University and University of California-San Francisco, provided feedback on the questionnaires but did not participate in their actual administration.

9. In the proposed study, the names of non-respondents will be sent to the graduate dean, who will assist the NRC in encouraging responses. Time needs to be allowed for such efforts.

Copyright © 2003, National Academy of Sciences.
Bookshelf ID: NBK43470

