NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee on an Assessment of Research Doctorate Programs; Ostriker JP, Kuh CV, Voytuk JA, editors. A Data-Based Assessment of Research-Doctorate Programs in the United States. Washington (DC): National Academies Press (US); 2011.



A Data-Based Assessment of Research-Doctorate Programs in the United States provides an unparalleled dataset collected from doctoral institutions, doctoral programs, doctoral faculty, and public sources that can be used to assess the quality and effectiveness of doctoral programs based on measures important to faculty, students, administrators, funders, and other stakeholders. The committee collected 20 measures that include characteristics of the faculty, such as their publications, citations, grants, and diversity; characteristics of the students, such as their GRE scores, financial support, publications, and diversity; and characteristics of the program, such as number of Ph.D.’s granted over five years, time to degree, percentage of student completion, and placement of students after graduation. The data were collected for the academic year 2005–2006 from more than 5,000 doctoral programs at 212 universities. These observations span 62 fields, and the research productivity data are typically based on a five-year interval. Some datasets (such as publications and citations) go as far back as 1981. Information on enrollments and faculty size was also collected for 14 emerging fields.

The program-level data, collected using questionnaires, reflect the size, scope, and other components of each program, as well as financial aid and training practices. In addition, data were collected about time to degree and completion rates and whether the program followed the progress of its students after completion. The faculty questionnaire, which was sent to all faculty identified as doctoral faculty by their institutions, collected data on funding, work history, and publications, as well as on demographic characteristics. One section of the questionnaire asked the respondent to rate the relative importance of program, faculty productivity, and demographic characteristics in assessing program quality, and then to rate the relative importance of components within these larger categories. The student questionnaire asked about student educational background and demographic characteristics, as well as research experiences in the program, scholarly productivity, career objectives, and satisfaction with a variety of aspects of the program.

This report also includes illustrations of how the dataset can be used to produce rankings of doctoral programs based on the importance of individual measures to various users. Two of the approaches provided in the report illustrate how to construct data-based ranges of rankings that reflect the values used to assess program quality by the faculty who teach in these programs. Other ranges of rankings can also be produced to reflect the values of other users. Producing rankings from quantitative measures turned out to be more complicated, and to carry greater uncertainty, than originally thought. As a consequence, the illustrative rankings are neither endorsed nor recommended by the National Research Council (NRC) as an authoritative conclusion about the relative quality of doctoral programs. Nevertheless, the undertaking did produce important insights that will serve stakeholders as they use the dataset and the illustrations to draw conclusions for their own purposes. The illustrative approaches illuminate the interplay between program characteristics and the user-value-based weights that go into constructing rankings. The ranges of rankings that are shown convey some, but not all, of the uncertainties that can be estimated in producing rankings based on assigning importance weights to quantitative measures.

The reader who seeks a single, authoritative declaration of the “best programs” in given fields will not find it in this report. The reason for this outcome is that no single such ranking can be produced in an unambiguous and rigorous way. To create illustrative rankings, the committee explored several approaches to evaluate and rate programs, with the subsequent rankings reflecting an ordered list of ratings from high to low. Program ratings depend on two things, namely the characteristics of the program (e.g., number of faculty, number of publications, citations, and other quantifiable measures) and the weighting, or value, that faculty assigned to each characteristic. The committee determined the weights to apply to important characteristics by two different methods based on faculty inputs. One method involved asking direct questions about what characteristics are important and how they should be weighed, while the second used an implicit method to determine the weights based on evaluations of programs by faculty raters. The results of these two approaches are different, and are presented separately in the report.
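The second, implicit method can be sketched in miniature. The following hypothetical illustration (invented programs, measures, and ratings; not the committee's actual model or code) regresses faculty ratings of a few sample programs on two program measures by ordinary least squares and reads the fitted coefficients as implicit weights:

```python
# Hypothetical sketch of the "implicit weights" idea: regress sample
# reputational ratings on program measures and treat the fitted
# coefficients as weights. All data below are invented.

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c]
                              for c in range(r + 1, k))) / A[r][r]
    return coef

# Columns: intercept, publications per faculty member, Ph.D.'s granted (size).
X = [[1, 0.9, 0.2], [1, 1.4, 0.8], [1, 0.5, 0.3], [1, 1.1, 1.0], [1, 0.7, 0.6]]
ratings = [3.1, 4.4, 2.5, 4.2, 3.2]  # hypothetical faculty ratings (1-5 scale)
weights = ols(X, ratings)
print("implicit weights (intercept, pubs, size):",
      [round(w, 2) for w in weights])
```

In the study itself, of course, the regression involves the full set of measures and many raters; the sketch only shows the mechanism by which ratings yield weights without asking faculty for weights directly.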

The committee also developed three other rankings based on separate dimensions of the doctoral programs. All five approaches, which are explained in more detail in the following paragraphs, have strengths and deficiencies. The committee is not endorsing any one approach or any one measure or combination of measures as best. Rather, the user is asked to consider the reason a ranking is needed and what measures would be important to that ranking. The different measures should then be examined by the user and given appropriate weights, and the user should choose an approach that weights most heavily what is important for that user’s purpose. As the committee has stressed repeatedly, the user may take the data that the study provides and construct a set of rankings based on the values that the specific user places on the measures.

The faculty survey on the relative importance of various measures yielded weights that are used to develop one illustrative ranking, the S-ranking (for survey-based), for which we present ranges of rankings for each program. On a separate questionnaire, smaller groups of randomly selected faculty in each field were asked to rate a sample of doctoral programs. The results of regressing these ratings on the measures of program characteristics are used to develop another range of illustrative rankings, the R-rankings (for regression-based). The ranges and weights for these two ways of calculating rankings, one direct (the S-ranking) and one indirect (the R-ranking), are reported separately and provided in an online spreadsheet that includes a guide for the user.

The ranking methodology used by the committee in these illustrative approaches is based on faculty values. This decision was made because the perceived quality of a doctoral program in a field is typically based on the knowledge and views of scholars in that field. Dimensional measures in three areas (research activity, student support and outcomes, and diversity of the academic environment) are also provided to give additional illustrative ranges of rankings for separate aspects of doctoral programs.

An earlier version of the methodology is described in the Methodology Guide.1 The primary change made since the Guide was prepared was the decision to provide separate R and S rankings as illustrations rather than combining them into one overall ranking. This methodology is now described in technical terms in Appendix J. Although the relative importance of measures varies across fields, per capita measures of publications, citations, grants, and awards are strongly preferred by faculty as key measures of program quality. One interesting and important difference between the weights behind the R and S rankings is that the one measure of program size, the average number of Ph.D.’s granted over the previous five years, often receives the largest weight in the R rankings and a relatively small weight in the S rankings. Faculty appear not to assign as much importance to program size when assigning weights directly as it receives in weights assigned indirectly based on their ratings of programs. Program size, while not likely to be a direct cause of higher program quality, may serve as a surrogate for other program features that do exert positive influences on perceived quality.

The illustrative ranges of rankings are instructive for several reasons. Most importantly, they allow comparison of programs in a field in a way that recognizes some, but not all, of the uncertainty and variability inherent in any ranking process. This uncertainty and variability come partly from variability in rater opinions, partly from year-to-year variability in the data, and partly from the error that accompanies the estimation of any statistical model. The ranges provided cover a broad 90 percent interval, another change from the original methodology report. Other sources of uncertainty are not captured in the ranges presented in the illustrative rankings, including uncertainty in the model for assessing quality from quantitative measures and uncertainty about whether the 20 measures capture the most relevant factors needed to assess quality in a particular field.
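How a range of rankings can arise from rater-to-rater variability in weights can be shown with a minimal sketch. Everything here is hypothetical (the programs, the standardized measures, the mean weights, and the perturbation model); it is not the committee's procedure, only an illustration of the mechanism:

```python
# Minimal sketch: when survey-derived weights vary across faculty
# raters, repeated weighted-sum ratings yield a distribution of ranks
# per program, from which a 90 percent range can be read off.
import random

# Standardized values of three hypothetical measures for four programs
# (e.g., publications per faculty member, citations, grants).
programs = {
    "Program A": [1.2, 0.8, 0.5],
    "Program B": [0.9, 1.1, 0.7],
    "Program C": [0.4, 0.6, 1.3],
    "Program D": [0.7, 0.9, 0.9],
}

def rank_once(weights):
    """Rate each program as a weighted sum of its measures, then rank."""
    ratings = {p: sum(w * x for w, x in zip(weights, xs))
               for p, xs in programs.items()}
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {p: i + 1 for i, p in enumerate(ordered)}

random.seed(0)
ranks = {p: [] for p in programs}
for _ in range(500):
    # Perturb the mean weights to mimic rater-to-rater variability.
    weights = [max(0.0, random.gauss(mu, 0.15)) for mu in (0.5, 0.3, 0.2)]
    for p, r in rank_once(weights).items():
        ranks[p].append(r)

# Report a 90 percent range of rankings (5th to 95th percentile).
for p, rs in ranks.items():
    rs.sort()
    lo, hi = rs[len(rs) * 5 // 100], rs[len(rs) * 95 // 100 - 1]
    print(f"{p}: 90 percent range of rankings {lo}-{hi}")
```

Programs with similar measure values end up with wide, overlapping ranges, which is why a single point ranking would overstate the precision of the data.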

The current approach does have the advantage of collecting exactly the same categories of data from all programs being assessed, and uses those data to calculate ratings based on the relative importance of measures as established by doctoral faculty. This approach, however, entails a key vulnerability. In the current methodology, when program measures in a field are similar, program differences in the range of rankings can depend strongly on the precise values of the input data, and so are quite sensitive to errors in those data. We have worked to assure the quality of the measures used to generate rankings and have tried to minimize such errors in the data collection. But errors can arise from clerical mistakes and possible misfit between the measures and the data. They can be caused by misunderstandings by our respondents concerning the nature of the data requested from them, or they may be embedded in the public databases that we have used. Some of the key publication sources in a field or subfield may not be included in the public database that was used.2 Despite our efforts, we are certain that mistakes, misunderstandings, and errors in input data remain, and these will propagate through to any rankings.

We believe, however, that careful error-checking both by the NRC and by the doctoral programs being assessed has produced a collection of data of great usefulness in understanding doctoral education, both as a means for users to assess the quality of doctoral programs and for what detailed analyses of the data themselves can tell us. The data permit comparisons of programs on the basis of several program characteristics, each of which provides an important vantage point on doctoral education. The ranges of illustrative rankings, because of the values expressed in the faculty questionnaires, emphasize measures of faculty productivity. But the data enable comparisons using any of the categories in which data were collected. Doctoral programs can readily be compared not only on measures of research activity in a program but also, for example, on measures of student support, degree completion, time to degree, and diversity of both students and faculty. These data will become even more valuable if they are updated periodically and made current, which the committee strongly encourages.3

The work that has gone into producing this assessment of doctoral programs has raised the level of data collection vital to understanding the broad dimensions of doctoral education in the United States. It would be difficult to overstate the efforts required of universities and doctoral programs to produce, check, and recheck the data collected for this assessment. The extensive reliance on data in this assessment called for the collection of an enormous amount of information that has not been routinely or systematically collected by doctoral programs in the past. Graduate schools, institutional researchers, program administrators, and individual doctoral faculty all contributed countless hours to compiling, clarifying, and substantiating the information on which this assessment is based. As a result, we believe that this focus on data collection by participating universities and their programs has in and of itself created new standards, and improved practices, for quantitatively recording information on which qualitative assessments of doctoral programs can be based.

With the abundance of data included in this assessment comes a great deal of freedom in determining which information is most useful to individual users. We are particularly hopeful that the wealth of data collected for this assessment will encourage potential applicants to doctoral programs to decide what characteristics are important to them and will enable them to compare programs with respect to those characteristics. Potential doctoral applicants, and, indeed, all users of this assessment, are invited to create customized assessment tables that reflect their own preferences.4


Changes between 1993 and 2006

Because the biological science fields have been extensively reorganized since 1993, when the last NRC assessment was carried out, it is difficult to make comparisons in these areas over time. Other programs that were not included in 1993 are included in this assessment, including many programs in the field of agricultural sciences.

For fields in engineering, physical sciences, humanities, and social sciences, where comparisons between the previous study and this one are possible, we find that:

  • Since the last NRC study was published in 1995 (based on data collected in 1993), the numbers of students enrolled in the programs that participated in both studies have increased in some broad fields (in engineering by 4 percentage points, and in the physical sciences by 9 percentage points) and declined in others (down 5 percentage points in the social sciences and down 12 percentage points in the humanities).5
  • The number of Ph.D.’s produced per program across these common programs has grown by 11 percent.
  • All the common programs have experienced a growth in the percentage of female students with the smallest growth (3.4 percentage points) in the humanities fields, which were already heavily female, and the greatest growth in the engineering fields (9 percentage points, increasing to 22 percent overall).
  • For all doctoral programs in fields covered by the study, there has been an increase in the percentage of Ph.D.’s from underrepresented minority groups6 (a growth of 2.3 percentage points to 9.6 percent in the agricultural sciences, 3.7 percentage points to 9.8 percent in the biological sciences, 1.7 percentage points to 6.4 percent in the physical sciences, 5.2 percentage points to 10.1 percent in engineering, 5.0 percentage points to 14.4 percent in the social sciences and 3.5 percentage points to 10.9 percent in the humanities).7
  • Because of differences between the definition of faculty in 1993 and 2006, we cannot strictly compare faculty sizes, but it appears that the number of faculty involved in doctoral education has also grown in most programs.

Users are warned that, because of fundamental changes in the methodology, comparisons between 1993 rankings and ranges of rankings from the current study may be misleading. They are encouraged to understand the derivation of the current ranges of rankings and to examine the weights and variable values that led to them.

Program Characteristics


We found that doctoral education in the United States is dominated by programs in public universities in terms of numbers of doctorates produced. Seventy-two percent of the doctoral programs in the study are in public universities. Of the 37 universities that produced the most Ph.D.’s from 2002 to 2006 (making up 50 percent of the total Ph.D.’s granted during this time), only 12 were private universities. The health of research and doctoral education in the United States depends strongly on the health of public education.


As was found in the 1982 and 1995 reports, program size continues to be positively related to program ranking. This result holds despite our reliance in the current study on per capita measures of scholarly productivity. In most broad fields, the programs with the largest number of Ph.D.’s publish more per faculty member, have more citations per publication, and receive more awards per faculty member than the average program.


There is very little difference among fields in the percent of students who receive full support in their first year. For all fields, this percentage is somewhere between 80 percent (social and behavioral sciences) and 92 percent (physical sciences). The larger programs have significantly longer median times to degree in all fields except the biological sciences, and this is particularly true in the humanities (7.4 years as compared to 6.1 years for the broad field as a whole). There is no significant difference based on size in the percentage of students who have definite plans for an academic position upon graduation. There are, however, differences by field, ranging from a high of 46 percent for the humanities, to a low of 15 percent for engineering. In terms of completion, over 50 percent of students complete in six years or less in the agricultural sciences and in engineering, but a smaller percentage does so in the other broad fields. In the social sciences the percentage is 37 percent, which is the same percentage completion for the humanities after eight years. In the physical sciences, the six-year completion percentage is 44 percent.


The faculty of doctoral programs is not diverse with respect to underrepresented minorities —5 percent or less in all broad fields except the social sciences (7 percent) and the humanities (11 percent). Student diversity is greater—10 percent or above in programs in all broad fields except the physical sciences (8 percent). The faculty is more diverse in terms of gender, with women making up over 30 percent of the doctoral faculty in the biological sciences (32 percent), social sciences (32 percent), and humanities (39 percent). Engineering (11 percent) and the physical sciences (14 percent) lag with the agricultural sciences falling in between (24 percent). Women make up nearly 50 percent or more of students in the agricultural, biological and social sciences and the humanities. Again, the physical sciences (14 percent) and engineering (11 percent) lag despite a decade of growth in the production of female Ph.D.’s. International students are well over 40 percent of students in the agricultural sciences (42 percent), the physical sciences (44 percent), and engineering (58 percent), and less in the other broad fields.

Faculty Characteristics

Over 87,000 faculty involved in doctoral education answered our faculty questionnaire, and the committee focused its attention on obtaining the weights that reflected what faculty thought mattered to program quality, which were then used in the rankings. We found that the majority of faculty are middle-aged (between the ages of 40 and 60), and over 70 percent have been at their current university for 8 years or more. The effect of pervasive postdoctoral study is apparent in the biological and agricultural sciences, where only 6 percent of the faculty in doctoral programs are under the age of 40, as compared to more than double that percentage in the social and physical sciences and engineering. In the humanities 9 percent of the faculty are under the age of 40, but the humanities also have the highest percentage (27 percent) over the age of 60.

Student Characteristics

Questionnaires were sent to advanced8 doctoral students in programs in five fields—chemical engineering, physics, neuroscience, economics, and English. Sixty-four percent of these programs had more than 10 students responding, which was the cutoff used by the committee for reporting the results for individual programs. These reportable programs made up 85 percent to 90 percent of the programs surveyed. In total, complete questionnaires were received from 70 percent of the students who had been asked to respond. This database should prove to be of great interest to researchers on doctoral education.9

Generally speaking, a majority of students were “very satisfied” or “somewhat satisfied” with the quality of their program in all fields. English stood out as a field where fewer than 40 percent of the students reported that their research facilities and their workspaces were “excellent” or “good,” which may reflect a difference among fields over what constitutes quality research facilities and work spaces. Only 40 percent or less of students in all the fields were satisfied with the program-sponsored social interaction. Over 60 percent in most fields, however, felt they benefited from the program’s intellectual environment. Programs do well in supporting students to attend professional and scholarly meetings and, in the science and engineering fields, over 35 percent have published articles in refereed journals while still enrolled in their doctoral program.

Students were also asked about their career objectives, both as they recalled them from when they entered the program and as they stood when they answered the questionnaire. There was a decline in the percentage who said they had “research and development” as a career objective in all fields and a decline in those interested in teaching in all fields but neuroscience. The percentage of students who had management and administration as a career objective grew, but was still below 10 percent in all fields. Research and development was still the predominant career goal, except in English, where teaching (52 percent) dominated.

In summary, doctoral education in the United States is a vast undertaking comprising many programs in many fields with, overall, very high standards and intellectual reputation. For a long time, North American institutions of higher education have been the world’s standard for the research doctorate. As universities across the globe compete with increasing intensity for the faculty and students who will advance the knowledge economy of the future, it is important that we take stock of the enormous value represented by the United States research doctorate programs. Taken together, these programs will produce the future thinkers and researchers for all kinds of employment as well as the faculty who nurture the next generation of scholars and the researchers. All are essential to scientific discovery, technological innovation, and cultural understanding in the United States and across the globe.

This study cannot, of course, provide a comprehensive understanding of these research doctorate programs. The data collected for this study represent an unprecedented wealth of information, and the committee hopes that they will be updated and used for further analysis.10 These data have been used to produce illustrative ranges of rankings of research doctorate programs aimed at reflecting the values of the faculty who teach in these programs. The intent is to illustrate how individuals can use the data and apply their own values to the quantitative measures to obtain rankings suitable for their specific purposes. But the data themselves, even more so than the weighted summary measures and the illustrative ranges of rankings, can lead to analyses that throw revealing light on the state of doctoral education in the United States, can help university faculty and administrators to improve their programs, and can help students to find the most appropriate graduate programs to meet their needs.



National Research Council. A Revised Guide to the Methodology of the Data-Based Assessment of Research-Doctorate Programs in the United States. Washington, D.C.: National Academies Press, 2010; incorporated as Appendix J in this volume.


For example, for the field of computer science, refereed conference papers are an important form of scholarship. For the humanities fields, books are important. Publications for all these fields were compiled directly from faculty résumés.


Recommendation 4 of the 2003 Methodology Study was “Data for quantitative measures should be collected regularly and made accessible in a Web-readable format. These measures should be reported whenever significant updates are available” (pp. 3 and 63); the study also says “More frequent updating of these data would provide more timely and objective assessments” (p. 64).


The NRC is supplying the data from this study to www​, which will allow users to construct their own rankings with their own weights.


The current study contains six broad fields: agricultural sciences, biological and health sciences, physical sciences, engineering, social and behavioral science, and humanities. This aggregation of fields is a convenient way to summarize data for the 62 individual fields.


These are based on data reported by the National Science Foundation, because the 1995 NRC Study did not collect data on minority Ph.D.’s.


By underrepresented minorities we mean: African-Americans, Hispanics, and American Indians.


By advanced, we mean that they have been admitted to candidacy.


Because of confidentiality concerns, these and the faculty data will not be publicly available, but will be made available to researchers who sign a confidentiality agreement with the NRC.


Some of these analyses will be reported as part of this study at a workshop to be held after the data are released.

Copyright © 2011, National Academy of Sciences.
Bookshelf ID: NBK83389

