
National Academy of Sciences (US), National Academy of Engineering (US), and Institute of Medicine (US) Committee on Science, Engineering, and Public Policy. Implementing the Government Performance and Results Act for Research: A Status Report. Washington (DC): National Academies Press (US); 2001.


Appendix C. Summaries of Agency Focus Group Presentations

The following summaries are based on five focus groups held with the major research-supporting agencies during the fall of 2000 and a workshop hosted by the National Academies on December 18-19, 2000. Each focus group was attended by panel members (three of whom were also members of the Committee on Science, Engineering, and Public Policy, or COSEPUP), by agency representatives who were senior research administrators responsible for GPRA compliance, and by representatives of oversight bodies (Congress, OMB, and GAO) responsible for review of GPRA performance plans and reports from research agencies.

A similar agenda was followed during each focus group. The panel began by explaining its goals, agency representatives described their research programs and their mechanisms for GPRA compliance, panel members and oversight representatives commented on agency methodology, and all participants concluded by offering summary comments. The goal of each discussion was to identify aspects of the methodology that could become “best practices” for use by other agencies and areas where the methodology could be improved.

After each focus group, a summary was produced that used the comments and written materials to answer the following questions:

  • What methodology is used for evaluating research programs under GPRA?
  • What level of unit is the focus of the evaluation?
  • Who does the evaluation of the research program under GPRA?
  • What criteria are used for the evaluation?
  • How do the selection and evaluation of projects relate to the evaluation of the research program?
  • How is the result communicated to different audiences (e.g., the S&T community, advisory committees, agency leadership, the administration, Congress)?
  • How is the result used in internal and external decision-making?

The workshop was structured differently for a much larger group. The first day's discussion was open to the public and attended by nearly 30 participants. The agenda included a general discussion, an overview, general comments from stakeholders and agencies, breakout sessions, a second general discussion focusing on conclusions and recommendations, panel member comments on the draft report, and a summary session. The second day of the workshop was reserved for panel members, who developed conclusions and recommendations for the report.

Appendix C-1. SUMMARY OF THE DEPARTMENT OF DEFENSE FOCUS GROUP

1. What methodology is used for evaluating research programs under GPRA?

1.1. Overview.

The Department of Defense (DOD) response to the Government Performance and Results Act (GPRA) is discussed in Appendix I of its Annual Report to the President and the Congress (2000). This appendix summarizes the DOD strategic plan and the ways in which the department links this plan to performance goals and evaluates the performance goals annually.

Virtually all DOD's science and technology (S&T) activities fall under “Performance Goal 2.2 – Transform US Military Forces for the Future.” This transformation is said to be achieved through the development of “new generations of defense technologies.” The strategy for achieving these new technologies involves three elements: the Basic Research Plan (BRP), the Joint Warfighting Science and Technology Plan (JWSTP), and the Defense Technology Area Plan (DTAP).

1.1.1. Basic research.

Before World War II, the federal government spent most of its research dollars in federal laboratories. There was considerable opposition to the government's involvement in universities. This was muted by the arguments of Vannevar Bush, who established the conceptual framework for contractual and “unfettered” basic research. Today, DOD invests 56% of its basic research dollars in universities; 29% goes to government laboratories and 11% to industry. Bush argued that such investments in basic research are acts of both faith and patience, but the investments are justified many times over by returns of great value.

DOD's BRP is described as “the cutting edge of the Defense Science and Technology Program.” This plan is realized primarily by directly funding research in universities, federal laboratories, and industry and by keeping “a watchful eye on research activities all over the world to prevent technological surprise.” The BRP contains an overview of the entire DOD research program, most of which can be described in 12 disciplinary categories.C1-1 Interdisciplinary research is specifically addressed under three programs. In addition, the plan covers education, training, and instrumentation.

DOD supplies only about 6% of the nation's total federal funding for basic research,C1-2 but this effort is focused in a number of fields that are critical to the nation's scientific performance. Universities receive 22% of their basic research funding for mathematics from DOD, 42% for computer science, 71% for electrical engineering, 63% for mechanical engineering, and substantial amounts in optics, materials, and oceanography.

1.1.2. Applied research and advanced technology development.

The BRP is coupled with two complementary plans that focus on applied research and advanced technology development: the Joint Warfighting Science and Technology Plan (JWSTP) and the Defense Technology Area Plan (DTAP).

The JWSTP takes a joint perspective horizontally across the applied research (6.2) and advanced technology development (6.3) investments to ensure that needed technology and advanced concepts are supported.

The DTAP presents the DOD objectives and the 6.2-6.3 investment strategy for technologies critical to DOD acquisition plans, service warfighter capabilities, and the JWSTP. It also takes a horizontal perspective across the Defense agencies to chart the total DOD investment for given technologies.

1.1.3. DTOs.

DOD uses defense technology objectives (DTOs) to provide focus for the development of technologies that address identified military needs across the department. Each DTO identifies a specific technology advancement that will be developed or demonstrated, with expected date of availability, specific benefits resulting from it, and the amount of funding needed. The DTO process is used to comply with GPRA. The output of this process includes budget and management decisions.

1.1.4. TARA.

The methodology used for evaluating S&T programs is known as technology area reviews and assessments (TARA). TARA is the department's official response to GPRA, and it is a mechanism for evaluating science and technology programs through expert peer review. In this process, however, basic research is not isolated from applied research and advanced technology development. All three categories—6.1 (basic research), 6.2 (applied research), and 6.3 (advanced technology development)—are evaluated as overlapping parts of the technology area being reviewed. For example, biomedical research and chemical-biologic warfare research both have basic-research funding that is particular to them, but they are evaluated in their totality, with clear links to what discoveries are expected.

1.1.5. Reliance.

The department also uses a process called reliance to guide corporate planning and assessment. Reliance members include the deputy under secretary of defense (science and technology) and representatives of all the services and defense agencies. The objective of reliance is to coordinate the S&T process, stimulate communication among the different services and other groups, and clarify priorities. This is the vehicle for planning and overview that brings the services together. Reliance is designed to encourage collaboration and communication and to prevent unnecessary duplication. The group reviews the DTOs themselves, and at the end of the review process all participants sign off on the results of their discussions, so they all have a stake in the outcome.

1.2. What level of unit is the focus of the evaluation?

DOD evaluates its S&T activities by reviewing performance at the level of DTOs. There are approximately 400 DTOs, each of which identifies a specific anticipated technology advance, the date of availability, benefits, technologic barriers, and customer. The DTOs are supported by virtually all the S&T defense agencies and services.

1.2.1. The evaluation process.

The objectives of the DTAP include creation of technologies that enhance the nation's future warfighting capability. Performance under the DTAP can be evaluated by TARA. TARAs are held every two years for a particular technology area. This year, evaluations are being done in the following areas: biomedical; battlespace environments; ground/sea vehicles; materials and processes; space platforms; chemical and biological defense; and sensors, electronics, and electronic warfare. TARA reviews all three levels of S&T investment—6.1, 6.2, and 6.3.

The TARA reviews are conducted over a period of one week. A review team is asked to evaluate progress toward the individual objectives of DTOs and to determine whether that progress merits a grade of green, yellow, or red.C1-3 The team is also asked whether a certain area—say, “Detection”—is addressing most of the technologic issues that need to be addressed. Is the research portfolio appropriate for the objective? If part of the program took a serious funding reduction, was the reduction justified? The TARA teams evaluate the programs for quality, for advances in the state of the art in their research areas, and for their scientific vision. Last year, 96% of the department's DTOs were given the grade of green.

1.2.2. Examples of evaluation by DTOs.

The following two examples from the 2000 Chemical and Biological Defense Science and Technology TARA illustrate how evaluation works when the DTOs are used as the unit of focus. The TARA process gave the “Modeling and Simulation” DTO a yellow grade because of management problems. Because virtually all other DTOs were awarded greens, this was deemed serious enough to trigger a management reorganization. The DTO on “Force Medical Protection” got a red grade because the TARA panel determined that poor technical assumptions and decisions had been made and that researchers were investigating a technology that was not appropriate for the desired objective. As a result, the defense organization performing the work has altered the technical approach to the objectives.

Sometimes, such questions are referred for more basic research before major changes are made. In a final example of DTAP DTOs, “Laser Standoff Chemical Detection Technology” received a yellow grade because reviewers decided that the project might, given current performance, have problems after 3 or 4 years. The basis for this judgment was that the project's objective was written in a way that did not match well with what the researchers were actually doing.

1.2.3. A rationale for “holistic” evaluations.

This process of evaluating performance by DTOs was established before the passage of GPRA, and the management and reporting chains have remained the same. The 6.1, 6.2, and 6.3 aspects of the DTO are all looked at by the same reviewing panel. Panels do not look at the 6.1 element independently, because it is assumed that basic research has important feedback loops with both the applied research and advanced technology development stages.C1-4

As an example, DOD is seeking a vaccine for the Ebola virus, and until the basis for such a vaccine is discovered, the research will be funded under the 6.1 category. If a potential vaccine construct is discovered, the vaccine will move to application and development stages, where it will be funded under 6.2 and 6.3 categories. As application and development proceed, further work with 6.1 funds might be needed to achieve a more complete basic understanding and more effective application. Under this same holistic approach, the “Laser Standoff” will be funded under 6.1; if the discovery proves out and can be applied and developed, the program will be moved to 6.2-6.3 phases.

1.3. Who does the evaluation of the research program under GPRA?

The evaluation of basic and applied research is carried out both by internal agency panels of experts and by TARA review panels. Each panel consists of 10-12 technical experts from academe, industry, and nonprofit research organizations. Most TARA team members are recognized experts from the National Academies, the Defense Science Board, the scientific advisory boards of the military departments, industry, and academe. Each team is chaired by a senior executive appointed by the deputy under secretary for S&T.

These teams are asked to evaluate the programs for quality, for advances in the state of the art in their research areas, and for their scientific vision. The department requires that two-thirds of each panel be experts from outside DOD. One-third of each panel's members are “refreshed” at the time of each reviewing cycle. Most areas have a 2-year reviewing cycle; chemical-biologic defense is reviewed annually under DOD's implementation of P.L. 103-160.

At a higher level, evaluation is overseen by the Defense Science and Technology Advisory Group (DSTAG), which advises the deputy under secretary for S&T. DSTAG is a key decision-making body consisting of representatives of each service and defense agency. DSTAG provides oversight of an integrated S&T strategic planning process and effectively maintains responsibility for the entire S&T program. It oversees the work of the Basic Research Panel, which consists of eight people and must approve the BRP; the 12 technology panels responsible for preparation of the DTAP; and the 13 panels responsible for preparation of the JWSTP. These plans build on but do not duplicate the service-agency S&T plans.

1.4. What criteria are used for the evaluation?

In the broadest sense, all research activities—like any other DOD activities—must be justified under the mission goals of the agency. If a research project cannot demonstrate its mission relevance, it probably will not be funded.C1-5

1.4.1. Evaluating performance.

More specifically, the department evaluates success in achieving the performance goals on two levels. At a lower level of aggregation, individual performance measures and indicators are scored at the end of each fiscal year to determine how performance compared with numeric targets set when the budget was submitted.

At a higher level, annual performance goals are evaluated in two ways. First, results for each of the subordinate measures and indicators are evaluated within the context of overall program performance. Second, a determination is made as to whether a shortfall in expected performance for any metric or set of supporting metrics will put achievement of the associated corporate goal at risk. This subjective determination is trend-based and cumulative. A single year of poor performance might not signal that a corporate goal is at risk, although several years of unsatisfactory performance almost certainly will.

1.4.2. Evaluating basic research.

At finer levels—for basic research that is expected to lead to new technologies—the department finds that evaluation through the use of metrics is difficult or impossible. There is no reliable way to measure the success of basic research in the near term, because its outcomes are by definition unpredictable. There might be no payoff this year, or next year—until suddenly researchers see a new “data point” that can give rise to a whole new industry.

For this reason, the department chooses to demonstrate the value—and outcomes—of basic research through retrospective achievements. The rationale for this is that the most valuable technologies for defense applications have derived from basic research done years or even decades before the first application. Therefore, the causative process can be more clearly illustrated by looking backward than by conjecturing about future results.

According to the BRP, “a retrospective approach is a reminder that many of the technologies we now take for granted were brought about by investing much earlier in basic research.” The following examples all resulted largely from timely DOD investments in basic research:

  • Owning the Night (night vision technology).
  • Precision Guidance for Air Defense Missiles.
  • The Airborne Laser.
  • The Kalman Filter (more accurate data for navigation, guidance, and tracking).
  • The Global Positioning System.

Retrospective studies are intended to build support for the process, not for individual projects. It is not possible to point to the outcome of an ongoing individual project.

1.4.3. Education and training.

Other criteria used to evaluate programs include education and training. Clearly, human resources are essential to the future strength of DOD. The department funds more than 9,000 graduate fellowships per year, two-thirds as many as the National Science Foundation (NSF).

However, a difficulty emerges in the way DOD divides expenditures into the three categories called “Today's Force,” “Next Force,” and “Force After Next.” Most of the department's funds go to readiness (“Today's Force”); the next-highest priority is modernization (“Next Force”); “Force After Next,” which contains most S&T and education expenditures, receives a very small percentage of FY2000 appropriations for the department. This difficulty can be seen in the current GPRA format for evaluating S&T. One aspect of the problem is that manpower is considered to be “hard-wired” into the budget process, but there is no evaluation of the educational component itself and thus no incentive structure for good teaching, research training, or mentoring. For example, the substantial cuts in the 6.1 budget from 1993 to 1998 brought reductions in the number of graduate students who could be supported by research grants at universities, but the GPRA process did not report this result. This is especially troubling for such fields as electrical engineering, computer science, and mathematics, where DOD plays a dominant national role in funding and where basic research is needed to maintain the country's leadership in information technology and other emerging fields.

1.4.4. Relevance to mission.

For 6.2 and 6.3 research, R&D activities are clearly aligned with DOD objectives through the DTO categories. For basic (6.1) research, the TARA process does not deal explicitly with how the research is relevant to the DOD mission, but relevance is examined at many points. DOD people attend all TARA reviews, and TARA does review the focus, as well as the quality, of the BRP.C1-6

In addition, relevance is addressed in the internal management processes. The biennial basic-research cycle starts with project-level reviews at the individual research agencies (Army Research Office, Office of Naval Research, and Air Force Office of Scientific Research). These sessions are followed by program-level reviews of the combined research agencies and by preparation of the BRP. The BRP is evaluated by the director of defense research and engineering, with feedback to the agencies after the annual program review. The services and defense agencies also conduct other periodic program reviews to assess quality, relevance, and scientific progress.

1.4.5. Other criteria.

Issues of intellectual property, patents, and spin-offs are also considered to be valuable indicators of the quality and relevance of DOD S&T research. Arranging intellectual property ownership occasionally proves difficult, however, and leads to disputes. These disputes can impede collaboration, most often in university-industry partnerships.

1.5. How do the selection and evaluation of projects relate to the evaluation of the research program?

The selection and evaluation of S&T projects, like all DOD activities, are highly mission-oriented. Projects must have clear objectives, and they must deal with products and product development. The users of the products are either in house or in some other agency, so the users and their missions are well known.

S&T research programs are evaluated in the context of the projects to which they contribute. For example, the DTO of “Detection/Strengths” is analyzed by expert peers. The reviewers are asked to render their expert opinion, which is supposed to be based only on the information provided by DOD about the program. In-depth technical reviews are done at the project-manager level. The Army has its own complex way of evaluating research programs at the service level.

The department has struggled to develop the best way to evaluate research in light of GPRA. When Congress directed the reorganization of DOD in 1987, it put civilian heads in key positions to assess each service and suggest modifications. It was their task to convince military leaders of the value of S&T to the department. The civilian leaders came up with the technique of using planning documents and planning by objective, and an S&T master plan was created in 1990. The introduction of GPRA in 1993 brought a new challenge, and once again the civilian leaders had to make the case for the importance of S&T to the defense mission. Thus evolved the BRP and the two documents that relate it to war-fighting objectives.

2. How is the result communicated to different audiences (e.g., S&T community, advisory committees, agency leadership, Administration, Congress)?

The results of TARA reviews are communicated to agency leadership by “TARA outbriefings” for each technology area (6.2 and 6.3) and for basic research (6.1). This provides an efficient way to respond to queries, doubts, or challenges about the value of S&T.

Because GPRA is outcome-oriented and because the TARA mechanism is the department's best process for measuring outcomes, TARA is the best way to communicate the value of what is done by scientists and engineers. More broadly, the department finds TARA so effective as a means of evaluation and communication that it would keep it regardless of GPRA. Within the department, planners struggle through difficult debates and fundamental conflicts about the value of different research programs. The TARA process provides reference points and a means to refer back and forth to areas of planning.

The department also communicates the value of S&T to Congress in several ways, including the use of historical vignettes that demonstrate the utility of basic research. These vignettes are not in the GPRA document at present. In addition, the department communicates with oversight agencies, such as the Office of Management and Budget and the General Accounting Office, about the process of complying with GPRA and the results.

The department also communicates the results of its evaluations to other audiences. For example, nanotechnology research is closely coupled with research supported by NSF and the National Institute of Standards and Technology. There is strong communication among interagency teams working on national security issues.

Overall, GPRA has the potential to provide a common language to address complex issues and to talk to stakeholders. At present, the various agencies are searching for a common approach that is not yet fully developed.

3. How is the result used in internal and external decision-making?

The TARA review process is used at all levels of decision-making. For example, the TARA 2000 review of the chemical-biologic defense S&T program (against such agents as mustard gas, nerve gas, anthrax, plague, and smallpox) revealed that the program was not adequately represented by a DTO portfolio. It also revealed capacity limitations in laboratory infrastructure. In addition, the workforce was observed to be aging. These results will all influence decision-making at the planning and review stages.

TARA panels are able to find redundancies in programs. For example, a panel reviewing a plan for a particular kind of DNA research could point out that this work is already being done by people in industry and recommend that DOD put its dollars elsewhere.

Such strategic decision-making occurs at many levels. The reliance group does a conscious search to ensure that the same work is not being done in two places. For medical projects, the Nationwide Interagency Group helps to prevent duplication by tracking major programs. Learning technology is monitored for duplication by a variety of interagency groups. Basic research in cognitive science is overseen by a combination of NSF and the Office of Science and Technology Policy in the White House.

Other important decision-making that is not addressed by GPRA concerns the choice of basic research fields and the transition of a 6.1 program to a 6.2 and 6.3 program. For example, if a basic research program is shifting its emphasis from high-temperature superconductivity to nanotechnology, researchers know that what they have learned in the former field will benefit their work in the latter. But there is no way to quantify the value of this crossover effect. That kind of flexibility is crucial in basic research as the department seeks to free up money for projects that have the greatest potential to contribute to the defense mission, but it does not appear in the form of metrics. Similarly, quantifying a decision to move a research project from 6.1 to 6.2 is difficult. Such transitions are also essential to the mission, and it would be useful to be able to quantify the process. At the same time, the arbitrary application of metrics should be avoided when there is a risk of terminating a potentially useful line of inquiry.

DOD Focus Group Participant List October 4, 2000

Panel Members:

Enriqueta Bond

President

The Burroughs Wellcome Fund

Research Triangle Park, North Carolina

Alan Schriesheim

Director Emeritus

Argonne National Laboratory

Argonne, Illinois

John Halver

Professor Emeritus in Nutrition

School of Fisheries at University of Washington

Seattle, Washington

Morris Tanenbaum

Retired Vice Chairman and

Chief Financial Officer, AT&T

Short Hills, New Jersey

Robert M. White

University Professor and Director

Data Storage Systems Center

Carnegie Mellon University

Pittsburgh, Pennsylvania

Participants:

James Baker

Internal Medicine

University of Michigan

Ann Arbor, Michigan

Greg Henry

Program Examiner, National Security Division

Office of Management and Budget

Washington, D.C.

Genevieve Knezo

Congressional Research Service

Library of Congress

Washington, D.C.

Steven Kornguth

University of Texas at Austin

Institute for Advanced Technology

Austin, Texas

David Lockwood

Congressional Research Service

Library of Congress

Washington, D.C.

Elizabeth Mead

Senior Analyst

US General Accounting Office

Washington, D.C.

Jack Moteff

Congressional Research Service

Library of Congress

Washington, D.C.

Robin Nazzarro

Assistant Director

US General Accounting Office

Washington, D.C.

Michael Sailor

Department of Chemistry

University of California, San Diego

La Jolla, California

David Trinkle

Program Examiner, Science and Space Programs Branch

Office of Management and Budget

Washington, D.C.

Agency Representatives:

Robert Foster

Director, Biosystems

US Department of Defense

Roslyn, Virginia

Joanne Spriggs

Deputy Under Secretary of Defense (S&T)

US Department of Defense

Washington, D.C.

Robert Trew

Director of Research

US Department of Defense

Washington, D.C.

Robert Tuohy

Director, S&T Plans and Programs

US Department of Defense

Roslyn, Virginia

Leo Young

Office of Director for Research

US Department of Defense

Arlington, Virginia

Appendix C-2. SUMMARY OF THE NATIONAL INSTITUTES OF HEALTH FOCUS GROUP

1. What methodology is used for evaluating research programs under GPRA?

1.1. Overview.

The National Institutes of Health (NIH) is an agency within the Department of Health and Human Services (DHHS). NIH's mission is to uncover new knowledge and to develop new or improved methods for the prevention, detection, diagnosis, and treatment of disease and disability. Preparation of NIH's annual GPRA performance plans and performance reports is the responsibility of the Office of Science Policy within the Office of the Director, NIH. GPRA documents are formally submitted by NIH, through DHHS, in conjunction with the normal cycle of budget document submission throughout the year. In compliance with the requirements of GPRA, NIH has prepared and submitted Annual Performance Plans for Fiscal Years 1999, 2000, and 2001. The FY 2002 Performance Plan is now being developed.

Like other federal agencies that support scientific research, NIH faced challenges in evaluating the outcomes of its research programs in accordance with the requirements of GPRA. Compliance with GPRA required the NIH to implement an assessment and reporting process that complemented the ongoing mechanisms for review of research progress.

1.1.1. NIH GPRA Performance Plan.

For purposes of GPRA planning and assessment, the NIH has aggregated and categorized the mission-related activities of all its Institutes, Centers, and Offices into three core program areas: Research, Research Training and Career Development, and Research Facilities. For each of these three core program areas, the NIH has identified expected outcomes, major functional areas, specific performance goals, and annual targets within its GPRA performance plan. The performance goals in NIH's annual performance plans address both the long-term, intended results or outcomes of NIH core program activities and the management and administrative processes that facilitate the core program activities and lead to the achievement of outcomes. For example, within the Research Program, outcome goals include increased understanding of biological processes and behaviors, as well as the development of new or improved methods for the prevention, diagnosis, and treatment of disease and disability. NIH's Annual Performance Plans include performance goals that can be assessed through the use of objective/quantitative measures as well as performance goals that require descriptive performance criteria.

1.1.2. Quantitative measures.

Most of the 50-odd performance goals described in NIH's annual performance plans can be assessed through the use of objective and quantitative measures, such as numerical targets, data tracking and collection systems, completion of studies or actions, and program-evaluation studies.

For example, two of the seven primary research goals can be evaluated quantitatively. One is to develop critical genome resources, including the DNA sequences of the human genome and the genomes of important model organisms and disease-causing micro-organisms. An example of a quantitative goal for FY2001 was to complete the sequencing of one-third of the human genome to an accuracy of at least 99.99%. A second quantitative goal for FY2001 was to work toward the president's goal of developing an AIDS vaccine by 2007; progress is described in terms of the design and development of new or improved vaccine strategies and delivery or production technologies. For both goals, NIH was able to identify specific milestones or other measurable targets.

1.1.3. Qualitative criteria.

The annual performance goals related to its dominant research program, however, are more qualitative. NIH has concluded that strictly numeric goals and measures are neither feasible nor sufficient to capture the breadth and impact of research it performs and supports. In such cases, GPRA provides an avenue for an agency to define performance goals that rely on criteria that are more descriptive and to use an alternative form of assessment.

A small subset of the annual performance goals, related to the NIH Research Program, is more qualitative in nature, and NIH has used the alternative form, as allowed by GPRA, for these five goals:

  • Add to the body of knowledge about normal and abnormal biological functions.
  • Develop new or improved instruments and technologies for use in research and medicine.
  • Develop new or improved approaches for preventing or delaying the onset or progression of disease and disability.
  • Develop new or improved methods for diagnosing disease and disability.
  • Develop new or improved approaches for treating disease and disability.

For the five qualitative goals mentioned above, an independent assessment process has been developed and is described in more detail below.

1.2. What level of unit is the focus of the evaluation?

For purposes of strategic planning under GPRA, an agency is defined as a cabinet-level department or independent agency. In the case of NIH, this means that the parent agency, DHHS, is the agency that must develop a long-term strategic plan. NIH's core programs support the strategic plan of DHHS, and NIH provides considerable input into its development. Each of the NIH operating units has its own plan and reports in a formal way through the department. These units may be formed around a disease, a phase of the human life cycle, a biologic system, or a profession (such as nursing). The overall DHHS performance plan is the total of the plans of its 13 subagencies, which in turn stem from the DHHS strategic plan.

The primary dilemma of those in charge of the response to GPRA has been the size and complexity of NIH itself. For example, they found no meaningful way to evaluate the research results of individual institutes and centers, because the work of each unit overlaps with and complements the contributions of others. And each institute and center has its own strategic plans, devised independently.

1.2.1. Aggregation.

For a broader view of GPRA assessment, NIH has chosen the option of aggregating the activities of its institutes, centers, and offices (as permitted by GPRA) in the three core program areas described above.

1.3. Who does the evaluation of the research program under GPRA?

During NIH's planning for GPRA, there was considerable discussion about who should do the assessment. NIH officials decided that the evaluation should be conducted by an Assessment Working Group of the Advisory Committee to the Director (ACD), NIH, following the highly effective model of peer review that is used by NIH for merit review of grant applications. This Working Group drew its membership from the ACD, the Director's Council of Public Representatives (COPR), and the NIH Institute and Center national advisory councils that provide advice on a broad range of topics. This combination of individuals provided broad representation of the scientific and medical communities, health care providers, patients, and other representatives of the public. Moreover, it provided the expertise and perspectives necessary for evaluating the scientific quality and societal relevance of the outcomes of the NIH Research Program. The final working group had 26 members: six ACD members, 16 COPR members, and four ad hoc scientists selected by the NIH director for their scientific expertise in areas not already represented.

1.4. What criteria are used for the evaluation?

1.4.1. Peer review.

One reporting challenge for NIH is that most of its funding for research does not stay within the system. Some 82% of the budget goes outside NIH in the form of extramural awards, compared with about 11% that pays for intramural research at the Bethesda, MD, campus and other centers. Each year, NIH receives some 40,000 research proposals from scientists and research centers throughout the country. Much of the extramural research is performed by principal investigators at universities who employ and train graduate students and postdoctoral scientists on their grants. These projects, which might have multiple funders, are not under the direct control of NIH. They are, however, governed by an effective and long-standing peer-review process with stringent requirements and evaluation procedures. NIH found no need to attempt to duplicate or replace this system, which is the traditional means of research approval and assessment used in scientific programs.

Therefore, to comply with GPRA, NIH developed an independent assessment process for evaluating program outcomes and comparing them with the performance goals for the research program. In the broadest terms, the assessment involved “gauging the extent to which NIH's stewardship of the medical research enterprise leads to important discoveries, knowledge, and techniques that are applied to the development of new diagnostics, treatments, and preventive measures to improve health and health-related quality of life” (from the NIH GPRA Research Program Outcomes for FY1999).

1.4.2. Assessment materials.

The working group was provided with narrative “assessment materials” that consisted of the following evidence of research-program outcomes:

  • Science advances. One-page articles prepared by NIH that describe a specific scientific discovery published within the last year and supported by NIH funding. Each advance was related to its impact on science, health, or the economy.C2-1
  • Science capsules. One-paragraph snapshots of the breadth and scope of individual NIH research program outcomes. Their brevity allows for a greater number of vignettes, each offering a thumbnail description of an advance and its significance, so that the overall picture created by the capsules is more nearly representative of the research effort as a whole.C2-2
  • Stories of discovery. One- to two-page narratives that trace a major development over several decades of research, demonstrating the long-term, incremental nature of basic research and its often-surprising utility in seemingly unrelated areas of medicine. These narratives address the difficulty of attempting to describe important advances in terms of a single finding or annual accomplishments.C2-3
  • Research awards and honors. Brief descriptions of national and international scientific awards or honors received by NIH scientists and grantees. The write-ups demonstrate how the external scientific community values the work of NIH grantees.

Narrative descriptions of research accomplishments were accompanied by citations of publications related to the accomplishments.

To assemble the narrative materials about outcomes, each NIH institute and center was asked to provide 10-20 science advances, 10-20 science capsules, and one or two stories of discovery. The resulting assessment materials were considered to provide an extensive illustration of NIH's FY1999 research outcomes that address the five qualitative research-program performance goals.

1.4.3. Evaluating outcomes.

A total of almost 600 advances, capsules, and stories of discovery were given to the working group 3 weeks before its 1-day assessment meeting. For the meeting, each member was asked to review a subset of the materials: those for goal A (“add to the body of knowledge...”), those for one additional goal (instruments and technologies, prevention, diagnosis, or treatment), and the research awards. Each was asked to identify, if possible, some five noteworthy scientific discoveries from each assigned goal and to identify any findings considered “marginal.”

At the 1-day meeting, the working group discussed in plenary session the research outcomes for goal A and discussed and assessed goals B through E (instruments and technologies, prevention, diagnosis, and treatment) in breakout groups.

After the meeting, the working group was asked to evaluate the outcomes for each goal. To assess goal A, the working group was asked to use the following criteria:

  • The NIH biomedical research enterprise has successfully met this goal when its research yields new findings related to biologic function and behavior, and the new findings are publicized or disseminated.
  • The NIH biomedical research enterprise has substantially exceeded this goal when, in addition to fulfilling the above criteria, any of the following apply:
    • —Discoveries result in significant new understanding.
    • —Research yields answers to long-standing, important questions.
    • —Genome information about humans, model organisms, or disease-causing agents is translated into new understanding of the role of genes or the environment.
    • —Discoveries have potential for translation into new or improved technologies, diagnostics, treatments, or preventive strategies.
  • It was also explicitly pointed out to the working group that a third level of performance—the goal was not met—was possible and could be considered.

1.4.4. Specifying results.

The compilation of written materials for goal A was by far the largest of that for any goal, totaling 265 items. For this goal, the working group concluded that NIH had “substantially exceeded” the goal of “adding to the body of knowledge.” Specifically, the working group concluded that the outcomes demonstrated that NIH had “sustained the excellence and responsiveness of the research system while demonstrating willingness to take research risks necessary to advancing biomedical knowledge and, ultimately, human health.”

In all, the group judged that for FY1999, NIH had “substantially exceeded” four goals and “met” one goal. The goal that “lagged” somewhat was goal C, “Develop new or improved approaches for preventing or delaying the onset or progression of disease and disability.”

1.4.5. COSEPUP criteria.

At the workshop, there was little discussion of one of the three COSEPUP criteria for evaluating research—that of leadership. The other two criteria—quality and relevance to mission—were either discussed at length (quality) or embedded in the peer-review process (relevance to mission). Leadership concerns the level of research being performed in a given program relative to the highest world standards of excellence. COSEPUP has suggested, and tested, the use of “international benchmarking” to measure leadership, a technique discussed in its full report.C2-4

1.5. How do the selection and evaluation of projects relate to the evaluation of the research program?

Selection criteria were developed by NIH on the basis of the decision to aggregate its individual research projects and to evaluate them as part of broad biomedical goals. The objective of these criteria was to capture the results of clinical, as well as basic, research. NIH staff held many roundtable discussions, conferences with stakeholders, and cross-agency planning sessions to gather input from all groups. They used the National Academy of Public Administration (NAPA) as a forum.

1.5.1. Research as the primary mission.

NIH is in the midst of a planning-while-doing process to find the best way to evaluate a research-dominated mission. Most mission-based agencies—such as the Department of Defense, the Department of Energy, and the National Aeronautics and Space Administration—spend only a small fraction of their budgets on research, and their missions are described in terms that are not restricted to research (such as maintaining the national defense or exploring the solar system). NIH, in contrast, like the National Science Foundation, has research as its primary mission and allocates its budget accordingly. Therefore, the evaluation of its “performance and results” is primarily a matter of evaluating the research effort itself.

1.5.2. Reviewing basic research.

The reasons for this approach are derived from the unique challenges for agencies whose missions include basic and clinical research. As proposed in NIH's report, Assessment of Research Program Outcomes, scientists and the practice of science “exist because of what we do not know. The aim of science is to move the unknown into the realm of the known and then, with a greater store of knowledge, begin again, as if advancing a frontier. This basic truth about science makes it different from other enterprises.”

Because it is impossible to know with certainty which field or project of research will produce the next important discovery, the assessment report continues, the community of science “has to be open to all ideas.” Many of these ideas will lead to useful outcomes; many others will not. Although much NIH funding supports research projects that are of obvious relevance to specific diseases and public health, it also places a high priority on fundamental, untargeted research. History has shown many times that a basic-research finding might be a critical turning point in a long chain of discoveries leading to improved health and richer science. However, although these basic research programs can be evaluated usefully on a regular basis, the ultimate outcomes of fundamental research are seldom predictable or quantifiable in advance.

1.5.3. Dealing with unpredictability.

According to the NIH assessment report, unpredictability has three important implications:

  • Science is by nature structured and self-correcting, so that either a predicted or an unforeseen discovery has the advantage of adding to basic scientific knowledge and giving new direction to further inquiries.
  • Science and its administrators must constantly reevaluate and often change their priorities in light of new discoveries.
  • Tracking the many aspects of fundamental science is a daunting challenge that must capture quantitative, qualitative, and institutional dimensions... . It is normal and necessary for basic research investigators to modify their goals, change course, and test competing hypotheses as they move closer to the fundamental understandings that justify public investment in their work. Therefore it is necessary to evaluate the performance of basic research programs by using measures not of practical outcomes but of performance, such as the generation of new knowledge, the quality of the research performed, and the attainment of leadership in the field.

In addition, the annual reporting requirements of GPRA present a problem. The outcomes of fundamental science commonly unfold over a period of years. During a given period, it might or might not be possible to quantify progress or predict the direction of the outcome.

2. How is the result communicated to different audiences (e.g., S&T community, advisory committees, agency leadership, Administration, Congress)?

An agency's response to GPRA can enhance communication both internally and externally. Internally, the exercise at NIH has focused attention on how the institutes manage their activities. It has required a common dialogue in the parent agency, DHHS. It forces both the department and NIH to set goals and be accountable. NIH's partnership with other agencies, such as NSF, and its association with NAPA have brought all participants a somewhat better understanding of Congress's interest in GPRA, and that understanding will be put to use.

2.1. Toward common nomenclature.

One impediment to internal communication in the past has been the use of different standards and nomenclature. GPRA encourages common standards and nomenclature during all phases of the process. Improved communication among biomedical disciplines is more important today than in the past. A generation ago, there was a clear line between research and care. Today, the pace of research is greater, and the line between research and care is blurred. For example, virtually all pediatric cancer care now uses research protocols, as does a growing proportion of adult cancer care.

2.2. Communicating with oversight bodies.

A primary audience is Congress. It is still unclear how GPRA information is being used by the authorizing committees or, especially, the appropriations committees, but the House Science Committee has been an active participant in planning and overseeing implementation of GPRA. Another important audience is GAO, which oversees many aspects of government performance. Finally, OMB is both a participant and an audience in the GPRA process for NIH. OMB receives the budget requests and performance plans, engages with the agency, and asks questions about how well the plan reflects the agency's priorities. OMB has learned, along with NIH, how research presents different challenges for evaluation. OMB reports a generally favorable opinion of NIH's aggregation plan.

3. How is the result used in internal and external decision-making?

The GPRA process has greatly facilitated internal decision-making by bringing groups together and establishing linkages throughout DHHS. When the working groups were established, new contacts and interoffice relationships were built. People learned how different institutes, centers, and agencies had different approaches to planning and reporting. Groups were able to look at different plans and understand them. They also learned how there could be a combined plan and report.

DHHS is attempting to use the results to improve the linkage between performance plans and budget. The linkages are not made dollar for dollar, especially in research, but the information gathered for GPRA is useful to help make decisions earlier in the budgeting process. Performance plans are also used by the DHHS budget review board, which has made a commitment to using GPRA. Ultimately, planners hope to use it more explicitly for budgeting, and even for internal management.

3.1. Linkage with the budget.

Feeding GPRA activities back into the budget process is complex: the challenge is to connect the appropriations process, which is conducted on an institutional basis, with how each institute and center spends its money. Many players are involved in budgeting, including each institute's director, the NIH director, the White House, OMB, Congress, and the President. The appropriations committees have the final decision on budget amounts. Once the money reaches NIH, it must decide how to allocate it. At the grassroots level, individual investigators (both extramural and intramural) guide disbursement at the micro-level by proposing promising ideas for research. NIH also advertises areas in which it solicits more research (for example, in diabetes, it might call for more work on islet-cell transplantation). The institutes also try to balance their disbursements between small grants and large projects, between basic and clinical research, and between research and instrumentation or other infrastructure.

The experience with GPRA is too brief to allow NIH to place a value on the process. There has been only one assessment report, and a second is due in spring 2001. GPRA's effect is also hard to discern amid other forms of input. At the least, it has proved to offer another avenue of feedback and evaluation.

3.2. Decisions about goals not met.

Failure to meet selected goals would not necessarily trigger a shift in resource allocations. If a target was not met because it was too ambitious, for example, the problem could lie in the target-setting rather than in inadequate funding or poor execution. For instance, NIH set a target for facility construction last year that was not met. The target had been set 2 years before. The reason it was not met was that it proved more cost-effective to add a floor to the building during construction, thereby extending the completion date beyond the initial target. Such situations are not clearly dealt with in some GPRA reporting mechanisms.

3.3. Changing the process.

NIH has made several changes in the assessment process as a result of the previous assessment:

  • It has added a co-chair of the working group so that each year's chair will have had the experience of a previous cycle.
  • The working group has been expanded (to 34 people) to add expertise and ensure sufficient coverage when some cannot attend.
  • Specific review assignments were made to facilitate the assessment process.
  • Discussion of each individual's and the group's collective assessment by goal was conducted during the meeting, rather than following the meeting by ballot.

NIH Focus Group Participant List October 25, 2000

Panel Members:

Enriqueta Bond

President

The Burroughs Wellcome Fund

Research Triangle Park, North Carolina

Alan Schriesheim

Director Emeritus

Argonne National Laboratory

Argonne, Illinois

Brigid L.M. Hogan

Investigator, Howard Hughes Medical Institute and Professor,

Department of Cell Biology

Vanderbilt University School of Medicine

Nashville, Tennessee

Max D. Summers

Professor of Entomology

Texas A&M University

College Station, Texas

Bailus Walker, Jr.

Professor, Environmental & Occupational Medicine

Howard University

Washington, D.C.

Participants:

Theodore Castele, M.D., FACR

Member, NIH GPRA Assessment

Working Group

Fairview Park, Ohio

Melanie C. Dreher

Dean and Professor

The University of Iowa College of Nursing

Iowa City, Iowa

Marc Garufi

Program Examiner, Health Programs and Services Branch

Office of Management and Budget

Washington, D.C.

Joanna Hastings

Program Analyst, Office of Budget / ASMB

Department of Health and Human Services

Washington, D.C.

Genevieve Knezo

Congressional Research Service

Library of Congress

Washington, D.C.

Robin Nazzaro

Assistant Director

US General Accounting Office

Washington, D.C.

Robert Roehr

Member, NIH GPRA Assessment

Working Group

Washington, D.C.

Arthur Ullian

Member, NIH GPRA Advisory Committee

National Council on Spinal Cord Injury

Boston, Massachusetts

Agency Representatives:

Robin I. Kawazoe

Director, Office of Science Policy & Planning

National Institutes of Health

Bethesda, Maryland

Lana R. Skirboll

Associate Director for Science Policy

National Institutes of Health

Bethesda, Maryland

John Uzzell

Director, Division of Evaluation

Office of Science Policy

National Institutes of Health

Rockville, Maryland

Appendix C-3. SUMMARY OF THE NATIONAL AERONAUTICS AND SPACE ADMINISTRATION FOCUS GROUP

1. What methodology is used for evaluating research programs under GPRA?

1.1. Overview.

Like other federal agencies that support significant programs of science and engineering research, the National Aeronautics and Space Administration (NASA) has encountered several difficulties in evaluating research programs to the specifications of GPRA. Compliance with GPRA did not require new mechanisms for assessing performance; these were already in place. But it did require a new overlay of reporting requirements on an annual basis.

For its first performance report, in FY1999, the agency assessed its performance in terms of goals (e.g., “solve mysteries of the universe”), multiple objectives (e.g., “successfully launch seven spacecraft, within 10 percent of budget, on average”), and targets achieved (e.g., “several spacecraft have been successfully developed and launched with a 3.8 percent average overrun”).

For its FY2001 performance plan, NASA has instituted several major changes. The targets have been changed in an effort to relate the specific annual measures of output (now called “indicators”) to the eventual outcomes that usually result from a number of years of activity. By using the new targets, NASA hopes to have better success in relating multiyear research programs to yearly budget and performance reports. For example, under the strategic plan objective “solve mysteries of the universe” are three “targets” (e.g., “successfully develop and launch no less than three of four planned missions within 10% of budget and schedule”). Under this target are a series of specific performance indicators, such as the successful launch of the microwave anisotropy probe.

1.1.1. Internal and external reviews.

To take a broader view of NASA's evaluation techniques, the agency uses extensive internal and external reviews to assess its research efforts against its performance plans. Internal reviews include standard monthly and quarterly project- and program-level assessments at NASA centers, contractor sites, and NASA headquarters. There are reviews of science, engineering, and technology plans and performance, in addition to reviews of functional management activities linked to research, such as procurement, finance, facilities, personnel, and information resources. Management councils conduct oversight reviews of schedules, cost, and technical performance against established plans and bring together headquarters and field directors twice a year for assessment reviews of enterprise and cross-cutting process targets.

When GPRA was introduced, NASA management decided that existing review processes were sufficient to provide internal reporting and reviewing of project and program performance data. The recent streamlining of agency processes provided confidence that new data-collection and oversight processes need not be created for compliance with GPRA.

For external review, NASA relies on National Science Foundation-style peer review of its activities by outside scientists and engineers, primarily from universities and the private sector. Panels of scientific experts are asked to ensure that science-research proposals are selected strictly on their merits. “Intramural” (at NASA facilities) projects in the research programs are selected in the same competitive processes as extramural (e.g., at universities) projects. Competitive merit review is applied to over 80% of resources awarded.

Additional reviews are conducted by such organizations as the NASA Advisory Council (including representatives of universities, industries, and consulting firms), the National Research Council, and the General Accounting Office.

For the purposes of complying with GPRA, NASA relies on its own advisory committees for its primary input. These committees are already familiar with NASA's strategic plan, individual enterprise strategic plans, and budget.

1.1.2. The need to tailor GPRA to particular areas.

At the workshop, NASA devoted about half its time to presenting the research programs in the three strategic enterprises with science missions, and the GPRA responses of those specific research programs: space science, earth science, and biologic and physical science. Each has different requirements, and each should have its own methods for complying with GPRA. That is especially true in evaluating internal programs and setting internal priorities. If reviews are to be effective at allowing the reallocation of dollars, they must reach below the program level to consider individual projects. Agencies must stimulate internal discussions among divisions so that peer-review systems can be carefully tailored to match the needs of specific areas.

1.1.3. The need to evaluate technologic activities.

An essential part of the “science cycle” for NASA is a wealth of technologic activities that include theoretical studies, new-instrument development, and exploratory or supporting ground-based and sub-orbital research that are intended to help accomplish scientific research objectives. These technology programs, as integral parts of NASA's overall research effort, must be evaluated for GPRA with the same transparency and rigor as its “science” products. In addition, reviewers should assess the effectiveness with which these technologic activities are integrated with “scientific” complements.

1.1.4. Additional challenges.

Evaluating research programs under GPRA presents other significant challenges to the space agency. One reason is that it must deal with 3 years of planning and evaluating simultaneously. For example, NASA is currently developing the performance plan for the budget planning year (FY2002), tracking the current plan for the current budget year (FY2001), and preparing the performance report for the completed fiscal year (FY2000).

The development of metrics is also complicated by the issue of lead time. In NASA's earth-science programs, for example, 16 months pass between the time when targets are submitted and the time of implementation; 28 months pass between target submission and final results for the performance report. During those periods, the ideas and basis of the research program often change substantially, forcing alterations of individual metrics and perhaps even of the larger goals of the program.

Finally, perhaps the most difficult challenge is to develop an appropriate response to GPRA's focus on outcome metrics. Historically, NASA has tracked the technical status and progress of its complex programs, and accurate tracking is integral to the success of its missions. For flight programs, such as the International Space Station, NASA engineers use many thousands of technical metrics to track performance, schedules, and cost.

For long-term research programs, however, such technical metrics might not adequately convey the quality or relevance of the work itself. For example, in the space-science objective to “solve the mysteries of the universe,” the assessment process requires a multifaceted judgment that takes into account the nature of the challenge to “solve the mysteries,” the level of resources available, and the actual scientific achievements of the year. That judgment cannot be achieved solely by comparing the number of planned events for the year with the number of events that were achieved.

This issue will be discussed at greater length in Section 1.4 below.

1.1.5. Overall performance.

For the purpose of assessing NASA's performance at the enterprise and cross-cutting process levels, reviewers must integrate quantitative output measures and balance them with safety, quality, performance, and appropriate risk. The advisory committees will be asked to assign a rating of red, yellow, or green to measure the progress made against each of the objectives and provide a narrative explanation. These objectives are identified in the strategic plan and repeated in the display of strategic goals and strategic objectives.

1.2. What level of unit is the focus of the evaluation?

NASA divides its activities into five “enterprises”: the Space Science Enterprise, the Earth Science Enterprise, the Human Exploration and Development of Space Enterprise, the Biological and Physical Research Enterprise,C3-1 and the Aero-Space Technology Enterprise. Each enterprise then evaluates its mission against several strategic plan goals. Each goal, in turn, has several strategic plan objectives; each objective has one or more targets for the fiscal year, and each target is measured by one or more indicators, as described above.

For the GPRA assessment of the Space Science Enterprise, for example, there are three components:

  • Mission development. This component has about 20 specific targets, from successful launches to specific missions. Each is reviewed for progress in design or for success in bringing technologic development to a certain level.
  • Mission operations and data analysis. Independent outside reviewers are asked to evaluate the NASA program with regard specifically to strategic plan goals and generally to how the agency contributes to space science as a whole.
  • Research and data analysis. This component uses independent outside reviewers on a triennial cycle, which is more appropriate than an annual cycle for research. NASA's space science research programs receive about 2000 proposals per year for research grants; of those, it selects 600-700. The proposals are screened with traditional peer review. In addition, the agency has begun an additional layer of expert review called senior review, as recommended by COSEPUP in its GPRA report of February 1999.C3-2 For this review, instead of looking at 2000 awards in 40 disciplines, NASA has grouped all projects in nine science clusters. Reviewers look at highlights of the clusters and examine recent research accomplishments that meet strategic-plan goals. They also review work in progress that is designed to meet long-term goals. For example, future space missions will require new forms of imaging that must be supplied by basic research in optics; reviewers will monitor NASA's progress in optics research toward this goal.

The Space Science Enterprise initiated a planning process for this mechanism 18 months ago. The first triennial senior review will be held in the middle of 2001 to fit in with the strategic-planning process.

1.3. Who does the evaluation of the research program under GPRA?

Many standing and ad hoc groups participate in the evaluation process. At the enterprise level, the target “owners” are asked for the most appropriate indicators to use as metrics. These metrics are reviewed by the independent NASA Advisory Council. The GPRA coordinators take this input, integrate it with the rest of the performance plan, and send it to the Office of Management and Budget (OMB).

Oversight of the GPRA process for the NASA science research programs is the responsibility of the NASA Science Council and interagency working groups. The Science Council is an internal group composed of the chief scientist, the chief technologist, the chief financial officer (CFO), and other members of the headquarters leadership. (Because of the relationship between the budget and the performance plans, the NASA CFO has primary responsibility for the conduct of the performance plan and reporting process.)

Several external groups also help to guide the process. The Space Studies Board of the National Research Council provides guidance and evaluation. Other participants include the Institute of Medicine's Board on Health Sciences Policy and the National Research Council's Aeronautics and Space Engineering Board and National Materials Advisory Board.

1.3.1. An example of the evaluation process.

For FY2002 performance-target development, NASA headquarters transmits guidelines on targets and goals based on GPRA, OMB Circular A-11, and the Congressional Scorecard. The lead NASA centers develop performance targets with additional guidance from advisory committees. Headquarters reviews the targets and develops specific plans. These plans and targets are included in program commitment agreements between the administrator and associate administrator. They are reviewed internally and then by OMB. During the fiscal year, progress toward the targets is reviewed at least quarterly by the NASA Program Management Council in light of both budget development and strategic plans. Final review is conducted by the advisory committee, and the result is a grade of red, yellow, or green.

Peer-reviewed research at the project level does not appear in GPRA plans or reports, because of the great number of projects involved. Nonetheless, peer review is the fundamental mechanism for maintaining the quality of NASA's research programs. Disciplinary working groups are responsible for overseeing the peer-review process for each discipline and for developing milestones.

For example, in the Earth Science Enterprise, essentially all scientific research is peer-reviewed through the use of mail or panel reviews. In FY1999, nearly 5000 mail reviews were received, and peer-review panels in FY1999 and FY2000 involved nearly 300 people. Also, NASA Earth Science Enterprise programs have extensive interaction with the community, including National Research Council panels, the US Global Change Research Program, and international science organizations (such as the World Climate Research Program and the International Geosphere-Biosphere Program).

1.3.2. Who is “external”?

Some aspects of peer review were debated at the workshop, notably the need to use reviewers who are able to evaluate a project objectively. As a rule of thumb, NASA prefers a panel in which one-third or more of the members are not currently funded by NASA.

1.4. What criteria are used for the evaluation?

As mentioned above, GPRA requires a heavier focus on outcome metrics than on NASA's common input and output metrics. For example, OMB Circular A-11 states that performance metrics must be measurable, quantifiable, reflective of annual achievement, and auditable. Like other agencies that support research, NASA has difficulty in finding such performance metrics for research programs and in relating multiyear projects to the annual budget process.

Workshop participants discussed this issue in detail. To distinguish the two terms, an example of an output might be a workshop or a launch—a deliverable that might or might not have value beyond the simple performance of the task. An outcome, in contrast, would be evidence that a workshop increased knowledge or new science enabled by data obtained from a NASA payload in orbit—concrete benefits or results.

1.4.1. Measuring outcomes of research.

Because the outcomes of most research programs are not clear for several years, especially those requiring launching, the effort to report outcomes can lead to the use of numbers that mean little with respect to the new knowledge hoped for. Conversely, a program might report successful outputs (e.g., preparation of experiments for launch) that are nullified if the launch fails or is postponed. In other words, it is possible to meet the indicators and miss the target—or to miss the indicators and still learn a great deal about the target objective.

1.4.2. A plan for expert review.

For those reasons, NASA is planning to change its reporting process for FY2002. The agency is now evaluating the changes, discussing them internally, and gauging how they will apply to the GPRA process. The struggle is to quantify “intangible” results, such as knowledge. Most government programs, including many NASA missions, have a product that is easy to describe. But when knowledge is the objective, its form is unknown, and its discovery is often serendipitous. That kind of objective defies the use of conventional metrics.

Hence, the new process makes use of expert review of the research-program components to attempt a more meaningful approach. NASA will continue to report annual GPRA-type metrics for enabling activities, such as satellite development, as well as annual science accomplishments in the GPRA performance report. It would review one-third of the research program annually, providing regular scrutiny. It would need to ensure a review of the degree of integration within research and the connection of the research to applications and technology. Many NASA centers already do this, but it has not been enterprise-wide, and NASA will have to get its budget office to approve it. Originators of this approach believe that the scientific community will show far more enthusiasm for evaluating research programs with expert review than for evaluation according to annual measures and results. The experience of centers that have used expert review is highly favorable.

This would relieve several major concerns about the past method. One is concern that when the importance and relevance of a program are defined in terms of metrics, a program considered unmeasurable or difficult to measure could lose priority in the budget process relative to programs that are easier to quantify. Similarly, unmeasurable or difficult-to-measure programs give the perception that their progress and ability to produce useful results are not being tracked regularly. The use of expert review to track program performance could be accurately reflected in the performance report.

1.4.3. A plea for change.

COSEPUP, in its 1999 report on GPRA, suggested that “there are meaningful measures of quality, relevance, and leadership that are good predictors of eventual usefulness, that these measures can be reported regularly, and they represent a sound way to ensure that the country is getting a good return on its basic research investment.”

1.5. How do the selection and evaluation of projects relate to the evaluation of the research program?

One way to describe the relationship between the selection and evaluation of projects and the evaluation of the research program is in terms of relevance. That is, a research project should be selected only if it is relevant to the long-term goals of the research program. That is the difference between doing research in support of the mission and doing good science just for its own sake. Because NASA is by definition a mission agency, all its work is justified by its relationship to its missions. During reviews, panels are asked how each research area supports the agency's science goals.

1.5.1. “Moving toward uncertainty.”

There are difficulties in trying to evaluate projects by quantitative measures in light of how science is performed. The use of milestones, for example, implies a one-directional progression toward a goal. One who moves in this way is moving “toward certainty”—toward the proof of an expected conclusion—and scientists feel considerable pressure in their projects to “reduce uncertainty.” Science, however, does not always develop in the expected direction, and the way to new understanding often means “moving toward uncertainty” in a project. For example, the discovery of the Antarctic ozone hole was disputed at first because the atmospheric models of the time did not predict it. The way toward discovery was toward uncertainty. The theorists had to go back and revise their models in the face of a fundamental advance. If the response to GPRA involves an excessive dependence on metrics, it could dissuade agencies from accepting uncertainty and moving toward new ideas.

1.5.2. A need for flexibility.

Once a strategic goal is decided, there should be flexibility to move in new directions if the present direction proves unproductive. Such decisions should benefit from input from the scientific community.

1.5.3. The issue of control.

The results of some programs, such as educational, scientific, and commercial outreach, might be outside NASA's control. In educational outreach, for example, outcomes depend on the educational process itself; it should be assumed that good material will be used.

In the case of launch data, there is a concern that they will be decoupled from the science in mission objectives because they are easy to quantify. Other phases of the program, such as design reviews, should also serve as metrics, and their success should not depend on the actual launch, which is always subject to slippage. For example, the Terra launch slip led to six FY1999 targets not being met. Similarly, research partnerships should be evaluated in ways such that NASA's lack of control in partner-led areas does not unduly prejudice the results.

2. How is the result communicated to different audiences (e.g., S&T community, advisory committees, agency leadership, Administration, Congress)?

One of the goals of GPRA is to allow the various agencies and stakeholders to develop common nomenclature to deal with the evaluation of research. In addition, GPRA criteria should allow the agency to retain some flexibility and not place it in a straitjacket in its dealings with Congress and other oversight bodies. GPRA reports, in general, must be understandable to a wide array of people, but compliance requirements must recognize the need for technical discussion to capture the full reasoning behind the science.

2.1. Explaining the rationale for research.

GPRA documents have to explain to committees why particular things are done. One example is the goal of putting a spacecraft into an orbit 50 km above the surface of an asteroid. One could justify this goal with the simple metric that such an orbit allows photography of the asteroid's surface with 1-m resolution, but a qualitative rationale might be more appropriate. For example, a 50-km orbit is desirable because it lets us see the surface well enough to understand the internal processes of the asteroid that influenced the formation of the surface.

2.2. Freedom to change course.

Congress might benefit from additional knowledge about the give-and-take of the scientific funding process. For example, it is common knowledge in the scientific community that principal investigators often change course from the plans they outline to their funding agencies. Such course changes are almost inevitable in pursuing the unknown. They do not indicate failure or willful disregard for the contract with a funding agency. Rather, the pursuit of a new direction is an indication of the “moving toward uncertainty” described above—the evolutionary process that leads to new knowledge.

In addition, NASA should take advantage of its strength in communication to better explain to the public what makes NASA unique, such as its effectiveness in interdisciplinary research and its ability to establish metrics for complex, interdisciplinary programs.

2.3. Communicating with the public.

Several participants congratulated NASA on the fullness and diversity of its communication with the public, including research publications, data archives, and Web sites. The agency has made efforts to increase the public's access to knowledge generated by NASA through exhibits, interviews, and news articles. It also assists in the location and retrieval of NASA-generated knowledge through help desks, image-exchange programs, and the Web site. Participants urged even more efforts like those to communicate the kinds of results sought by GPRA.

3. How is the result used in internal and external decision-making?

The desired result of the GPRA response is to make clearer to the public how government funds are being used to benefit the public. GPRA is also intended to be used by Congress to facilitate oversight activities and to make budget decisions, although GPRA has not yet been used for budgeting purposes.

3.1. The question of internal change.

The agency discussed at some length how these descriptions and their judgments can be used internally. At this stage in the evolution of the act, the GPRA “overlay” of NASA's extensive review mechanisms has not yet brought about new mechanisms for program change, although participants felt that there should be consequences.

3.2. Unhelpful comparisons.

Several participants expressed the concern that annual GPRA performance evaluations can lead to misunderstandings of the performance and value of long-term R&D. The present GPRA process generates expectations that “value-added outcomes” that benefit the American public should be reached each year by every research program. That is, a dollar of investment should earn at least a dollar of return, like a savings account. In fact, a research portfolio is more like the stock market, featuring many short-term ups and downs and the occasional long-term “home run.” An effective GPRA reporting process would better communicate the high-risk, high-reward nature of research and provide convincing evidence of its value and continuing contributions to society.

NASA Focus Group Participant List October 30, 2000

Panel Members:

Alan Schriesheim

Director Emeritus

Argonne National Laboratory

Argonne, Illinois

Wesley T. Huntress, Jr

Director, Geophysical Laboratory

Carnegie Institution of Washington

Washington, D.C.

Louis J. Lanzerotti

Distinguished Member of the Technical Staff

Bell Laboratories, Lucent Technologies

Murray Hill, New Jersey

Herbert H. Richardson

Associate Vice Chancellor of Engineering and Director, Texas Transportation Institute

The Texas A&M University System

College Station, Texas

Participants:

Nicholas Bigelow

Department of Physics and Astronomy

The University of Rochester

Rochester, New York

Erin C. Hatch

Analyst in Space and Technology Policy

Congressional Research Service

Library of Congress

Washington, D.C.

Allen Li

Director, Acquisition and Sourcing Management

US General Accounting Office

Washington, D.C.

Robin Nazzaro

Assistant Director

US General Accounting Office

Washington, D.C.

Agency Representatives:

Rich Beck

Director, Resource Analysis Division

NASA Headquarters

Washington, D.C.

Ann Carlson

Assistant to the NASA Chief Scientist

NASA Headquarters

Washington, D.C.

Phil Decola

NASA Headquarters

Washington, D.C.

Jack Kaye

Director, Earth Sciences Research Division

NASA Headquarters

Washington, D.C.

Guenter Riegler

Director, Space Sciences Research Program Management Division

NASA Headquarters

Washington, D.C.

Eugene Trinh

Director, Physical Sciences Research Division

NASA Headquarters

Washington, D.C.

Appendix C-4. SUMMARY OF THE NATIONAL SCIENCE FOUNDATION FOCUS GROUP

1. What methodology is used for evaluating research programs under GPRA?

1.1. Overview.

The National Science Foundation (NSF) is an independent agency of the US government, established by the National Science Foundation Act of 1950 to “promote the progress of science; to advance the national health, prosperity and welfare; and to secure the national defense.” NSF is governed by the National Science Board (24 part-time members) and a director, with a deputy director and assistant directors.

Its mission is unique among federal agencies in that it supports only extramural research, conducted primarily by principal investigators and groups in universities and other institutions. Other agencies, such as the Department of Defense, support large research programs, but the research components receive only a minor fraction of those agencies' budgets. At NSF, virtually the entire $4 billion budget is devoted to research (minus a portion—5% to 6%—spent to administer grants and awards).

1.1.1. Special challenges in complying with GPRA.

Because of its unique charter, NSF faces special challenges in complying with GPRA. The first is that it has only limited control over the extramural research it funds. The agency relies on the advice of external, independent peer reviewers to evaluate the 30,000 applications received each year, of which about 10,000 are awarded.

A second challenge is that the progress and timing of research results are seldom predictable or measurable. Awardees may change the direction or emphasis of research projects as they discover new opportunities or knowledge.

Third, projects funded by NSF unfold over multiyear periods, and their results usually do not synchronize with the annual reporting requirements of GPRA. The agency has not found a way to provide annual evaluations of projects that have longer-term objectives.

A fourth challenge is a fundamental tension between the NSF organic mission and GPRA's requirement for quantitative metrics to evaluate research programs. Research, especially basic research, is seldom measurable in quantitative ways. Potential impacts are difficult to predict and require long time frames. It is difficult to attribute specific causes and effects to final outcomes.

A fifth challenge is to comply with GPRA's objective of correlating performance goals and results with specific budgetary line items. Because of the timing of NSF grants and the progress of research, NSF cannot predict what the results of its programs will be in a given year.

Other challenges are finding a large number of experts who are qualified and independent, attributing success to a project that has multiple sources of support, and avoiding overconservative project selection that could inhibit the high-risk science that leads to high rewards.

To address those challenges, NSF has adopted an alternative reporting format, as approved by the Office of Management and Budget (OMB). The format relies on a mixture of quantitative and qualitative measures and relies primarily on expert review at the project and program levels. Within this format, specific research projects are monitored annually, and the progress of research programs is evaluated retrospectively every 3 years.

1.2. What level of unit is the focus of the evaluation?

For purposes of GPRA, NSF views its entire basic-research enterprise as a single “program.” It has chosen this route in part because including a discussion of its individual research projects (some 10,000) or even individual research programs (about 200) is not practical. That is one reason, as several participants pointed out, why NSF's “results” cannot be matched with budgetary line items.

The core of the research enterprise is the individual research project; most of them are university-based. Some 95% of these projects are merit-reviewed before funding and then reviewed annually for progress by NSF staff. The merit-review (or expert-review) process continues under GPRA, although it does not appear specifically in GPRA reporting, because of the huge number of projects.

1.2.1. Outcome goals.

The 200 agency-wide programs include directorate and cross-directorate programs, NSF-wide initiatives, small-business awards, the award program for individual investigators, and grants for small and large facilities. The activities of all those programs are included in evaluating broad outcome goals for the agency. For example, the FY1999 GPRA outcome goals listed by NSF are the following:

  • Discoveries at and across the frontier of science and engineering.
  • Connections between discoveries and their use in service to society.
  • A diverse, globally oriented workforce of scientists and engineers.
  • Improved achievement in mathematics and science skills needed by all Americans.
  • Timely and relevant information on the national and international science and engineering enterprise.

For the first (and dominant) goal of “discoveries,” NSF asked its reviewers to award one of two grades for FY1999: successful and minimally effective. Performance was to be judged “successful” when NSF awards led to important discoveries, new knowledge and techniques, and high-potential links across disciplinary boundaries. Performance was to be judged “minimally effective” when there was a “steady stream of outputs of good scientific quality.” Officials found, however, that this combination of grades was not helpful to reviewers, and for FY2000 NSF has replaced these categories with two: successful and not successful.

1.3. Who does the evaluation of the research program under GPRA?

NSF depends on two populations of reviewers to evaluate its programs. At the “grass roots” level, some 95% of individual research projects are approved and reviewed by independent expert reviewers (a small number are initiated internally by the director or others). These reviewers provide what an NSF representative called a “high-level front-end filter” for agency-supported research. Each project is reviewed annually for progress, and program portfolios are reviewed every 3 years for integrity and quality of results. This level of reviewing falls below the aggregation level of agency-wide GPRA reporting.

1.3.1. Committees of Visitors.

At the much higher program level, NSF relies on its traditional external Committees of Visitors (COVs) to review integrity of process and quality of results of program portfolios every 3 years. These committees include people who represent a balanced array of disciplines, fields, and activities affected by outputs or outcomes of NSF-supported work and members of industry, government agencies, universities, foreign research communities, and other potential users. Each must be “credible” and “independent” (although independence is often difficult to judge in fields where many of the experts rely on a small number of funding sources and employers). Approximately 20 COVs meet each year to assess 30% of the NSF portfolio.

1.3.2. Advisory Committees.

At the highest level are the directorate advisory committees, whose members are selected not only for expertise and perspective, but also for diversity of point of view. They are asked to review activities at the directorate and cross-directorate levels. Each advisory committee submits a report assessing the directorate each year.

For the GPRA report itself, NSF uses reports from each directorate's advisory committee. It combines those with COV reports (42 were submitted in 1999) to prepare an NSF-wide report. As input for the COVs and advisory committees, each directorate may also use individual project reports (as examples or “nuggets” of high-quality research), budget justifications, press releases, annual-report materials, and National Research Council or other reports on the status of work supported by NSF.

1.4. What criteria are used for the evaluation?

In a broad sense, NSF relies on multiple criteria in evaluating programs. These include peer (expert) review of proposals, the past performance of the principal investigator or group, and input from the scientific community and the public. Both qualitative and quantitative criteria may be used as tools.

1.4.1. Merit and impact.

At the nitty-gritty level of proposal evaluation, reviewers are asked to look at two primary criteria:

  • What is the intellectual merit of the proposed activity?
  • What are the broader impacts of the proposed activity?

Out of this evaluation come two results. The first is advice as to whether to fund a proposal, and the second is a suggestion of the size of the award.

1.4.2. Process and results.

The COVs are asked to evaluate both the process and the results of research programs. COVs provide NSF with expert judgments of the degree to which outputs and outcomes generated by awardees have contributed to the attainment of NSF's strategic goals. They also assess program operations and management.

The advisory committees are asked to review the reports of the COVs and to take a broader view of agency activities. Their reviews include specific and general guidance for program managers and are intended to influence practice at the managerial level.

An important criterion in evaluating any NSF program is the extent to which it promotes development of human resources. This criterion is stated in the NSF Act, which directs the agency to support “programs to strengthen scientific and engineering research potential.” NSF has goals to promote the development of human resources within the agency and in the scientific community.

1.4.3. Quality, relevance, and leadership.

As suggested in the original report of the Committee on Science, Engineering, and Public Policy (COSEPUP) on GPRA, NSF uses the criteria of quality and relevance in its evaluations. It focuses less attention on the third criterion, leadership, although its expert reviewers often take leadership status into account. The agency has not, however, found a way to assess leadership through international benchmarking, as proposed by COSEPUP.

1.4.4. A mix of qualitative and quantitative means.

COSEPUP also addressed the issue of whether to rely more heavily on qualitative or quantitative means to evaluate research. It suggested that basic research can best be evaluated by expert review, which makes use of quantitative measures wherever appropriate. NSF uses a mix of both, depending on the material being reviewed. For the outcome goals and results of research, qualitative measures are used. For “investment process goals,” a mixture of qualitative and quantitative means is used. For management goals, quantitative means predominate.

1.5. How do the selection and evaluation of projects relate to the evaluation of the research program?

Because NSF evaluates its “research program” on an agency-wide basis, the evaluation for the purpose of complying with GPRA is not directly related to the selection and evaluation of individual projects. As suggested above, the 10,000 or so projects selected each year cannot be discussed individually in any meaningful way for a single report.

1.5.1. Differences of scale.

At the same time, the selection and evaluation of projects do form the heart of NSF's activities, and the nature of research is central to everything it does. Yet, OMB does not expect each project to be evaluated under GPRA. The agency-wide evaluation is performed on a different scale from a single-project evaluation, but by the same principles. Hence, NSF's GPRA performance plan for FY2001 includes the statements that “even the best proposals are guesses about the future,” “true research impact is always judged in retrospect,” and “the greatest impacts are often ‘non-linear' or unexpected.”

In contrast, some of NSF's projects are easily quantifiable and are evaluated on that basis. For example, the Laser Interferometer Gravitational-Wave Observatory near Hanford, Washington, is the agency's largest investment. It was delivered on time and under budget, and the program as a whole was judged a success from that perspective. But from a research perspective, the agency cannot yet know whether the observatory will detect gravitational radiation and bring new knowledge to the world.

1.5.2. An important disconnect.

In one important respect, the selection of projects is disconnected from the evaluation of the research program. NSF advisory committees must submit information for the agency's assessment in September, before the books close on the current fiscal year.

Similarly, the agency is about to begin work on the performance plan for 2002, but it does not yet have the report for 2000 to know where it should be making changes for 2002.

An NSF representative said, “In addition, it may take a year or two to put a solicitation out, receive proposals, evaluate them, and make awards. Because most awards cover five years, the time lag between putting out a request for proposals and meaningful results may be six or seven years (or more).”

2. How is the result communicated to different audiences (e.g., S&T community, advisory committees, agency leadership, Administration, Congress)?

The issue of “transparency” in GPRA reporting was discussed at length. On the one hand, NSF has received high praise for communicating openly with its many stakeholders, including Congress, OMB, the Office of Science and Technology Policy, NSF's National Science Board, NSF advisory committees, the National Academies, S&E professional societies, the academic community, and the general public (partly through the NSF Web site).

2.1. The issue of COV reports.

The reports of the COVs are not readily available. Several participants urged easier access to COV reports or perhaps summaries of them that could be posted on the Web.

2.2. Communicating with Congress.

There was considerable discussion about how NSF could better describe to Congress how it judges good science. Legislators want more transparency, worrying that they are being asked to accept scientific judgments without knowing how those judgments work. However, communicating with Congress is difficult because many staffers are political appointees without scientific training. NSF should take the initiative by telling its story better, educating the staff where needed, and showing the value of basic research in ways that are useful for Congress.

2.3. The risk of self-serving reports.

Participants discussed a shift in the attitude of principal investigators caused by NSF changes, notably the FastLane application mechanism and the speed of electronic filing. In the past, principal investigators would spend as little time as possible on NSF reports. In the last year, that has changed, and people are trying to get their reports to NSF for the sake of greater visibility for their projects. This, in turn, brings the danger of self-serving reports and promotion of one's research agenda.

3. How is the result used in internal and external decision-making?

One consequence of complying with the GPRA performance plan has been to simplify NSF goals. Five broad agency goals have been whittled down to three:

  • Ideas (discoveries, learning, and innovation).
  • People (the workforce).
  • Tools (physical infrastructure for research and education).

The previous five goals placed insufficient emphasis on equipment and facilities, especially new information resources, and did not match well with the reporting requirements of GPRA.

3.1. Some internal benefits.

  • The process helps the agency to focus on issues that need attention, such as (in FY1999) the FastLane area and the use of merit review criteria.
  • It has helped to improve management efficiency and effectiveness. Specifically, it has helped to collect information, focus activities, sharpen the vision of the directorate, and see in a broad way what the agency is doing.
  • It puts more discipline into planning. Before, the agency had a “visionary plan,” but it was not very well connected to implementation. Now, NSF has to connect that strategic plan all the way down to program level and assessment.
  • It helps to increase accountability, and for this the help of the scientific community is needed. The NSF Accountability Report for FY1999 received the highest marks for a government agency.

3.2. A time-consuming process.

One important result, which stimulated considerable discussion, was the amount of time and effort the agency has devoted to compliance. The requirements for documentation are increasing. Principal investigators are being asked for more, COVs are asked to digest more, and advisory committees have substantially more to do, as does everyone at the directorate level.

NSF Focus Group Participant List November 21, 2000

Panel Members:

Enriqueta Bond

President

The Burroughs Wellcome Fund

Research Triangle Park, North Carolina

Alan Schriesheim

Director Emeritus

Argonne National Laboratory

Argonne, Illinois

Rudolph A. Marcus

Arthur Amos Noyes Professor of Chemistry

California Institute of Technology

Pasadena, California

Stuart A. Rice

Frank P. Hixon Distinguished Service Professor

James Franck Institute

The University of Chicago

Chicago, Illinois

Max D. Summers

Professor of Entomology

Texas A&M University

College Station, Texas

Bailus Walker, Jr.

Professor of Environmental and Occupational Medicine

Howard University

Washington, D.C.

Participants:

David Ellis

EHR Advisory Committee Chair-NSF

President and Director, Museum of Science

Boston, Massachusetts

Irwin Feller

SBE Advisory Committee Chair-NSF

Institute for Policy Research and Evaluation

Pennsylvania State University

University Park, Pennsylvania

Eric A. Fischer

Senior Specialist in Science and Technology

Congressional Research Service

Library of Congress

Washington, D.C.

Robert Hull (by conference call)

Department of Materials Science and Engineering

University of Virginia

Charlottesville, Virginia

Genevieve Knezo

Specialist, Science and Technology Policy

Congressional Research Service

Library of Congress

Washington, D.C.

Robin Nazzaro

Assistant Director

US General Accounting Office

Washington, D.C.

David Radzanowski

Program Examiner, Science and Space Programs Branch

Office of Management and Budget

Washington, D.C.

Diane Raynes

US General Accounting Office

Washington, D.C.

Jean'ne Shreeve

Department of Chemistry

University of Idaho

Moscow, Idaho

Jack Sandweiss

Donner Professor, Department of Physics

Yale University

New Haven, Connecticut

David Simpson

The IRIS Consortium

Washington, D.C.

Agency Representatives:

Robert Eisenstein

Assistant Director for Mathematical and Physical Sciences

The National Science Foundation

Arlington, Virginia

Paul Herer

Senior Staff Associate, Office of Integrative Activities

The National Science Foundation

Arlington, Virginia

Lorretta Hopkins

Staff Associate

Office of Integrative Activities

The National Science Foundation

Arlington, Virginia

Nathaniel Pitts

Director, Office of Integrative Activities

The National Science Foundation

Arlington, Virginia

Martha Rubenstein

Budget Division Director

The National Science Foundation

Arlington, Virginia

Appendix C-5. SUMMARY OF THE DEPARTMENT OF ENERGY FOCUS GROUP

1. What methodology is used for evaluating research programs under GPRA?

1.1. Overview.

1.1.1. Management structure.

For management purposes, the Department of Energy (DOE), an $18.9 billion agency, is divided into four “business lines”: science, national security, energy, and environment. Most S&E research is supported by the Office of Science, whose five sub-offices are budgeted at just over $3 billion for FY2001. The $1 billion Office of Basic Energy Sciences (BES) was the one most extensively discussed at the workshop.C4-1

About half the budget of the Office of Science (SC) is allocated to research and divided 60/40 between research at its laboratories and research at universities. BES alone funds about 3,500 grants in 250 colleges and universities throughout the United States. Large laboratories and user facilities (26 major facilities and 12 collaborative research centers) receive over 30% of the office's budget; smaller portions go to major construction projects (currently featuring the Spallation Neutron Source and a high-energy physics project to study neutrino oscillations), capital equipment, and program direction. The laboratories are shared by many users from academe, government, and industry. Most laboratories, such as Brookhaven National Laboratory on Long Island, are government-owned and contractor-operated (GOCO). The contractors may be universities, not-for-profit organizations, industries, or consortia.

1.1.2. A shortage of needed information.

The scientific offices within DOE have found it difficult to comply with GPRA. The agency as a whole lacks a good system for tracking data that it needs to report on all its activities. The agency attempted to rectify this situation through a substantial investment in a federal government R&D database, but the lesson of that experience was that the agency needed its own system.

DOE tried at first to use a systemwide framework that emphasized the agency's financial structure in the hope that it would be easy to reconcile with the budget. This financial framework, however, did not accurately represent what the agency does, and it was divorced from actual planning. The linkages between this plan and what occurs at the program-manager level were weak, and the plan did not match well with GPRA. The General Accounting Office (GAO) was critical of the process, and administrators felt both external and internal pressure to change it.

1.1.3. A new planning model.

Planners knew that a new planning model would have to be flexible because each new administration formulates different policies. But GPRA requires uniformity and a clear linkage between the performance plan and the performance report. The model would have to be able to show how the actions of DOE administrators result in excellent science at universities.

As a result, SC is currently attempting to design a new strategic planning process to characterize the research it is doing and link its GPRA reports more logically to science.

1.1.4. “Science is different.”

The reason for this attempt is that scientific organizations are different from other agencies because scientific research is different from other activities. Therefore, strategic planning for science should also be different.

Through a literature survey and focus groups, the agency is trying to develop a more “holistic view of the pathways that lead to excellent science.” The goal is to describe the pathways that an organization should take to achieve excellent science.

The agency has been studying this subject for one year and has now described an environment that “fosters excellent research at the performing institution.” A suggested framework includes two dimensions (internal focus and integration, and external focus and differentiation) and four perspectives of effective organizations: human-resource development, internal support structures, innovation and cross-fertilization, and setting and achievement of relevant goals.

1.2. What level of unit is the focus of the evaluation?

The agency has had difficulties in setting an appropriate level of unit for evaluation and in finding relevant performance measures. The individual programs had no early basis for deciding what level of aggregation to use or how many measures to apply. Therefore, some program-level reports have been very detailed and others more general, depending on the approach of individual program directors.

1.2.1. Reviewing methods.

Below the level of programs (and of the GPRA performance reports), much of DOE's research budget is allocated to support individual, investigator-driven research projects in universities. These projects are evaluated individually by traditional peer review—that is, the same external, independent review system used by the National Science Foundation, the National Institutes of Health, and other agencies that support external research.

For research supported and overseen directly by the agency, the unit of evaluation is usually the laboratory or the program within the laboratory. These units have long-established means of evaluation through external and program reviews that have been maintained for GPRA.

Some subtlety is involved in evaluating large facilities during construction or operation. Most of them, such as the Spallation Neutron Source, are “one-of-a-kind” projects whose very construction may involve cutting-edge science. Once they are operational, the “maintenance” expenses for such facilities may become difficult to distinguish from the “research” expenses for the purpose of GPRA.

The agency also measures its contribution to S&E human resources. It maintains a commitment to supporting graduate and postdoctoral education; despite budget losses in the laboratories, it has roughly maintained existing levels of grants to universities.

1.3. Who does the evaluation of the research program under GPRA?

1.3.1. Peer reviewers.

For the university grant program, virtually all individual projects are evaluated by regular peer review under the Office of Science's Merit Review System guidelines. This external process conforms to standard independent peer-review procedures.

For laboratory research programs and facilities (e.g., Argonne National Laboratory and the Princeton Plasma Physics Laboratory), a formal documentation system similar to peer review is the norm. For example, BES evaluates the research projects it funds according to procedures described in Merit Review Procedures for Basic Energy Sciences Projects at the Department of Energy Laboratories. These procedures are patterned after those given for the university grant program. Peer review at the laboratories is intended to provide an independent assessment of the scientific or technical merit of the research by peers who have “knowledge and expertise equal to that of the researchers whose work they review.”

1.3.2. Technical experts.

Major construction projects are evaluated by technical experts who look at relatively straightforward criteria, including cost, schedule, technical scope, and management (“Lehman reviews”). Reviews of major projects are typically held twice per year and may include 30-40 independent technical experts divided into six to eight subpanels.

1.3.3. Advisory committees.

For each of the five SC programs, the evaluation procedure also includes advisory committees. For example, the 26-member Basic Energy Sciences Advisory Committee (BESAC) meets two to four times per year to review the BES program, advise on long-range planning and priorities, and advise on appropriate levels of funding and other issues of concern to the agency. BESAC subcommittees focus on more specific topics, such as neutron-source upgrades and DOE synchrotron radiation sources. Users of BES facilities are surveyed annually and asked for quantitative information about publications, patents, Cooperative Research and Development Agreements, prizes and awards, and other achievements.

BESAC reviews do not feed directly into the GPRA process. The committee looks at peer reviews, contractor reviews, citation indexes, major awards, and any other relevant information; distills the information; and reports directly to the director of the Office of Science. The committee attempts to clarify why DOE is supporting particular programs and to gauge the contribution of individual facilities to the agency's research effort.

1.3.4. Dual reviews for GOCOs.

GOCOs are assessed by both DOE and the contractors. The agency does not rely on a contractor's review, because the contractor has an incentive to provide a favorable review to justify its compensation. Instead, the agency does annual “contractor appraisals” by using independent peer review. Ratings are given for

  • Research quality.
  • Relevance to mission.
  • Research facilities.
  • Research management.

Overall appraisals are “rolled up” from individual laboratory reviews for all programs. These contractor appraisals affect performance fees and contract renewals.
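
As a rough illustration of how such a roll-up might work, consider the sketch below. The four rating dimensions come from the list above, but the 1-to-5 scoring scale and the simple averaging are assumptions made for the example; the summary does not specify how DOE actually combines individual laboratory reviews.

```python
# Hypothetical sketch of rolling up laboratory appraisal ratings into an
# overall contractor appraisal. The four rating dimensions come from the
# list above; the 1-5 scale and plain averaging are illustrative assumptions.
from statistics import mean

DIMENSIONS = (
    "research quality",
    "relevance to mission",
    "research facilities",
    "research management",
)

# One entry per laboratory program review: dimension -> score on a 1-5 scale.
lab_reviews = [
    {"research quality": 4.5, "relevance to mission": 4.0,
     "research facilities": 3.5, "research management": 4.0},
    {"research quality": 4.0, "relevance to mission": 4.5,
     "research facilities": 4.0, "research management": 3.5},
]

overall_appraisal = {
    dimension: mean(review[dimension] for review in lab_reviews)
    for dimension in DIMENSIONS
}

for dimension, score in overall_appraisal.items():
    print(f"{dimension}: {score:.2f}")
```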

1.4. What criteria are used for the evaluation?

DOE's budget narrative system lists a summary of “budget guidance” items, beginning with program mission, program goal, and program objectives. DOE is attempting to reconcile GPRA's requirements with these budgetary requirements.

1.4.1. Separating excellence from relevance.

The new system departs from the intent of the three COSEPUP criteria, however, by yoking the first two, excellence and relevance. These measures should be separated. Some excellent research may not be relevant to the agency's mission, and some relevant research may not be of excellent quality.

1.4.2. 150 measures.

SC has been using more than 150 performance measures, which DOE representatives (and GAO) acknowledge is an unwieldy number. This system has not been helpful in assessments to date, partly because the measures are not specific enough, do not clarify the DOE role, do not include means of validation and verification, and do not have clear links to the DOE strategic plan and budget.

The agency's “emerging measures” are patterned more closely on the COSEPUP recommendations by including leadership. To measure the level of leadership, the agency is contemplating the use of the “virtual congress,” as suggested in the COSEPUP report.

1.4.3. Studying new criteria.

The new criteria for performance metrics—now being studied by a group led by Irwin Feller, of Pennsylvania State University—are being examined in the hope of allowing a response to GPRA that is “grounded in research.” The criteria will attempt to include the following elements:

  • Reasonable metrics (that is, reasonable for assessing a science agency).
  • Excellence in science management (a 3-year study that benchmarks best management practices was launched in January 2000).
  • Science “foresighting” (another 3-year study is examining science trends “out to 25 years”).
  • Portfolio analysis (using information-technology tools, including deep data-mining, to characterize the research portfolios of the Office of Science, the federal government, and the “international S&T research portfolio”).
  • Miscellaneous efforts (to apply organizational and management theory).

1.4.4. The need to take risks.

The Office of Science also uses the criterion of risk in evaluating its programs. Without taking risks in research, programs and projects are unlikely to achieve the high-reward payoffs of the best investigations. Missions need flexibility in choosing research directions because peer review by itself is inherently conservative.

1.5. How do the selection and evaluation of projects relate to the evaluation of the research program?

Participants discussed at some length the “charter” of DOE and how DOE managers decide to include or exclude various programs or research topics from this charter. This issue is important in assessing the relevance of research for GPRA.

1.5.1. Complexities of project selection.

The process of selecting projects is complex and combines information from the Office of Strategic Planning, input from advisory committees, and program decisions made internally. The users of DOE facilities come from many institutions, with many agendas, and DOE does not want to restrict the scope of research for those who are using the facilities in productive ways.

2. How is the result communicated to different audiences (e.g., S&T community, advisory committees, agency leadership, Administration, Congress)?

In its report to Congress on the usefulness of agency performance plans, GAO noted that SC's FY2000 plan was “moderately improved” over the FY1999 plan but still bore little relationship to budgeting. The agency felt that more improvement was needed and for the succeeding year attempted to follow the structure of the budget more closely. Therefore, it organized the performance goals by budget accounts and annotated the performance goals with linkages to the strategic plan by identifying the strategic objectives they support.

2.1. Meeting with oversight staff.

The agency also met with congressional staff and agreed to characterize its results by four categories: exceeded goal, met goal, nearly met goal, and below expectation. Each rank was based on deviation from the expectation established in the performance goal. This was done in response to GAO's concern that baselines and context had not been provided to compare with performance.
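
A minimal sketch of this kind of deviation-based characterization appears below. The four category names come from the text, but the 10% tolerance and the classification logic are invented for illustration and are not DOE's actual rules.

```python
# Hypothetical classification of a performance result into the four categories
# agreed with congressional staff. Category names come from the text; the
# 10% tolerance is an illustrative assumption, not DOE's actual rule.
def characterize(actual: float, goal: float, tolerance: float = 0.10) -> str:
    """Classify performance by its relative deviation from the stated goal."""
    if goal == 0:
        raise ValueError("goal must be nonzero for a relative comparison")
    deviation = (actual - goal) / goal
    if deviation > tolerance:
        return "exceeded goal"
    if deviation >= 0:
        return "met goal"
    if deviation >= -tolerance:
        return "nearly met goal"
    return "below expectation"

print(characterize(actual=11.5, goal=10.0))  # exceeded goal
print(characterize(actual=9.5, goal=10.0))   # nearly met goal
```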

The agency has also added a section on verification and validation under each decision unit, including periodic guidance, reviews, certifications, and audits. Because of the size and diversity of the department's portfolio, verification is supported by extensive automated systems, external expert analysis, and management reviews.

2.2. Communicating about the new model.

There is considerable communication between DOE and GAO. After receiving a GAO report indicating that procedures for peer review vary among federal agencies, the House Science Committee asked GAO to investigate. GAO randomly sampled 100 BES research projects and concluded that the agency was performing merit review properly and following the established procedures.

3. How is the result used in internal and external decision-making?

3.1. GPRA results do not yet influence funding.

A common assumption about GPRA is that its results will be used to make funding decisions. However, many congressional staffs have not yet found ways to match performance results with funding decisions, because the process is still new and performance results often do not align easily with the budget structure.

3.2. A critique of GPRA reports.

Performance metrics do little good unless they embrace the scientific effort as a whole. For example, metrics of construction projects say little about the value of the science that they are intended to support. It is important to use quality, relevance, and leadership as evaluation criteria; the agency should not try to review the whole portfolio every year.

Office of Science officials stated that the process they are proposing is very similar to the one described here.

3.3. One result is DOE's new model.

Indeed, one result of DOE officials' attempts to evaluate their scientific research for GPRA has been to convince the agency of the desirability of the new assessment model that they are studying. The goals of the study are to

  • Investigate how funding agencies can foster excellent science.
  • Focus on the impacts of interactions among the Office of Science and science-performing organizations.
  • Identify relevant research in organizational effectiveness and science management.
  • Fill gaps in knowledge of public-sector issues in the management of scientific research.
  • Formulate strategies for dealing with large changes in research and funding environments.

Preliminary results have been mentioned above, but much of the study remains to be accomplished.

The agency noted that its reviews do have consequences: a poor review of the construction of the Spallation Neutron Source led to substantial changes in senior management.

DOE Focus Group Participant List, November 29, 2000

Panel Members:

Alan Schriesheim (Cochair)
Director Emeritus
Argonne National Laboratory
Argonne, Illinois

Morris Tanenbaum
Retired Vice Chairman and Chief Financial Officer, AT&T
Short Hills, New Jersey

Participants:

Eugene W. Bierly
Senior Scientist
American Geophysical Union
Washington, D.C.

Jack E. Crow
Director, National High Magnetic Field Laboratory
Florida State University
Tallahassee, Florida

Eric A. Fischer
Senior Specialist in Science and Technology
Congressional Research Service
Library of Congress
Washington, D.C.

Richard D. Hazeltine
Professor of Physics
Director, Institute for Fusion Studies
University of Texas at Austin
Austin, Texas

Michael Holland
Program Examiner
Office of Management and Budget
Washington, D.C.

Genevieve Knezo
Specialist, Science and Technology Policy
Congressional Research Service
Library of Congress
Washington, D.C.

Robin Nazzaro
Assistant Director
US General Accounting Office
Washington, D.C.

Fred Sissine
Specialist in Energy Policy
Congressional Research Service
Library of Congress
Washington, D.C.

David Trinkle
Program Examiner, Science and Space Programs Branch
Office of Management and Budget
Washington, D.C.

Agency Representatives:

Patricia Dehmer
Associate Director, Office of Basic Energy Sciences
US Department of Energy
Germantown, Maryland

William J. Valdez
Director of the Office of Planning and Analysis
US Department of Energy
Washington, D.C.

Footnotes

C1-1

Physics, chemistry, mathematics, computer sciences, electronics, materials science, mechanics, terrestrial sciences, ocean sciences, atmospheric and space science, biologic sciences, and cognitive and neural science.

C1-2

FY2000 federal funding of basic research by funding agency was allocated as follows: National Institutes of Health, 50%; National Science Foundation, 13%; Department of Energy, 12%; National Aeronautics and Space Administration, 13%; Department of Defense, 6%; other, 6%.

C1-3

Green means that a program is “progressing satisfactorily toward goals”; yellow means “generally progressing satisfactorily, but some aspects of the program are proceeding more slowly than expected”; red means it is “doubtful that any of the goals will be attained.” These DTO ratings are described as semiquantitative metrics that reflect the opinions of independent experts.

C1-4

“Although the DOD model of the transition path from basic research (6.1) to applied research (6.2) to advanced development (6.3) implies a linear model, this is often honored more in the breach than the practice. The ‘push’ of the linear process is augmented in DOD by a feedback process, whereby changing operational requirements and new results from multidisciplinary research continually keep the Basic Research Program on target.” DOD Basic Research Plan, 1999, p. I-5.

C1-5

DOD's mission, as defined in its strategic plan, begins as follows: “The mission of the Department of Defense is to support and defend the Constitution of the United States; to provide for the common defense of the nation, its citizens, and its allies, and to protect and advance US interests around the world.” In practice, this mission includes considerable breadth, especially in regard to its Performance Goal 2.2, which is to “transform US military forces for the future.” This goal calls for a continued focus on “new generations of defense technologies,” which provides the foundation for its extensive S&T program.

C1-6

In the words of the BRP (p. I-5), “Basic research supported by DOD is directed to maximizing the value that is likely to be created by the investment. Value in this context is represented by the enabling technologies that realize the operational concepts and mission goals of Joint Vision 2010.”

C2-1

Typical topics of the science advances were “Enzyme Can Repair Alzheimer's Tangles,” “Pathways in the Brain That Control Food Intake,” and “Proteins as Genetic Material in Human Disease.”

C2-2

Typical topics of the science capsules were “The Brain's Capacity to Change,” “Understanding Cataract Formation,” and “Homocysteine: Another Kind of Heart Risk.”

C2-3

Typical topics of the stories of discovery were “Drug Exposed Children: What the Science Shows,” “Challenging Obesity,” and “Helping Couples Conceive.”

C2-4

Evaluating Federal Research Programs.

C3-1

This enterprise was reorganized from the Office of Life and Microgravity Science and Applications (OLMSA) in September 2000. OLMSA was part of the Human Exploration and Development of Space Enterprise.

C3-2

Evaluating Federal Research Programs.

C4-1

The other four science suboffices are in biologic and environmental research, high-energy and nuclear physics, fusion energy sciences, and advanced scientific computing research.

Copyright © 2001, National Academy of Sciences.