
Continuing progress in measuring the returns on research investments requires new metrics and models to analyze how the inputs to research are converted into both short-term outputs and long-term impacts. NSF Director Subra Suresh provided the context for this discussion in a lunchtime keynote address that described five themes guiding NSF’s investment decisions. Two separate sessions at the workshop included seven speakers who examined specific tools and approaches, from the creation of a science policy infrastructure at NSF to visual analytics that can probe data sets for unexpected findings.


Traditional measures of research outputs provide only a partial picture of the state of scientific research in the United States, said NSF Director Subra Suresh during his keynote address at the workshop. For example, if the percentage of scientific publications were extrapolated into the future based on the trends of the last few years, China’s percentage would surpass that of the United States in 2013 or 2014. Publications are only one metric, Suresh acknowledged, and their impact is a matter of debate, but “agencies like NSF are looking at the significance, or lack thereof, of these kinds of metrics.”

By a different metric, the United States led the world until 2000 in R&D expenditures as a fraction of GDP. But in that year three major competitors (Germany, Japan, and South Korea) surpassed the United States, and several smaller countries have done so since. Other countries, such as China and Singapore, are investing very heavily in science and engineering research.

With the increasing globalization of research, metrics of the United States’ competitive edge will inevitably change. But such changes raise the question, said Suresh, of “what kind of metrics do we put in place so that we can position ourselves most appropriately for the future?”

At the National Science Foundation, this question should be considered within the context of five broad themes that are guiding the agency. First, science has entered what Suresh called a “new era of observation.” Digital technologies make it possible to generate data at an unprecedented pace. These data, along with new computational tools, are creating both tremendous excitement and new problems. NSF is devoting considerable effort to the development of cyberinfrastructure that can take advantage of these opportunities and solve the problems. In particular, cyberinfrastructure provides new capabilities for assessment of research. For example, the agency is asking what kinds of capabilities it can put in place in situations where the research community uploads data and information automatically. Researchers already have many responsibilities, and NSF has to be careful not to impose unfunded mandates on the community, said Suresh. But cyberinfrastructure makes it possible to store, integrate, sort, extract, and permanently archive information. How can this information best be used while protecting the integrity and confidentiality of the scientific process, Suresh asked. How can NSF work with other federal agencies and with its counterparts around the world to use this information to move science and education forward?

A second important opportunity, according to Suresh, is to integrate data and ideas from the social sciences and from the natural sciences. As an example, Suresh described NSF-sponsored research that identified the potential economic benefits of auctioning off portions of the electromagnetic spectrum. The 2012 federal budget projected that such auctions would yield approximately $28 billion over the next decade, with $10 billion of that set aside for budget deficit reduction. “That’s a tangible contribution to policy of social sciences research sponsored by NSF some 20 years ago,” Suresh said. The social sciences research being sponsored by NSF offers many similar opportunities to leverage natural sciences research. In the context of clean energy, for example, Suresh has been talking with officials at the Department of Energy about how social, behavioral, and economic research sponsored by NSF can contribute to research supported by the department.

A third opportunity is to expand research partnerships, both within the United States and internationally, through exchanges of people as well as virtually through digital technologies. Because NSF lacks the capability to engage in bilateral relationships with many individual countries, Suresh has been exploring how NSF can work with private foundations and with multilateral bodies such as the G20 countries to enhance international cooperation.

Suresh’s fourth theme was the need to continue investing in the development of human capital, especially the STEM workforce, not just for the United States but for the world. Since 1952, Suresh noted, NSF has funded 46,000 graduate research fellows. In 2010 it doubled the number of graduate fellows to 2,000 per year and kept the number at 2,000 in 2011. In addition, the stipend was increased from $10,500 to $12,000, and NSF’s goal is to sustain that level of support into the future. NSF’s initial graduate fellows would be well into retirement by now. How were their careers shaped by NSF’s support? Have the fellowships helped women and underrepresented minority groups over the past 58 years? What effect have career awards and young investigator awards had on researchers? New computer technologies could gather information to help answer some of these questions and shape human capital policies within the financial constraints expected in the future.

A fifth theme was the need to measure the impacts of NSF-funded research intelligently and over a long period of time. Although a good deal of the research NSF funds has purely scientific motivations, some of it has helped generate entirely new industries making significant contributions to the economy, Suresh observed. How can NSF help match the products of research with the needs of the marketplace without taking money away from fundamental research? How can the agency reconcile the short-term economic focus of the country and its elected leaders with the long-term benefits of basic research? How can NSF best articulate the benefits of basic research funding over the course of decades for the American public and the global society? Suresh suggested that a possible model could be the studies of higher education institutions’ contributions to the economy of the Boston area. He also cited the number of startup companies that have emerged in part from NSF-funded nanoscience and engineering centers. In addition, he recounted physicist Michael Faraday’s response to William Gladstone when asked about the practical value of electricity. Faraday replied, “One day, sir, you may tax it.”

Suresh concluded his remarks with an invitation to workshop participants to make suggestions to NSF on its policies and programs: What new kinds of programs need to be put in place to take advantage of current opportunities? Should NSF’s merit review process be changed to recognize truly transformative multidisciplinary research? Can NSF promote family-friendly policies that will enable women in much greater numbers to join the STEM workforce? Such input “would be enormously helpful,” Suresh said.


In 2005, OSTP Director John Marburger observed at a AAAS policy forum that he found it very difficult to provide an evidence-based answer to the question, “How can the federal government optimize its investments in science?” An interagency working group under the title of Science of Science Policy came to a similar conclusion in 2008, noting that no solid theoretical and empirical basis exists for deciding the level or allocation of scientific investments.

Those observations, along with the establishment of the Science of Science and Innovation Policy (SciSIP) program at NSF, culminated in an initiative to build a data infrastructure that would help answer the questions posed by Marburger and the interagency group. SciSIP Director Julia Lane described this system, known as STAR Metrics, at the workshop.

The Motivation for STAR Metrics

The motivation behind the system is threefold, said Lane. First, a principle of good government is that officials should be able to document the results of government spending. Yet most agencies, she said, are unable to document which researchers are supported, let alone the results of their work. Second, agencies need to be responsive to stakeholders, and the Office of Management and Budget, Office of Science and Technology Policy, and Congress are all asking for data. Third, making the data useful requires new analytical approaches and the use of cutting-edge technologies. “Relying on manual and burdensome reporting simply doesn’t make sense.”

What is STAR Metrics?

STAR Metrics is a federal and university partnership to document the outcomes of science investments to the public. It is an OSTP initiative partnering with NIH, NSF, DOE, and EPA that is divided into two phases. Phase I involves establishing uniform, auditable, and standardized measures of the initial impact of ARRA and base budget science spending on job creation. Phase II calls for the collaborative development of measures of the impact of federal science investments on the creation and diffusion of scientific knowledge (through publications and citations), economic growth (through patents, start-ups, and other measures), workforce development (through student mobility and employment), and social outcomes such as health and the environment.

This represents what Lane termed a “sea change” from the current data infrastructure on public science. For 50 years, the science agencies have essentially been proposal processing and award administration factories, she said. They apply labor and capital to the receipt of proposals, the awarding of grants and contracts, and the management of their performance. The proposal or award is not a behavioral unit of analysis but an intervention. The behavioral unit of analysis is the individual scientist. A pressing need, said Lane, is to restructure the data system to “look at the human beings who are affected by science funding and try to explain their behavior.”

Nevertheless, observed Lane, it makes less and less sense to talk about the outcome of an individual award. Increasingly, the relevant unit of analysis is a cluster of researchers, a scientific field or subdiscipline, or an entire research agenda. In addition, principal investigators typically get funding from a stream of activities, so being able to identify the incremental impact of an individual award is extraordinarily difficult. This has implications for the structure of the data within the agencies. “You have to capture the activities of the scientists over their entire period of activity, not just the period of the award.” Finally, the outcomes of many awards occur long after the administration of the award. Unless this long-term benefit is measured, the impact of a scientific investment will be under-estimated.

Capturing Data

In the twenty-first century, almost all scientific activity occurs electronically, yet reporting of scientific activities is often still done manually. “Submitting data that are in PDF format that are unstructured and unsearchable means that you miss enormous amounts of what’s going on,” said Lane.

In phase I, the STAR Metrics program sought to capture who is being supported by scientific funding without burdening researchers. It did that by using the internal administrative records of researchers’ institutions to capture that information as it flows from one place to another. STAR Metrics receives 14 administrative data elements from awards, grants, human resources, or finance systems on a quarterly basis.

Phase I began with a pilot project at six institutions. Since then, 75 institutions have joined on a voluntary basis. The data need not be personally identifiable.

As an example of the information that can be generated in phase I, Lane cited data on full-time equivalent (FTE) positions. The data yield quarterly reports on FTE jobs generated by ARRA, total FTE jobs and positions, FTE jobs generated through subawards and among vendors, and jobs generated through overhead payments. “For the first time, for each institution, we’re able to document how many people are supported,” Lane said. Faculty are only a small proportion (about 20 percent) of the FTEs that are supported. Support services, graduate students, postdoctoral fellows, undergraduate students, and others represent the remaining 80 percent of the supported positions. An FTE may represent several supported students. The data also make it possible to calculate the total number of individuals supported by research funding, along with the number of positions supported outside universities through vendor and subcontractor funding. “Not a single PI lifted a pen or typed a keyboard to enable us to pull this information, yet the information is very powerful and can be used to inform federal and state lawmakers.”
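The FTE figures Lane describes amount to summing fractional effort across payroll rows. The following is a minimal sketch in Python, assuming a hypothetical record layout of (person, occupation, fraction of full-time effort charged to awards). The actual STAR Metrics data elements are not listed in this summary, and the sample rows are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical quarterly rows: (person_id, occupation, fraction of full-time effort).
# These values are made up for illustration; they are not STAR Metrics data.
ROWS = [
    ("p1", "faculty", 0.25),
    ("p2", "graduate student", 0.50),
    ("p3", "graduate student", 0.50),
    ("p4", "postdoctoral fellow", 1.00),
    ("p5", "support services", 0.75),
]

def fte_by_occupation(rows):
    """Sum fractional effort per occupation to get FTE counts."""
    totals = defaultdict(float)
    for _person, occupation, fraction in rows:
        totals[occupation] += fraction
    return dict(totals)

def headcount_by_occupation(rows):
    """Count distinct individuals per occupation, regardless of effort."""
    people = defaultdict(set)
    for person, occupation, _fraction in rows:
        people[occupation].add(person)
    return {occupation: len(ids) for occupation, ids in people.items()}

def fte_shares(rows):
    """Express each occupation's FTE as a share of all supported FTEs."""
    totals = fte_by_occupation(rows)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {occupation: fte / grand_total for occupation, fte in totals.items()}

fte = fte_by_occupation(ROWS)
heads = headcount_by_occupation(ROWS)
shares = fte_shares(ROWS)
```

In this invented sample, two half-time graduate students add up to one FTE, which illustrates how a single FTE can represent several supported students.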

Future Plans

The next step for STAR Metrics is to build the main features of the phase II platform, which will compile information from individual researchers, commercial publication databases, administrative data, and other sources to capture as much information about scientific activities as possible. Federal policymakers, agency officials, research institutions, and investigators “will have a common and coherent system of understanding what they’re doing and the impact of what they’re doing,” Lane said.


The media have been questioning the return on federal research investments, noted Stefano Bertuzzi from the Office of Science Policy Analysis in NIH’s Office of the Director. A 2008 article in Newsweek concluded that “judging by the only criterion that matters to patients and taxpayers— not how many interesting discoveries have been made, but how many treatments for disease the money has bought— the return on investment to the American taxpayer has been approximately as satisfying as the AIG bailout.” A more recent article in Nature entitled “What Science Is Really Worth” ran under the tagline, “Spending on science is one of the best ways to generate jobs and economic growth, say research advocates. But the evidence behind such claims is patchy.”

Building an Empirical Framework

Continuing the discussion of STAR Metrics, Bertuzzi described it as a way of combining and linking input measures with economic, scientific, and social outcomes. For example, when a new discovery or technology is licensed to a company, the license represents a return on research investments. STAR Metrics would “unpack what is inside the black box of the licensing,” said Bertuzzi.

Bertuzzi demonstrated a prototype tool based on the discovery of drugs for rheumatoid disease. These are transformative drugs that can seem to bring people back from near death, and they generate billions of dollars in sales each year. Using information from STAR Metrics, it is possible to trace the developments that led to these drugs using the scientist as the unit of analysis.

The scientific story began with fundamental research on inflammation, which led to the discovery of tumor necrosis factor (TNF). Further research on molecular mechanisms involving TNF gave rise to several different drugs that work in different ways to reduce inflammation.

STAR Metrics data show the levels of public and private funding for this research, based on funding attributions in publications related to TNF. Funding began largely in the public sector at NIH and then decreased over time as private funding increased. The data also yield an interactive website that presents a timeline of milestone events that led to the approval of specific drugs. Clicking on an event in the timeline produces a list of the scientists involved in publishing key papers. Clicking on a paper pulls up a brief CV along with highlights of the discovery and funding sources. Further links connect scientists with patent databases and other information.

The links among scientists, discoveries, publications, patents, and other information form networks that allow the process of discovery to be visualized. Interactive websites make it possible to explore the network to uncover collaborations, institutional connections, linked events, and other aspects of innovation. “We will be able to collect, through federal-wide profiles, what the scientists themselves tell about their stories, their interests, and their discoveries,” said Bertuzzi. STAR Metrics will make it possible to “disentangle and unpack all the complexity of the network that eventually led to that particular discovery.” A potential practical application would be to look for the common features of successful discovery processes and then try to replicate them.
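The kind of linked network described above can be pictured as a graph whose nodes are events, papers, scientists, grants, and patents. The sketch below is only an illustration of that idea in Python, not the prototype Bertuzzi demonstrated; the node labels and links are hypothetical placeholders, and a breadth-first search stands in for interactive exploration.

```python
from collections import defaultdict, deque

# Hypothetical links between typed nodes; none of these refer to real records.
EDGES = [
    ("event:drug_approval", "paper:P1"),
    ("paper:P1", "scientist:A"),
    ("paper:P1", "scientist:B"),
    ("paper:P1", "grant:G1"),
    ("scientist:A", "patent:X"),
    ("scientist:B", "paper:P2"),
]

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def within_hops(graph, start, max_hops):
    """Breadth-first search: map each node within max_hops of start to its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

graph = build_graph(EDGES)
two_hops = within_hops(graph, "event:drug_approval", 2)
three_hops = within_hops(graph, "event:drug_approval", 3)
```

Starting from a milestone event, widening the hop limit corresponds to following further links, for example from a paper to its authors and then to their patents.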


The outputs of research historically have been viewed as consisting of papers, patents, and human resources, noted Ian Foster, Arthur Holly Compton Distinguished Service Professor and Chan Soon-Shiong Scholar at the University of Chicago. Papers document ideas, patents establish ownership rights, and human resources constitute people who are trained in ideas and in methods.

Today, said Foster, large amounts of human intellectual capital are being captured in other forms, especially as data and computer software. These resources also capture ideas and methods that can be transferred from one person to another. Such resources have been growing explosively. In 2001, according to an annual report from the journal Nucleic Acids Research on publicly available, high-quality databases in molecular biology, there were 96 such databases. In 2010, there were 1,070, and in 2011 there were 1,330. Some of these databases have tens of millions of entries and billions of bytes of nucleic acid information. “Historically, we might have thought of people as conducting an experiment, writing it up, and putting the results into a paper which other people would read, build on, and perhaps cite in their publications. Clearly, consulting databases rather than the literature has become a primary means of accessing the work of other investigators.”

In addition, an expanding set of online services provides access to software. “Web services” is a term often used to refer to software that is made available over the internet through standardized protocols. One registry lists 2,053 services provided by 148 providers. Some of these provide very simple functions, but others provide sophisticated computational capabilities to scientists who otherwise would not have access to them. Furthermore, many of these services are made freely available to others, often through large development and distribution communities. “Data and software are two types of resources that are becoming fundamental to how people do science, and they are being shared in ways that are very different than just a few years ago.”

New methods are needed for evaluating these resources, said Foster, including their impact on the research process as well as on downstream activities such as job creation, patenting, and the formation of companies. The fact that these resources are digital makes such evaluations somewhat easier, because accessing an electronic database or piece of software involves a digitally mediated transaction and can be logged and analyzed in the future. Collective analysis of these transactions, along with more conventional metrics, also can reveal the ways in which knowledge is integrated. For example, the myExperiment project seeks to make the sharing of computational procedures, data, and software as easy as sharing images on a social networking site. The site also makes it possible to share workflows and reports on how often they are used and for what purpose. “We can look not only at how people interact with people via publications but also how software interacts with data and data with software and people with software and data.”
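Because each access to a database or workflow is a logged transaction, simple usage measures can be derived directly from the log. The following Python sketch assumes a hypothetical log format of (resource, user, stated purpose); it does not reflect any particular site's actual logging, and the entries are invented.

```python
from collections import Counter, defaultdict

# Hypothetical access log entries: (resource, user, stated purpose).
LOG = [
    ("workflow:align", "u1", "alignment"),
    ("workflow:align", "u2", "alignment"),
    ("workflow:align", "u1", "teaching"),
    ("database:genes", "u3", "lookup"),
]

def usage_summary(log):
    """Summarize runs, distinct users, and most common purpose per resource."""
    runs = Counter()
    users = defaultdict(set)
    purposes = defaultdict(Counter)
    for resource, user, purpose in log:
        runs[resource] += 1
        users[resource].add(user)
        purposes[resource][purpose] += 1
    return {
        resource: {
            "runs": runs[resource],
            "distinct_users": len(users[resource]),
            "top_purpose": purposes[resource].most_common(1)[0][0],
        }
        for resource in runs
    }

summary = usage_summary(LOG)
```

Counting distinct users separately from total runs matters here, since a resource used heavily by one group tells a different story from one used occasionally by many.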

The STAR Metrics program also seeks to capture research activities and outputs in the form of a distributed database. In that context, it becomes possible to automate many administrative tasks such as creating biosketches, progress reports, final reports, and tenure reviews.

In this and other ways, researchers derive tremendous value from such platforms, said Foster. Researchers are as interested as evaluators in the connections between different knowledge bases. A system that links all research outputs to all relevant research inputs would be invaluable to researchers who are trying to determine which pathways have not been explored and should be pursued, which research strategies are most useful, and how a particular research problem has been tackled in the past. “With luck we will find, as is often the case in science, that the very activity of observing something will change the activity that we are observing, and accelerate its process.”


Measuring the impact of research requires a long-term view, said Lynne Zucker, Professor of Sociology and Policy Studies at the University of California, Los Angeles. The short-term impact can be much smaller than the long-run impact. To see these long-term impacts, said Zucker, “ten years out is about the minimum, in my experience, from having done a lot of evaluations of programs for the University of California system and for the Advanced Technology Program and other programs.”

Many new ideas are embodied in those who conceive them. People have high amounts of tacit knowledge, and they can transmit this knowledge to others. People who have been doing the same kind of science often can absorb these ideas quickly, but in general the diffusion of ideas is slow. Teams that include what Zucker called “star scientists” have been located primarily in universities, but increasingly they occur in firms, too. “There’s a lot of basic science going on in industry,” said Zucker.

Biotechnology is an exemplar of a science-driven industry. Scientific breakthroughs led to hundreds of new firms. Consolidation occurred when scientific advances slowed, with some firms growing and others failing. However, the number of jobs continued to grow, so that people were absorbed into the successful companies. In the case of biotechnology, the growth and change were revolutionary enough that an entirely new industry was created.

Developing an infrastructure to collect data about knowledge flows into industry is a complicated process and has not been done well in most industries, according to Zucker. However, in biotechnology, a system known as Bioscan makes it possible to track the process of transferring knowledge from molecular biology into industry. Bioscan also shows that firms in which star scientists are involved have higher employment growth than others. “It’s a selection process: the top talent gets selected first,” said Zucker.

A new model of a high-science firm emerged in biotechnology. Scientists were free to publish and were rewarded for it, both in salary and stock options. Firms had deep collaborations with university faculty, and rewards were closely tied to the firms’ outputs. Large incumbent firms learned to emulate this culture, and if they did not they had a tendency to fade and die.

More recently, many nanotechnology firms have been adopting the biotech model and are undergoing a similar process. Many startup and incumbent firms are competing, and star scientists are involved in roughly one in ten of them. Nanotechnology is more geographically distributed in the United States than biotechnology. But where star nanoscientists are active has been a key determinant of where and when new firms enter the field.

NSF funding for nanotechnology has had a large impact in the field, Zucker observed, contributing to large increases in published nanoscale articles and significant growth in nanoscale patenting.

The impacts of star scientists vary across science and technology areas in proportion to technological opportunity, said Zucker. Some areas have had recent breakthroughs, and those areas are going to have more opportunities than areas where the science is more mature. But scientific fields also make their own opportunities, as when biotechnology firms began working in nanotechnology.

In general, said Zucker, federal investments appear to be important for impacts in all science and technology areas, but to test this idea she and her colleagues have been developing an integrated database with input from multiple sources. The resource is beginning to produce early results, and “the general answer so far is yes, with some variation, federal grants do make a big difference … for most science areas.”

The initial version of the resource, StarTechZD, is now available on the web (http://startechzd.net) and permits the tracking of knowledge, funding, and economic impacts. It can identify both organizations and particular scientists within and across databases. It also can separate organizational and individual efforts. Zucker called it a “quantum jump in the ability to analyze science and technology… It’s an extremely important tool.”


Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces, said John Stasko, Professor and Associate Chair of the School of Interactive Computing at the Georgia Institute of Technology. It combines automated analysis techniques with interactive visualizations for effective understanding, reasoning, and decision making on the basis of large and complex data sets. Another way to think of visual analytics, said Stasko, is that it combines interactive visualization, computational data analysis, and analytical reasoning. “Visualization is not about making pretty pictures,” he said. “It’s about helping people solve problems and gain insights from their data.”

Visualization is not appropriate for every problem. If someone is interested in how many people are employed in an area, a data mining algorithm can find the best fit. However, visualization is a powerful tool in exploratory data analysis scenarios, “where someone drops a pile of data in your lap and says ‘Help me understand what’s there.’” These are scenarios where people typically do not know exactly which questions to ask.

Effective visualization tools both answer questions and raise questions. The interactive aspects of the data enable someone using the tool to essentially have a conversation with the data. “You explore one angle and a new question arises. It’s through the interaction where things happen.”

Some existing visualizations can be frustrating, Stasko admitted. For example, large network graphs such as maps of science do not necessarily convey clear conclusions. A map might show that mathematics is strongly related to computer science, but such an observation is not very interesting. Also, a single visualization cannot necessarily show all of the variables that someone might want to represent. Such maps present a static view of connectivity, clustering, or centrality, “but you want to go beyond that.”

Stasko cited several examples of effective interactive visualizations. The SocialAction system uses social network analysis to measure the centrality of different nodes in the network, thus combining the algorithmic analysis of the data with interactive exploration. Another system called Jigsaw performs document analysis of unstructured text. Through such processes as text mining and entity identification, it produces multiple interactive visualizations of the content of the documents for exploration. Finally, Stasko mentioned a system called Ploceus (named after a weaver bird that creates elaborate nests) that builds network visualizations from tabular data. The system takes data from a spreadsheet, for example, and creates networks that allow the data to be explored.
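Two ideas in this passage, deriving a network from tabular data and scoring nodes by centrality, can be sketched briefly in Python. This is an illustration only and does not reproduce how the systems Stasko named work internally. It assumes a hypothetical two-column table of (paper, author) rows and uses degree centrality, the simplest of several centrality measures.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical spreadsheet rows: (paper_id, author). Invented for illustration.
TABLE = [
    ("P1", "A"), ("P1", "B"),
    ("P2", "B"), ("P2", "C"),
    ("P3", "B"), ("P3", "C"), ("P3", "D"),
]

def coauthor_network(table):
    """Link every pair of authors who share a paper.

    Authors who appear only on single-author papers get no links and
    therefore do not appear as nodes in this simple sketch.
    """
    authors_by_paper = defaultdict(set)
    for paper, author in table:
        authors_by_paper[paper].add(author)
    neighbors = defaultdict(set)
    for authors in authors_by_paper.values():
        for a, b in combinations(sorted(authors), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def degree_centrality(neighbors):
    """Degree centrality: number of neighbors divided by (n - 1)."""
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}

network = coauthor_network(TABLE)
centrality = degree_centrality(network)
```

In the invented table, author B has written with every other author and so receives the maximum score of 1.0, which is the kind of result an interactive tool would let an analyst spot and then investigate further.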

Stasko concluded by saying that there are many different methods of data analysis and they are not mutually exclusive. The best kinds of data analysis combine statistical, automated computational, and visual exploratory methods, he said. From such explorations of data, where the questions are not necessarily defined beforehand, insightful discoveries can emerge.


Adam Jaffe, Dean of Arts and Sciences and Fred C. Hecht Professor in Economics at Brandeis University, commented on the importance of creating a comprehensive database that contains all research inputs and outputs. “It has been a long time in coming, and we’ve talked about it for a long time, but we are now at a point where we can glimpse that it may actually be happening.” The only thing that can protect science funding, he said, is demonstrating the long-term and diffuse but tremendously important impacts of science, “and that requires very extensive and complicated data.”

One way to build such a database will be to take advantage of automated data capture. Once the framework for the system has been created, huge amounts of data can be collected automatically by searching the web. Automated data capture will reduce the reporting obligations imposed on institutions and individuals. “The ARRA reporting requirements almost caused my office for research administration to implode,” said Jaffe. Universities are under stress because financial support from all sources is down while financial needs are up. “Everyone is overworked, and when you put these reporting requirements on top of that, it really is a significant issue that we need to worry about.”

Such a database would be greatly advanced by a unique identifier for each person who receives money from the federal government to conduct research. “This is absolutely crucial,” said Jaffe. “If we eventually fail to get to a system where each person is tagged with a unique identifier, this project will not succeed.” Real data have many ambiguities that need to be resolved, and a unique identifier would resolve many of them.

Evaluations also need to track the failures—the students who dropped out, the grant applications that were not funded, the projects that produced negative results. “You don’t know the return to the successful investments unless you can have some kind of ‘but for’ or counterfactual to compare what occurred when you funded it to what might have occurred otherwise.” Statistically, the best way to answer these questions is to have data in the system on other than successful outcomes.

Finally, Jaffe said, the data should extend beyond the biosciences. “I know NIH is the 800-pound funding gorilla, but there are other sciences and other industries out there.”

The indirect effects of research funding can be very difficult to track. Things like the accumulation of human capital or the spillover effects from research have very long lags and diffuse impacts. Data collection therefore needs to be broad-based and multidimensional. “What is so exciting about some of these projects is that we are beginning to see an infrastructure where all the different pieces can be connected together, where we can come to understand better how all these things work.”


During the discussion period, the panelists discussed several prominent issues associated with improving the accuracy of information in databases. Administrative data tend to contain many errors, which can reduce the value of analyses. Some disciplines have adopted systems in which researchers are asked to review and correct errors in, for example, listings of publications and citations. One approach would be to promote researchers’ retention of permanent e-mail addresses that could function both as identifiers and as a means of verifying information related to that person.

Julia Lane cautioned that a unique identifier for each researcher may not be practical and may not be essential. It may make more sense to think of investigators having multiple identifiers that are interoperable. Identification is a problem in many countries, not just the United States, and efforts both within and across nations are now reaching the point where progress can be made.
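One way to picture multiple interoperable identifiers is as a crosswalk that records which identifiers are known to refer to the same person. The Python sketch below uses a union-find structure for this; it is an assumption made for illustration, not a design proposed at the workshop, and the identifier schemes and values shown are hypothetical.

```python
class IdentifierCrosswalk:
    """Groups identifiers that have been asserted to belong to one person."""

    def __init__(self):
        self._parent = {}

    def _find(self, identifier):
        """Return the representative identifier for this identifier's group."""
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited identifier straight at the root.
        while self._parent[identifier] != root:
            next_identifier = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_identifier
        return root

    def link(self, first, second):
        """Record that two identifiers refer to the same person."""
        root_first = self._find(first)
        root_second = self._find(second)
        if root_first != root_second:
            self._parent[root_second] = root_first

    def same_person(self, first, second):
        """Check whether two identifiers have been linked, directly or indirectly."""
        return self._find(first) == self._find(second)

# Hypothetical identifiers from three unrelated systems.
crosswalk = IdentifierCrosswalk()
crosswalk.link("agency1:001", "university:jdoe")
crosswalk.link("university:jdoe", "email:jdoe@example.org")
```

With such a crosswalk, no single system has to issue the one authoritative identifier; linking any two known identifiers is enough to connect the records behind them. The hard part, deciding when a link is justified, is left outside this sketch.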

Spector suggested that databases need to leverage the federated transparency of the Web rather than creating specific systems for measuring the impacts of research. There are several ways of doing this. Crowd-sourcing can be “incredibly powerful” because many people, and particularly the younger generation, want to keep information up to date. Natural language processing can help improve accuracy by comparing information from many places on the Web. Finally, machine learning algorithms are powerful categorization mechanisms. “Don’t build custom systems,” Spector warned, “because they will be expensive [and] bureaucratic.”

In response to a question about how advances in data presentation and visualization can help policymakers better understand and use data, Stasko said that it is critical for the designers of such systems to understand the systems’ users and tasks. “What do you want to find out about the data, and how can visualizations help?” The answers to questions in areas such as patenting could change scientific practices and help set the research agenda. And visualization can help convey the complexity of the innovation ecosystem, with all its different and tangled components.

Director Suresh was asked about the “broader impacts” criterion that NSF uses to review proposals, with reference to the reauthorization of the America COMPETES Act calling on NSF to broaden these impacts to include such considerations as performance measures and partnerships. Suresh responded that the National Science Board has been investigating the broader impacts criterion. Researchers are understandably confused, he said, about how many of these considerations to incorporate into their research proposals, how much of the burden to place on the individual versus the department versus the school versus the institution, and how to consider such factors as economic impact and workforce development. “This is very much a work in progress.” A number of groups are working in parallel and in conversation with one another, he said, ideally leading to clarity rather than confusion on this issue.