NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Institute of Medicine (US). Sharing Clinical Research Data: Workshop Summary. Washington (DC): National Academies Press (US); 2013 Mar 29.

Sharing Clinical Research Data: Workshop Summary.
Key Messages Identified by Individual Speakers
- Data sharing can enhance understanding of the results of an individual clinical trial and enable the pooling of data from multiple trials to extend scientific discoveries beyond those derivable from any single study.
- The moral and ethical arguments for data sharing center on fulfilling obligations to research participants, minimizing safety risks, and honoring the nature of medical research as a public good.
- The practical and scientific arguments for data sharing include improving the accuracy of research, informing risk/benefit analysis of treatment options, strengthening collaborations, accelerating biomedical research, and restoring trust in the clinical research enterprise.
- A cultural shift has already begun as leaders in industry, academia, and regulatory agencies recognize the value in increased transparency and data sharing and are focusing on how—instead of why—data should be shared.
- Participant-level data are particularly useful when shared, but care must be taken to avoid drawing inaccurate conclusions from reanalysis of such data.
Clinical data come in a variety of formats (see Box 2-1), from the raw data collected in case report forms during trials to the coded data stored in computerized databases to the summary data made available through journals and registries like ClinicalTrials.gov. Data sharing can also occur at many levels. Several of the presenters at the workshop described these data-sharing continuums and discussed the benefits and risks of data sharing, based on the degree to which participant-level data are made available to researchers and the public.
In some trials, data are not even made available to individual researchers participating in a multicenter trial. Sometimes, data are released to researchers not associated with the study only if they show a genuine research interest in the question and a track record of research capability. In some cases, data are shared with everyone.
THE USES OF SHARED PARTICIPANT-LEVEL DATA
De-identified patient data have two major uses, observed Deborah Zarin, director of ClinicalTrials.gov at the National Library of Medicine. They can improve transparency by helping others understand the results of an individual clinical trial, including what happened to individuals in the trial, and they can be pooled to discover new things not identified in the individual trials.
Data Sharing to Enable Independent Reanalysis
Steven Goodman, associate dean for clinical and translational research and professor of medicine and health policy and research at the Stanford University School of Medicine, discussed the former use case in the context of ensuring that a study was correctly analyzed and interpreted. Independent reanalysis of data is the basis of reproducible research and can be an extremely difficult task. An example he mentioned was a study of childhood asthma that had 72 different study forms, 109 form revisions, and almost 300,000 records in the database. The original manuscript started with 73 tables and 9 figures and underwent 40 revisions. The published manuscript contained three tables and two figures. “How do we begin from this tiny little slice that we see to begin to work backward and figure out is what they did right?” he asked. While the top tier of journals may have methodologists who can begin to check the chain of scientific custody from protocol to conduct to data to analysis to results, other journals have to rely on peer reviewers to detect problems. The authors of published studies can put additional information on the Web in the form of supplementary material and appendixes, but in reality, checking the accuracy of the results for a study like this is extremely difficult.
In talking about the tools that are needed to ensure that published findings are based on sound data and analyses, Goodman referenced a paper titled “Reproducible Epidemiologic Research” that proposes a standard for reproducibility (Peng et al., 2006). The premise behind that paper is that independent replication of research findings is the fundamental mechanism by which scientific evidence accumulates to support a hypothesis. The authors, therefore, argue that datasets and software should be made available to allow other researchers to conduct their own analyses and verify the published results.
Peter Doshi, a postdoctoral fellow at the Johns Hopkins University School of Medicine, also discussed the application of shared data to credible assessment of clinical trial results. Doshi, however, argued for a broader view of what should be considered clinical trial data. He proposed that detailed records of measurements and analyses, as well as narratives—including descriptions of patient dispositions, study protocols, and even correspondence—are needed to evaluate the quality of published trial results.
Data Sharing for Discovery
Participant-level data from multiple trials also can be combined to learn more than can be derived from the results of a single trial. Elizabeth Loder, clinical epidemiology editor at BMJ, observed that although meta-analyses historically have been done using summary-level data, the number of meta-analyses of individual participant data has been growing substantially. Furthermore, meta-analyses done with individual patient data are typically better able to detect treatment effects that differ across subgroups than meta-analyses done with aggregate data (Riley et al., 2010). These subgroup effects are frequently of great interest to clinical investigators. As Loder said, drawing from the title of an essay by Stephen Jay Gould, “the median is not the message.”
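The contrast between summary-level and participant-level pooling can be made concrete with a minimal Python sketch. It is an illustration only, not a method described at the workshop: the records, trial names, subgroup labels, and function names are all hypothetical, and the unweighted averaging across trials is a simplifying assumption (an actual meta-analysis of individual participant data would typically rely on a formal statistical model).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical participant-level records: (trial, arm, subgroup, outcome).
# Every value here is invented purely for illustration.
RECORDS = [
    ("trial_A", "treatment", "younger", 8.0),
    ("trial_A", "treatment", "older", 2.0),
    ("trial_A", "control", "younger", 2.0),
    ("trial_A", "control", "older", 2.0),
    ("trial_B", "treatment", "younger", 10.0),
    ("trial_B", "treatment", "older", 4.0),
    ("trial_B", "control", "younger", 4.0),
    ("trial_B", "control", "older", 4.0),
]


def mean_difference(records):
    """Mean outcome in the treatment arm minus mean outcome in the control arm."""
    by_arm = defaultdict(list)
    for _trial, arm, _subgroup, outcome in records:
        by_arm[arm].append(outcome)
    return mean(by_arm["treatment"]) - mean(by_arm["control"])


def effect_by_trial(records):
    """Overall effect per trial: roughly what a summary-level report exposes."""
    trials = sorted({r[0] for r in records})
    return {t: mean_difference([r for r in records if r[0] == t]) for t in trials}


def effect_by_subgroup(records):
    """Subgroup effects computed within each trial, then averaged across trials.

    Assumes every trial has both arms represented in every subgroup;
    the unweighted average is a simplification for this sketch.
    """
    trials = sorted({r[0] for r in records})
    subgroups = sorted({r[2] for r in records})
    effects = {}
    for s in subgroups:
        per_trial = [
            mean_difference([r for r in records if r[0] == t and r[2] == s])
            for t in trials
        ]
        effects[s] = mean(per_trial)
    return effects


print(effect_by_trial(RECORDS))
print(effect_by_subgroup(RECORDS))
```

In this toy dataset the overall effect is 3.0 in both invented trials, so the summary figures alone suggest a uniform benefit. The participant-level records show instead an effect of 6.0 in the "younger" subgroup and 0.0 in the "older" subgroup, a difference that the per-trial averages conceal.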
THE RATIONALE FOR DATA SHARING
The arguments in favor of sharing can be divided into two broad and overlapping categories, Loder explained. The first category consists of moral and ethical arguments. These arguments point to the necessity of fulfilling obligations to research participants, minimizing known risks and potential harm from unnecessary exposure to previously tested interventions, and honoring the nature of medical research as a public good. Patients participate in clinical trials based at least in part on the understanding that their data may benefit others, and these benefits are more likely to occur if the data are widely available. Also, making unpublished information available might in some cases prevent the occurrence of adverse events (Chalmers, 2006). Data sharing may take different forms, from simply publishing the results of research to publicly sharing detailed patient-level datasets. Finally, taxpayers provide a large amount of money to support publicly funded research and expect to have access to the benefits of that research.
The second category consists of practical and scientific arguments. These include detecting and deterring selective or inaccurate reporting of research; enabling the replication of results and potential resolution of apparently conflicting results; informing risk/benefit analyses for treatment options; facilitating application of previously generated data to new study questions; accelerating research; enhancing collaboration; and building trust in the clinical research enterprise. Rob Califf, director of the Duke Translational Medicine Institute, professor of medicine, and vice chancellor for clinical and translational research at Duke University Medical Center, who also spoke during the first session, pointed to the need to resolve results that appear conflicting. Clinicians are not able to interpret conflicting clinical trials data based on looking at the data abstractly without any kind of expert synthesis of information. Only through replication can one sort out whether conflicting results are due to chance or true differences.
Califf went on to describe a “cycle of quality” that can generate evidence to inform patient care (see Figure 2-1). Clinical trials generate knowledge, which is then applied in clinical practice. The measurement of patient outcomes then leads both to clinical practice guidelines that define standard of care and to further clinical trials. At the core of the cycle is measurement and education, which in turn depend on access to data. Box 2-2 describes how this paradigm of cumulatively building and sharing datasets has worked to reduce deaths due to heart attacks by 40 percent.

FIGURE 2-1
A “cycle of quality” from discovery science to the measurement of outcomes can generate evidence to inform policy. SOURCE: Califf et al., 2007.
As an example of the kinds of advances that may be possible, Loder cited the case of a high school student who won $75,000 at the Intel International Science and Engineering Fair. The student cited searchable databases and free online science papers as the tools that allowed him to create his prize-winning entry. “How many collaborators are out there, who we cannot even imagine at this point, who might make use of the data?” said Loder.
Loder also called attention to the need to build trust in the clinical research enterprise. This trust is at “an all-time low,” she said, which is causing a crisis in recruitment for clinical trials (Williams et al., 2008). The lack of trust extends even to physicians, who tend to discount studies of superior methodological rigor when they perceive that the studies have been funded by industry (Kesselheim et al., 2012). “If doctors do not believe the evidence, what hope is there for evidence-based medicine?” Loder asked.
Sharing data may generate problems that cannot be anticipated today, but it will also generate unanticipated benefits. “We are engaged in one of the great struggles of human knowledge—the struggle to liberate clinical trial information and make sure it is put to its best and highest use now and in the future,” Loder concluded. “It is a thrill to be part of this historic meeting.”
Commitment to Open Science
Every day, many people face difficult questions about health care, observed Harlan Krumholz, Harold H. Hines, Jr., Professor of Medicine at the Yale University School of Medicine. They need all of the information that is relevant to the options they are considering. If data are missing, their ability to make informed decisions will be impaired. This is the central argument in favor of open science, Krumholz said.
Krumholz's experience has been that whenever data are shared, whether voluntarily or not, new and important things are learned. In particular, the release of participant-level data has generated vital new information about the risks and benefits of drugs and devices. In some cases, access to this information leads to conclusions that contrast with the prevailing knowledge and changes the use of a drug or device. In other cases, it provides “nuance and understanding.” For instance, Krumholz described a study (also described by Loder) which found that unreleased data are about as likely to strengthen evidence for the use of a product as to weaken such evidence (Hart et al., 2012). “What is important is that we support the idea that data are a social good and the best science takes place in the light,” he said.
Krumholz shared his vision of a future where data sharing is widely accepted as being in everyone's best interest and will be the cultural norm. “Data sharing [will be] an essential characteristic of being a good scientist and a good citizen,” he said. With the full release of data, companies would compete on the basis of science, not marketing. Academic researchers could get credit not only for the papers they publish, but for the knowledge generated from the databases they create.
Industry has the opportunity to demonstrate leadership, restore trust, and reclaim its position of integrity through meaningful actions to share data, Krumholz continued. “You have a meaningful motivation,” he said. “The [medical] profession has less trust in your science than in [National Institutes of Health]-sponsored studies and is less likely to act on the results of the trials you sponsored, not just the ones you conduct. The pharmaceutical and device industries no longer have the respect they once held.… The result is a situation that does a disservice to the public, the medical profession, and the vast majority of professionals in industry who have extraordinarily high integrity and are in that industry for the right reasons.”
Krumholz noted that an important cultural shift is already taking place. Some industry leaders have already taken steps to support data sharing and have contributed to major scientific advances as a result. For example, Medtronic's decision to release the company's data on a product that has nearly a billion dollars in annual sales was a powerful statement that the company was seeking the truth. The individuals who have made these decisions “realize that studies are only possible due to the generosity of people who consented to participate, and that we have an obligation to ensure that the efforts of those subjects contribute as much as possible to knowledge generation.” Such transparency will also be essential to ensure the continuing flow of individuals who are willing to participate in trials, Krumholz added.
In return for the privilege of selling a medical product to the public, industry bears a responsibility to ensure that all the data concerning the risks and benefits are available to everyone, said Krumholz. The current challenge is not to decide whether data should be released, but how to do so while being attentive to the needs and concerns of all stakeholders. In addition, the publication of summary results is not enough, according to Krumholz. Rather, individual patient-level data need to be broadly and freely available for investigators. “We need the protocols and case report forms. We need full sharing of the source data.… With the talent in this room, and with those listening on the webinar and those who are interested, I know solutions can be found. If we are committed to the path, we can figure out how to do it.”
CAUTIONS ON DATA SHARING
Jesse Berlin, vice president of epidemiology for Janssen Research & Development, LLC, provided a countervailing view by asking whether participant-level data are always needed. Complications can arise when the data are reexamined, he said. Decisions may have been made during a clinical trial that cannot be replicated. Published studies may not always incorporate the appropriate intent-to-treat analysis. Endpoints may be defined differently in different trials. Study designs, patient populations, and treatments can vary from trial to trial. As a result of these and other potential problems, such analyses can go “seriously wrong,” Berlin warned. “It is not just a matter of feeling more comfortable having the individual-level data. You can actually get wrong answers.”
Although there is a common belief that participant-level data can enable verification and reproduction of trial results, that premise is reliant on the trustworthiness of the shared data, warned Peter Doshi. Even participant-level data can lead investigators astray. For example, a computerized database of participant-level data may not reflect what is actually recorded on a case report form. In some cases, it may be necessary to look beyond what people typically consider data (i.e., numbers) into more narrative forms of documentation depending on the intended use of the shared data.