Plant Physiol. May 2004; 135(1): 4–9.
PMCID: PMC429325

To Give or Not to Give? That Is the Question

A Personal Account

While Hamlet experienced “suffering caused by the slings and arrows of outrageous fortune…,” many modern-day scientists have similar feelings when asked to give and share published data and materials with other members of our community. Herein, we argue that being on the side of the fence that advocates sharing freely published data and materials is an acceptable practice in the scientific community, an opinion based on our scientific journey over the past 40 years.

One of us (A.T.) began as a plant physiologist when experimental plant systems were as numerous as plant species, and the National Institutes of Health was calling on individual principal investigators to write three-page proposals to obtain funds for their favorite study. Plant physiology and biochemistry were the major disciplines of the time. Arabidopsis was known to very few of us, and genetics was practiced by the corn geneticists, a small group of scientists who freely exchanged their mutants published in their newsletter (Maize Genetics Cooperation, 1926–2004). Gene cloning and sequencing had not been discovered, and large community projects were unheard of. Rules regarding data sharing were not defined, and the prevailing mood was NOT TO GIVE. Even inside a laboratory or departmental corridor, the sharing of ideas and materials was limited. In the late 1970s, when cloning genes from peas, carrots, and avocados was possible, the plant physiologist found himself in the laboratory of R.W.D. at Stanford in the Department of Biochemistry (under the direction of Arthur Kornberg). Ideas, materials, protocols, space, and grants were shared among the approximately 100-member department, located in two corridors. It was a powerful and productive machine. Sharing was not restricted to the department but rather extended globally. Newly constructed cloning vectors, Escherichia coli strains, cDNA, and genomic libraries and detailed experimental protocols were sent to anyone around the globe who requested them, even before publication. It was in that atmosphere of giving that the plant physiologist began to appreciate the concept of sharing and giving for the advancement of the common good. The transition was gradual and now has come full circle to underscore the importance of sharing, giving, and producing tools to be used by others in the global scientific community.

The discussion on data and material sharing is formidable; however, it can be focused on two groups: (1) published materials and data by individual investigators, and (2) unpublished data produced by large scale genome and community projects.

What the Caveman Said

A wonderful piece of writing relevant to the subject of data sharing appeared 3 years ago in the Journal of Cell Science (Caveman, 2001). We thought that it would be better to quote it, rather than paraphrase or summarize it, because it speaks to all who have to decide whether to give or not to give. As you read it, you will realize that even the “Caveman” knew that giving was better than not giving:

As scientists we know the value of collaboration in our work. It is where ideas and expertise are shared, reagents are given away, and when young scientists learn to think about scientific advances being made as a group effort rather than on an individual basis. Collaboration, however, requires trust, respect and an eye to mentoring: trust that your ideas will not be stolen or misused; respect for the hard work and effort that brought you to the collaboration; and mentoring for the young scientists who want to develop in this area as independent scientists.

Such formal collaborations are fun, important and usually very productive—truly, they are one of the best parts of science culture. But what about the type of collaboration in which another laboratory doesn't want to interact with you on a long-term basis but simply wants a unique cDNA, antibody, cell line or transgenic mouse that your laboratory developed. In this case, the other laboratory may want the reagent for a line of research that is completely different from that going on in your laboratory, but then again they may want to do the same experiments for which you made the reagent. Do you send it to them immediately? Do you refuse the request, or at best ignore it? Do you lie that you have a small quantity of the reagent that is only sufficient for the work in your laboratory or that the reagent is no longer available (the freezer melt down of 98)? Do you inquire about the line of experimentation that they want to follow? Do you demand that sending the reagent is contingent upon a formal collaboration between you and the other group, or that you are a co-author on any publication that arises from the use of the reagent? Do you try and restrict the latitude or direction of the work that can be done with your reagent by the other group?

In my opinion, there is only one response to such a request. It is the first one above: that is, send the reagent immediately. Do not ask questions, do not demand a collaboration or co-authorship on papers, or restrict the work that will be done by the other group.

It is not easy to make this (right) choice of response. I know from first-hand experience. I have had individuals ask for specific reagents whose stated goal is to do the same experiments as we are performing with that reagent. Another laboratory wanted our reagents for a specific series of experiments that we were not planning to do; this group then turned around and used the reagents for the same line of experiments that we were following. I was asked to re-supply a reagent to a group that had not received the reagent from my laboratory directly; it turned out that a different group, who had received the reagent from us, had sent it to that group as part of their collaboration! In each case, I sent the reagent.

Yes, I am aware that postdocs and students made the requested reagent for their own work in your laboratory, and, therefore, the request may be so that another laboratory can do the same work (helping a competitor to compete against you?). That is why I always hope that the same conditions of a formal collaboration hold for the request for reagents, that is trust, respect and an eye to mentoring: trust that the receiving laboratory will use the reagent appropriately, acknowledge where they got it from, let you know how their experiments went and not give it away without permission; respect for the hard work and effort that went into the development of the reagent for research in the originator's laboratory (and not everyone else's); and an eye to mentoring so that postdocs and students who made the reagent have the time and scope to develop their research. I always hope that ‘respect’ and ‘an eye to mentoring’ will forestall other laboratories from asking for a reagent to compete directly with my laboratory in ongoing experiments for which the reagent was made. But, when it doesn't, I send the reagent anyway.

The Unified Concept UPSIDE of Sharing Publication-Related Data and Materials

Recently, the National Research Council of the National Academy of Sciences released a report by the Committee on Responsibilities of Authorship in the Biological Sciences, chaired by Tom Cech (Cech, 2003; Cozzarelli, 2004). The report, the full summary of which is published in an earlier article in this Editor's Choice Series (The National Academies Committee on Responsibilities of Authorship in the Biological Sciences, 2003), is a timely document that all of us and especially the junior investigators should read. The major conclusion of the Cech committee is the concept UPSIDE. It stands for the Uniform Principle for Sharing Integral Data and Materials Expeditiously. It states that:

Community standards for sharing publication-related data and materials should flow from the general principle that the publication of scientific information is intended to move science forward. More specifically, the act of publishing is a quid pro quo in which authors receive credit and acknowledgment in exchange for disclosure of their scientific findings. An author's obligation is not only to release data and materials to enable others to verify or replicate published findings (as journals already implicitly or explicitly require) but also to provide them in a form on which other scientists can build with further research. All members of the scientific community—whether working in academia, government, or a commercial enterprise—have equal responsibility for upholding community standards as participants in the publication system, and all should be equally able to derive benefits from it.

The Sixteen Commandments of Sharing Publication-Related Data and Materials

Four years ago Arthur Kornberg published the ten commandments of enzymology (Kornberg, 2000, 2003) as a guide to all of us as to how to pursue experimental biology. Similarly, the Cech committee recommends 16 instead of 10 commandments to all of us regarding the sharing of published information.

Thou Shalt…

  1. “Authors should include in their publications the data, algorithms, or other information that is central or integral to the publication—that is, whatever is necessary to support the major claims of the paper and would enable one skilled in the art to verify or replicate the claims.
  2. If central or integral information cannot be included in the publication for practical reasons (for example, because a dataset is too large), it should be made freely (without restriction on its use for research purposes and at no cost) and readily accessible through other means (for example, on-line). Moreover, when necessary to enable further research, integral information should be made available in a form that enables it to be manipulated, analyzed, and combined with other scientific data.
  3. If publicly accessible repositories for data have been agreed on by a community of researchers and are in general use, the relevant data should be deposited in one of these repositories by the time of publication.
  4. Authors of scientific publications should anticipate which materials integral to their publications are likely to be requested and should state in the ‘Materials and Methods’ section or elsewhere how to obtain them.
  5. If a material integral to a publication is patented, the provider of the material should make the material available under a license for research use.
  6. The scientific community should continue to be involved in crafting appropriate terms of any legislation that provides additional database protection.
  7. It is appropriate for scientific reviewers of a paper submitted for publication to help identify materials that are integral to the publication and likely to be requested by others and to point out cases in which authors need to provide additional instructions on obtaining them.
  8. It is not acceptable for the provider of a publication-related material to demand an exclusive license to commercialize a new substance that a recipient makes with the provider's material or to require collaboration or co-authorship of future publications.
  9. The merits of adopting a standard MTA should be examined closely by all institutions engaged in technology transfer, and efforts to streamline the process should be championed at the highest levels of universities, private research centers, and commercial enterprises.
  10. As a best practice, participants in the publication process should commit to a limit of 60 days to complete the negotiation of publication-related MTAs and transmit the requested materials or data.
  11. Scientific journals should clearly and prominently state (in the instructions for authors and on their Web sites) their policies for distribution of publication-related materials, data, and other information. Policies for sharing materials should include requirements for depositing materials in an appropriate repository. Policies for data sharing should include requirements for deposition of complex datasets in appropriate databases and for the sharing of software and algorithms integral to the findings being reported. The policies should also clearly state the consequences for authors who do not adhere to the policies and the procedure for registering complaints about noncompliance.
  12. Sponsors of research and research institutions should clearly and prominently state their policies for distribution of publication-related materials and data by their grant or contract recipients or employees.
  13. If an author does not comply with a request for data or materials in a reasonable time period (60 days) and the requestor has contacted the author to determine if extenuating circumstances (travel, sabbatical, or other reasons) may have caused the delay, it is acceptable for the requestor to contact the journal in which the paper was published. If that course of action is not successful in due course (another 30 days), the requestor may reasonably contact the author's university or other institution or the funding agency of the research in question for assistance. Those entities should have a policy and process in place for responding to such requests for assistance in obtaining publication-related data or materials.
  14. Funding organizations should provide the recipients of research grants and contracts with the financial resources needed to support dissemination of publication-related data and materials.
  15. Authors who have received data or materials of other investigators should acknowledge such contributions appropriately.
  16. Universal adherence, without exception, to a principle of full disclosure and unrestricted access to data and materials that are central or integral to published findings will promote cooperation and prevent divisiveness in the scientific community, maintain the value and prestige of publication, and promote the progress of science.”
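The timeline in the thirteenth commandment can be worked through as simple date arithmetic. The following is only a minimal sketch: the function name `escalation_dates` and the dictionary keys are hypothetical, and it assumes that the requestor contacts the journal on the first day it becomes acceptable, so that the further 30 days are counted from that date. It does not model the intermediate step of asking the author about extenuating circumstances.

```python
from datetime import date, timedelta

# Periods taken from the thirteenth commandment.
AUTHOR_RESPONSE_DAYS = 60   # "a reasonable time period (60 days)"
JOURNAL_RESPONSE_DAYS = 30  # "in due course (another 30 days)"


def escalation_dates(request_date):
    """Return the earliest dates on which each escalation step is acceptable.

    Assumption (not stated in the report): the journal is contacted on the
    first permissible day, and the 30-day period starts from that contact.
    """
    contact_journal = request_date + timedelta(days=AUTHOR_RESPONSE_DAYS)
    contact_institution = contact_journal + timedelta(days=JOURNAL_RESPONSE_DAYS)
    return {
        "contact_journal": contact_journal,
        "contact_institution_or_funder": contact_institution,
    }


if __name__ == "__main__":
    steps = escalation_dates(date(2004, 1, 1))
    for step, earliest in steps.items():
        print(step, earliest.isoformat())
```

Under these assumptions, a request sent on 1 January 2004 could be raised with the journal from 1 March 2004 and with the author's institution or funding agency from 31 March 2004.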

According to the fourteenth commandment, it will be of great importance to establish a federally funded resource center (Fig. 1) for accepting and distributing the materials derived from all the branches of plant biology, similar to the Arabidopsis Biological Resource Center (http://www.biosci.ohio-state.edu/~plantbio/Facilities/abrc/abrchome.htm). Such a center will facilitate the distribution of materials produced by a small laboratory with limited funds. Furthermore, we want to point out that it is unrealistic to demand that the 16 commandments be followed religiously by foreign laboratories, since they may operate under a different system of laws and regulations.

Figure 1.
A potential scheme for sharing publication-related data and materials.

That the ninth commandment also needs to be amended was suggested by the Editor in Chief of the Proceedings of the National Academy of Sciences (Cozzarelli, 2004). This commandment requires that an MTA includes a “requirement that the material to be used for research purposes: that is where the primary intention of the research is the fundamental increase in knowledge.” We do not agree with this restriction because we believe that it would limit the sharing of materials with industry. Requiring a company not to make commercial use of scientific results is unreasonable.

Prepublication Data Release

During the last few years, a heated debate has been taking place among various communities regarding early prepublication release of large-scale genome sequencing data. The debate also extends beyond genomics: it asks whether any community resource-generating project should be subject to the same principles as genomic projects. The question is: Can a laboratory analyze and publish the findings of early released data deposited in a public database by a different laboratory? Conflicts have been reported over the early use of the sequences of the parasites Plasmodium falciparum (which causes malaria) and Trypanosoma brucei (which causes sleeping sickness) (Macilwain, 2000). A similar conflict was reported over the use of the sequence data of the protozoan Giardia lamblia (Marshall, 2002).

Since the initiation of the Human Genome Project, one of its operating principles has been that the data and resources generated should rapidly be made available to the scientific community. This implies the release of data prior to publication. In 1991, the National Human Genome Research Institute (NHGRI) and U.S. Department of Energy established a data release policy that called for release of data and materials no later than 6 months after having been generated (http://www.genome.gov/page.cfm?pageID=10000925). In 1996, the International Human Genome Sequencing Consortium adopted principles for data release (known as the Bermuda Principles; http://www.gene.ucl.ac.uk/hugo/bermuda.htm) that called for the automatic, rapid release of sequence assemblies of 1 to 2 kb or greater to the public databases. Subsequently, in April 1997, NHGRI published a data release policy stating that its grantees engaged in large-scale genomic DNA sequencing should release DNA sequence assemblies of >2 kb within 24 h of their generation. This policy became partially outmoded because it did not adequately address randomly generated whole-genome shotgun data. Such data sets are not assembled until late in a project, so tying data release to assembly could actually have had the opposite effect of slowing the release of sequence data.

Consequently, in December 2000, NHGRI extended its data release policy, calling for raw sequence traces to be submitted weekly to a public trace database. The institute, however, acknowledged that this early data release policy potentially threatened the standard scientific practice that those who generate primary data should have both the right and responsibility to publish the work in a peer-reviewed journal. To prevent such an event, the NHGRI agreed to the inclusion of a statement on the trace data that indicated that users could use the data for all purposes, with the sole exception of the initial publication of the complete genome sequence assembly or other large-scale analyses that the producers planned to publish.

This restriction attracted little attention until early 2002, when a community debate began about the merits of allowing any limitation on the use of whole-genome assemblies once they had been submitted to the public databases. To discuss the issue and attempt to resolve their differences, the Wellcome Trust (http://www.wellcome.ac.uk/) organized a meeting of data producers, users, database personnel, journal editors, and funding agency representatives in Fort Lauderdale, Florida, in January 2003.

The attendees unanimously agreed that prepublication release of large-scale genome sequence data has been of tremendous benefit to the scientific research community at large and that it is very important to ensure that such release of sequence data continues. They therefore reaffirmed the Bermuda principles and recommended that they be extended to all types of sequence data. Furthermore, they recognized that other large efforts, designated community resource projects, would increasingly be generating data and other resources that should also be rapidly released to the community in an unrestricted manner. To ensure the continuing effectiveness of the system of rapid, prepublication release of data from community resource projects, the meeting attendees concluded that each of the three stakeholders in the system—data producers, data users and funding agencies—has an active role to play in promulgating this tradition of openness.

In response to the Fort Lauderdale meeting, the NHGRI modified its data release policy to implement the system of tripartite responsibility by stating:

  • Large-insert clone-based projects: DNA sequence assemblies of 2 kb or greater are to be deposited in a public nucleotide sequence database within 24 h of generation. Sequence traces from these projects are to be deposited in a trace archive within 1 week of production;
  • Whole genome shotgun projects: Sequence traces from whole genome shotgun projects are to be deposited in a trace archive within 1 week of production. Whole genome assemblies are to be deposited in a public nucleotide sequence database as soon as possible after the assembled sequence has met a set of quality evaluation criteria.

The deposited data should be available for all to use without restriction.
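The deposit deadlines in the modified policy can be expressed as a small lookup. This is a minimal sketch under stated assumptions, not an NHGRI tool: the function name `deposit_deadline` and the labels `"clone_based"`, `"whole_genome_shotgun"`, `"assembly"`, and `"trace"` are hypothetical. Where the policy gives no fixed time limit (whole genome assemblies are due "as soon as possible" after quality evaluation, and clone-based assemblies under 2 kb are not mentioned), the sketch returns `None` rather than inventing one.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_ASSEMBLY_KB = 2.0                  # "assemblies of 2 kb or greater"
ASSEMBLY_WINDOW = timedelta(hours=24)  # "within 24 h of generation"
TRACE_WINDOW = timedelta(weeks=1)      # "within 1 week of production"

PROJECTS = {"clone_based", "whole_genome_shotgun"}  # hypothetical labels
DATA_KINDS = {"assembly", "trace"}                  # hypothetical labels


def deposit_deadline(project: str, data: str, produced_at: datetime,
                     assembly_kb: float = 0.0) -> Optional[datetime]:
    """Return the latest deposit time, or None if no fixed limit is stated."""
    if project not in PROJECTS or data not in DATA_KINDS:
        raise ValueError(f"unknown project or data kind: {project!r}, {data!r}")
    if data == "trace":
        # Traces from both project types go to a trace archive within a week.
        return produced_at + TRACE_WINDOW
    if project == "clone_based" and assembly_kb >= MIN_ASSEMBLY_KB:
        # Clone-based assemblies of 2 kb or more are due within 24 h.
        return produced_at + ASSEMBLY_WINDOW
    # Whole genome assemblies are due "as soon as possible" after quality
    # evaluation; smaller clone-based assemblies are not covered.
    return None


if __name__ == "__main__":
    t = datetime(2003, 3, 1, 9, 0)
    print(deposit_deadline("clone_based", "assembly", t, assembly_kb=2.5))
    print(deposit_deadline("whole_genome_shotgun", "trace", t))
```

Returning `None` for the open-ended cases keeps the sketch faithful to the policy text, which ties whole genome assembly deposition to quality criteria instead of a clock.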

It was pointed out by the NHGRI that the successful maintenance of the system of rapid, unrestricted, prepublication data release requires constructive behavior from both the sequence producers and users. The community depends on the success of these efforts, and the sequence producers typically face relatively little direct competition. Furthermore, it is not possible to guarantee them the standard scientific incentive of publishing the initial analysis of the data they generate without applying restrictions that might inhibit the broadest possible use of the data by the scientific community. Accordingly, the sequence producers must recognize that even if the sequence data are occasionally used in ways that violate normal standards of scientific etiquette, this is a necessary risk set against the considerable benefits of immediate data release.

Sequence users also must accept significant responsibilities. Users of unpublished genomic sequence data must appropriately acknowledge the source of the sequence data through the use of appropriate citations. Users must also recognize that the sequence producers have a legitimate interest in publishing peer-reviewed reports describing and analyzing the sequence that they have produced and that data deposits in databases are not the equivalent of such a publication. The entire scientific community can also help ensure that the system works fairly for all participants through the peer review systems of both journals and funding agencies.

NHGRI also encourages the entire scientific community to recognize that the continued success of the system of prepublication data release requires active community-wide support. There should be no restrictions on the use of the genomic sequence data, but the best interests of the community are served when all act responsibly to promote the highest standards of respect for the scientific contribution of others.

In addition, the NHGRI encourages the sequence producer to publish a Project Description, beginning with new genomic sequencing projects that are initiated in 2003. The purpose of the Project Description, which will be a new type of scientific publication, is to inform the scientific community about the sequencing project and to provide a citation that can be used to reference the source of the sequence.

A Two-Way Street: An Updated View of Sequence Data Release in the Post-Genomic Era

We have seen this issue from both sides of the fence. We sit on one side of the fence as members of the research community: as users, we have benefited from the availability of the genome sequence and have strongly supported the quick and immediate release of the data. We sat on the other side of the fence in our role as contributors to the yeast (Dietrich et al., 1997) and Arabidopsis (Arabidopsis Genome Initiative, 2000) genome sequencing projects and in constructing part of the Arabidopsis ORFeome (Yamada et al., 2003), for which we also advocated immediate release of the sequences and ORFeome. However, from that same side of the fence, as principal investigators of several funded sequencing projects, we take the view that immediate release of sequence is not desirable for these types of projects.

The primary reason for coming to this conclusion is the following. If it is required to immediately release the sequence, we will be at a disadvantage relative to many others who will analyze our data because we have the added responsibilities of managing the sequencing and making the sequence available. Just getting the sequence and making it available (not to mention getting the grant to do it and dealing with the continuing administration of the grant) is a significant amount of work that our competition does not have to worry about. If the scientists leading projects to produce the sequence are to be able to attract the students and postdocs that are necessary to make the projects successful, they need to be able to offer them opportunities to exploit the data that are somewhat better than they would enjoy at other labs not producing the data.

The producers of scientific data have historically been awarded the courtesy of ownership of their data. This gives them the privilege of being the first to analyze it and the first to publish findings based on it. They can, in principle, sit on the data for as long as they want, but they do so at their peril, of course, because if they wait too long they may not get continued funding, or they may get scooped. We think most of us will agree that this system has worked more or less (but mostly more) efficiently over the years. It's not clear to us that DNA sequence data should be treated any differently. We realize that the large sequencing centers are in a somewhat unusual position of having an essential monopoly on data, and the issues this raises will need to be resolved, but we don't think it's in the best interest of anyone to require immediate release of the data.

The other consequence of a requirement for immediate data release that we are concerned about is that it will lead to the demise of the sequencing centers. Until now, they have been used largely as core facilities for the worldwide scientific community. We believe this is an unsustainable situation that, if continued, will destroy the large-scale sequencing centers. Whether that is an acceptable outcome might be a debatable issue, but if we think it is desirable to retain large-scale DNA sequencing centers, then we think the data release issues must be resolved without a wholesale requirement for immediate data release. This is because there would otherwise be no incentive for young scientists to get involved in such a project. An analogy may further clarify this point. Imagine a painter posting his unfinished artwork every 24 h outside his studio. One day another painter comes along and takes this painting, finishes it, and puts his name on it.

Finally, we think raising in this discussion the precedent of the Bermuda standards of data release is off the mark. We think the Bermuda standards are outdated and no longer apply to the current situation. They were relevant, and helpful, for navigating the data release waters of the past, but the situation has changed. It was important that the sequence data for the established (and widely used) model organisms and for the human be made available as soon as possible because there really were (and are) A LOT of people who could make immediate advances based on that data. However, it seems to us that in the post-genomic era, immediate sequence release is not so urgent because the data are not as critical to as many research advances as was the initial genome data.

Furthermore, we do not agree with the idea of putting restrictions on the use of data that has been publicly released. We know the problems with this firsthand because we have done the experiment of making our sequences publicly available (via our Web pages) but with restrictions, and it caused problems. The simple solution to these problems is to revert to the tried-and-mostly-true method of allowing the producer of the data to own it until publication. Another possible solution is for these types of projects to be carried out by commercial groups so that students are not at risk of being scooped. A commercial arrangement, however, will be more expensive than one carried out in an academic setting.

Thus, to give or not to give, that is the question. It is nobler in the action of scientists indeed TO GIVE.

Acknowledgments

We thank Sheila McCormick for making us aware of the article by the “Caveman,” Mark Johnston for sharing his thoughts regarding release of prepublication data, and Sarah Hake for useful discussions.

References

  • Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.
  • Caveman (2001) Send me all of your reagents and ideas. We want to work on the same experiments. J Cell Sci 114: 1037–1038.
  • Cech TR (2003) Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. National Academies Press, Washington, DC, www.nap.edu/books/0309088593/html.
  • Cozzarelli NR (2004) UPSIDE: Uniform Principle for Sharing Integral Data and Materials Expeditiously. Proc Natl Acad Sci USA 101: 3721–3722.
  • Dietrich FS, Mulligan J, Hennessy K, Yelton MA, Allen E, Araujo R, Aviles E, Berno A, Brennan T, Carpenter J, et al. (1997) The nucleotide sequence of Saccharomyces cerevisiae chromosome V. Nature 387 (6632 suppl): 78–81.
  • Kornberg A (2000) Ten commandments: Lessons from the enzymology of DNA replication. J Bacteriol 182: 3613–3618.
  • Kornberg A (2003) Ten commandments of enzymology, amended. Trends Biochem Sci 28: 515–517.
  • Macilwain C (2000) Biologists challenge sequencers on parasite genome publication. Nature 405: 601–602.
  • Maize Genetics Cooperation (1926–2004) Maize Genetics Cooperation Newsletter, 1–78.
  • Marshall E (2002) DNA sequencer protests being scooped with his own data. Science 295: 1206–1207.
  • The National Academies Committee on Responsibilities of Authorship in the Biological Sciences (2003) Sharing publication-related data and materials: responsibilities of authorship in the life sciences. Plant Physiol 132: 19–24.
  • Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, et al. (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302: 842–846.

Articles from Plant Physiology are provided here courtesy of American Society of Plant Biologists
