J Am Med Inform Assoc. 2018 Jan 1;25(1):25-31. doi: 10.1093/jamia/ocx101.

It's all in the timing: calibrating temporal penalties for biomedical data sharing.

Xia W (1,2), Wan Z (3,2), Yin Z (3,4,2), Gaupp J (1,2), Liu Y (3,2), Clayton EW (5,6,7,2), Kantarcioglu M (8), Vorobeychik Y (1,3,2), Malin BA (1,3,4,2).

Author information

1. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
2. Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA.
3. Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA.
4. Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA.
5. Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN, USA.
6. Law School, Vanderbilt University, Nashville, TN, USA.
7. Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA.
8. Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA.



Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. Various initiatives make these datasets more widely available to recipients who sign Data Use Certificate agreements, under which penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy rests on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether the assumption holds and what the implications are for data science policy.


This study tests the hypothesis that the value of data for scientific investigators, measured by the impact of the publications based on the data, decreases over time. The hypothesis is tested formally with a linear mixed-effects model fitted to approximately 1200 publications from 2007 to 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health.
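The abstract does not give the model specification, so the following is only an illustrative sketch of the underlying idea: estimating how an impact measure changes with data age while allowing each dataset its own baseline. It uses a pooled within-group least-squares slope, which is a simpler stand-in for the fixed-effect slope of a random-intercept mixed model, not the authors' actual method. The function name, the toy records, and the choice of a log-scale impact measure are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def within_group_slope(records):
    """Pooled within-group OLS slope of y on x.

    records: iterable of (group, x, y) tuples. Each group's mean is removed
    from x and y before pooling, so differences in group baselines do not
    influence the slope.
    """
    by_group = defaultdict(list)
    for group, x, y in records:
        by_group[group].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for points in by_group.values():
        mean_x = sum(x for x, _ in points) / len(points)
        mean_y = sum(y for _, y in points) / len(points)
        for x, y in points:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2

    if sxx == 0.0:
        raise ValueError("x does not vary within any group")
    return sxy / sxx


# Hypothetical toy data (not from the study):
# (dataset id, years since data release, log of journal impact factor)
records = [
    ("A", 0, 2.30), ("A", 1, 2.20), ("A", 2, 2.10),
    ("B", 0, 1.60), ("B", 1, 1.50), ("B", 2, 1.40),
]

slope = within_group_slope(records)
# On a log scale, a slope b corresponds to a relative change of exp(b) - 1 per year.
annual_change = math.exp(slope) - 1.0
print(f"slope per year (log scale): {slope:.3f}")
print(f"implied relative change per year: {annual_change:.1%}")
```

In this toy example the two datasets start from different baselines but decline at the same rate, which is the situation a per-dataset intercept is meant to handle.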


The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further discover that the depreciation rate is slow, only ∼10% per year, on average.


The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked.
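A short calculation illustrates why a brief revocation may be a weak deterrent. The sketch below assumes, for illustration only, that value depreciates at a constant compound rate of 10% per year (the abstract reports roughly 10% per year on average but does not state a functional form). The function name and the example revocation lengths are hypothetical.

```python
def value_lost_during_revocation(months, annual_rate=0.10):
    """Fraction of data value lost while access is revoked.

    Assumes constant compound depreciation: after t years the remaining
    value is (1 - annual_rate) ** t. This functional form is an assumption,
    not something stated in the abstract.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    if not 0.0 <= annual_rate < 1.0:
        raise ValueError("annual_rate must be in [0, 1)")
    years = months / 12.0
    return 1.0 - (1.0 - annual_rate) ** years


for months in (3, 6, 12):
    loss = value_lost_during_revocation(months)
    print(f"{months:>2} months revoked: about {loss:.1%} of value lost")
```

Under this assumption, a revocation of a few months removes only a small percentage of the data's value to the recipient, which is consistent with the abstract's suggestion that alternative penalty mechanisms may be needed.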


Keywords: biomedical data science; data sharing; economics of data; genomics; policy
