PLoS Biol. 2017 Jun 29;15(6):e2001414. doi: 10.1371/journal.pbio.2001414. eCollection 2017 Jun.

Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data.

Author information

1. Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America.
2. European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.
3. ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.
4. Berkeley Natural History Museums, University of California at Berkeley, Berkeley, California, United States of America.
5. Institute of Data Science, Maastricht University, Maastricht, the Netherlands.
6. School of Computer Science, The University of Manchester, Manchester, United Kingdom.
7. Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom.
8. Institute of Experimental Genetics, Helmholtz Centre Munich, German Research Center for Environmental Health, Neuherberg, Germany.
9. Center for Research in Biological Systems, University of California San Diego, La Jolla, California, United States of America.
10. Babraham Institute, Cambridge, United Kingdom.
11. European Molecular Biology Laboratory, Heidelberg, Germany.
12. Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark.
13. California Digital Library, Oakland, California, United States of America.
14. Science and Technology Facilities Council, Daresbury Laboratory, Warrington, United Kingdom.
15. Genomics Coordination Center, Department of Genetics, University Medical Center Groningen and Groningen Bioinformatics Center, University of Groningen, Groningen, the Netherlands.
16. Scientific Databases and Visualization at Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
17. Institute for Medical Informatics, Bern University of Applied Sciences, Engineering and Information Technology, Bern, Switzerland.
18. Manchester Institute of Biology, University of Manchester, Manchester, United Kingdom.
19. Department of Biochemistry, Stellenbosch University, Stellenbosch, South Africa.
20. Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals, University of Manchester, Manchester, United Kingdom.
21. Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America.
22. Leiden Institute of Advanced Computer Science, Leiden University, Leiden, the Netherlands.

Abstract

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision, and reuse of identifiers. We also outline important considerations for those referencing identifiers in various circumstances, including authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness of how to avoid and manage common identifier problems, especially those related to persistence and web accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

PMID: 28662064
PMCID: PMC5490878
DOI: 10.1371/journal.pbio.2001414
[Indexed for MEDLINE]
Free PMC Article
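As a small illustration of the web resolvability the abstract emphasizes, the identifiers recorded for this article (PMID, PMCID, DOI) can each be expanded into a resolvable web URL. The sketch below is not taken from the article. It assumes the public resolver patterns for DOIs (doi.org), PubMed, and PubMed Central as they stood at the time of writing. The `RESOLVERS` and `PATTERNS` tables and the `to_url` helper are hypothetical names chosen for illustration.

```python
import re

# Hypothetical prefix-to-resolver table (assumed URL templates, for illustration only).
# Each template receives the local part of a "prefix:local_id" identifier.
RESOLVERS = {
    "doi": "https://doi.org/{}",
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/{}/",
    "pmcid": "https://www.ncbi.nlm.nih.gov/pmc/articles/{}/",
}

# Hypothetical syntax checks, so malformed local IDs are rejected
# before any URL is built.
PATTERNS = {
    "doi": re.compile(r"^10\.\d{4,9}/\S+$"),
    "pmid": re.compile(r"^\d+$"),
    "pmcid": re.compile(r"^PMC\d+$"),
}


def to_url(prefixed_id: str) -> str:
    """Expand a 'prefix:local_id' string into a resolvable web URL."""
    # Split on the first colon only; DOI suffixes may themselves contain colons.
    prefix, sep, local_id = prefixed_id.partition(":")
    prefix = prefix.strip().lower()
    local_id = local_id.strip()
    if not sep or prefix not in RESOLVERS:
        raise ValueError(f"unknown or missing prefix in {prefixed_id!r}")
    if not PATTERNS[prefix].match(local_id):
        raise ValueError(f"malformed {prefix} identifier: {local_id!r}")
    return RESOLVERS[prefix].format(local_id)


if __name__ == "__main__":
    # The three identifiers recorded for this article.
    for ident in ("PMID:28662064", "PMCID:PMC5490878", "DOI:10.1371/journal.pbio.2001414"):
        print(ident, "->", to_url(ident))
```

Keeping the resolver templates in a single table means that, if a resolver's base URL changes, only that entry needs updating while the stored identifiers stay the same. This is one simple way to separate a persistent identifier from the location it currently resolves to.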
