PeerJ. 2016 Aug 16;4:e2331. doi: 10.7717/peerj.2331. eCollection 2016.

The health care and life sciences community profile for dataset descriptions.

Author information

1. Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America.
2. Department of Computer Science, Heriot-Watt University, Edinburgh, United Kingdom.
3. Department of Radiation Oncology (MAASTRO), GROW - School for Oncology and Developmental Biology, MAASTRO Clinic, Maastricht, Netherlands.
4. Ontotext Corporation, Sofia, Bulgaria.
5. CSIRO, Australia.
6. The Donnelly Centre, University of Toronto, Toronto, Canada.
7. Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland.
8. Carleton University, Canada.
9. CALIPHO group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland.
10. IO Informatics, Berkeley, CA, United States of America.
11. Oxford e-Research Centre, University of Oxford, Oxford, Oxfordshire, United Kingdom.
12. Elsevier Labs, Netherlands.
13. Department of Medical Informatics and Epidemiology, Oregon Health Sciences University, Portland, OR, United States of America.
14. Office of Medical Informatics and Epidemiology, Pharmaceuticals and Medical Devices Agency, Chiyoda-ku, Japan.
15. EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom.
16. Database Center for Life Science, Kashiwa, Japan.
17. Advanced Center for Computing and Communication, RIKEN, Wako-shi, Saitama, Japan.
18. Cerenode Inc., United States of America.
19. The Babraham Institute, Cambridge, United Kingdom.
20. Nationwide Children's Hospital, Columbus, OH, United States of America.
21. Institute for Systems Biology, Seattle, WA, United States of America.
22. Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America.
23. Department of Exact Sciences, VU University Amsterdam, Amsterdam, Netherlands.
# Contributed equally.

Abstract

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.

KEYWORDS:

Data profiling; Dataset descriptions; FAIR data; Metadata; Provenance
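To make the idea of a machine-readable description of a versioned dataset more concrete, here is a minimal Python sketch (standard library only) that serializes one dataset version, linked to its version-independent summary, plus one distribution as Turtle. It assumes a small, hand-picked set of terms from Dublin Core (`dct:`), PAV (`pav:`) and DCAT (`dcat:`). That selection, the helper names `describe_version` and `to_turtle`, and the example.org IRIs are hypothetical illustrations, not the profile's actual list of required or recommended elements.

```python
"""Minimal sketch: emit one versioned dataset description as Turtle (stdlib only).

ASSUMPTION: the property choices below (Dublin Core terms, PAV, DCAT) and the
example.org IRIs are illustrative only; they are not the normative HCLS profile.
"""

# Namespace prefixes for the vocabularies used in this sketch.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def literal(text: str) -> str:
    """Return a Turtle string literal, escaping backslashes and double quotes.
    Newlines are not handled in this sketch."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets."""
    return f"<{value}>"


def describe_version(version_iri, summary_iri, title, description,
                     version, issued, distribution_iri, download_url):
    """Return (subject, predicate, object) triples for one dataset version
    (linked to its version-independent summary) and one distribution."""
    v, d = iri(version_iri), iri(distribution_iri)
    return [
        (v, "a", "dcat:Dataset"),
        (v, "dct:title", literal(title)),                 # description
        (v, "dct:description", literal(description)),
        (v, "dct:isVersionOf", iri(summary_iri)),         # versioning
        (v, "pav:version", literal(version)),
        (v, "dct:issued", literal(issued) + "^^xsd:date"),
        (v, "dcat:distribution", d),
        (d, "a", "dcat:Distribution"),
        (d, "dcat:downloadURL", iri(download_url)),
    ]


def to_turtle(triples) -> str:
    """Serialize triples as Turtle: prefix declarations, then one statement per line."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle(describe_version(
        version_iri="http://example.org/dataset/v1",
        summary_iri="http://example.org/dataset",
        title="Example dataset",
        description="A hypothetical dataset used to illustrate the sketch.",
        version="1.0",
        issued="2016-08-16",
        distribution_iri="http://example.org/dataset/v1/turtle",
        download_url="http://example.org/downloads/dataset-v1.ttl",
    )))
```

Building the Turtle strings by hand keeps the sketch dependency-free; a production pipeline would more likely use an RDF library so that escaping, IRI validation, and serialization are handled for it.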
