J Biotechnol. 2017 Nov 10;261:177-186. doi: 10.1016/j.jbiotec.2017.07.016. Epub 2017 Jul 23.

Terminology supported archiving and publication of environmental science data in PANGAEA.

Author information

1-8
PANGAEA Data Publisher for Earth & Environmental Science, MARUM Center for Marine Environmental Sciences, University of Bremen, P.O. Box 33 04 40, 28334 Bremen, Germany.
Electronic addresses: (1) mdiepenbroek@pangaea.de; (2) uschindler@pangaea.de; (3) rhuber@uni-bremen.de; (4) spesant@marum.de; (5) mstocker@marum.de; (6) jfelden@marum.de; (7) mbuss@marum.de.

Abstract

Using the information system PANGAEA as an example, we describe the application of terminologies for archiving and publishing environmental science data. A terminology catalogue (TC) was embedded into the system, with interfaces that allow terminologies to be replicated and edited manually. For data ingest and archiving, we show how the TC can improve the structuring and harmonization of lineage and content descriptions of data sets. Key to this is the conceptualization of measurement and observation types (parameters) and methods, for which we have implemented a basic syntax and rule set. For data access and dissemination, we have improved the findability of data by enriching metadata with TC terms. Semantic annotations, e.g. adding term concepts (including synonyms and hierarchies) or mapped terms from different terminologies, facilitate comprehensive data retrieval. The PANGAEA thesaurus of classifying terms, which is part of the TC, is used as an umbrella vocabulary that links the various domains and allows drill-downs and side drills across various facets. Furthermore, we describe how TC terms can be linked to nominal data values. This improves data harmonization and facilitates the structural transformation of heterogeneous data sets to a common schema. Technical developments are complemented by work on the metadata content. Over the last 20 years, on average more than 100 new parameters have been defined per week. Recently, PANGAEA has increasingly been submitting new terms to various terminology services. Matching terms from terminology services with our parameter or method strings is supported programmatically; however, the process ultimately needs manual input by domain experts. The quality of terminology services is an additional limiting factor and varies with respect to content, editorial process, interoperability, and sustainability. Good-quality terminology services are the building blocks for the conceptualization of parameters and methods. In our view, they are essential for data interoperability and arguably the most difficult hurdle for data integration. In summary, the application of terminologies has mutually positive effects for terminology services and information systems such as PANGAEA. On both sides, it improves content, reliability, and interoperability.
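Two mechanisms summarized above, programmatic matching of parameter strings against terminology terms and metadata enrichment with synonyms and broader terms, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and not PANGAEA's implementation: the `Term` class, the functions `suggest_matches` and `index_terms`, the sample terms, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure. Matches are returned only as ranked candidates, reflecting the point that a domain expert still has to confirm them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Term:
    """Hypothetical terminology-catalogue entry: id, preferred label,
    optional synonyms, and the id of its broader (parent) term."""
    term_id: str
    label: str
    synonyms: list[str] = field(default_factory=list)
    broader: str | None = None


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def suggest_matches(parameter: str, catalogue: dict[str, Term],
                    threshold: float = 0.8) -> list[tuple[str, float]]:
    """Rank catalogue terms whose label or synonyms resemble a parameter string.

    Returns (term_id, score) pairs, best first. These are candidates only;
    a domain expert still has to accept or reject each one.
    """
    target = normalize(parameter)
    scored = []
    for term in catalogue.values():
        names = [term.label, *term.synonyms]
        best = max(SequenceMatcher(None, target, normalize(n)).ratio() for n in names)
        if best >= threshold:
            scored.append((term.term_id, round(best, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def index_terms(term_id: str, catalogue: dict[str, Term]) -> set[str]:
    """Collect searchable names for an annotated term: its label, its synonyms,
    and the labels and synonyms of all broader terms up the hierarchy, so a
    search for a broader concept also finds the annotated data set."""
    names: set[str] = set()
    seen: set[str] = set()
    current = catalogue.get(term_id)
    while current is not None and current.term_id not in seen:  # guard against cycles
        seen.add(current.term_id)
        names.add(normalize(current.label))
        names.update(normalize(s) for s in current.synonyms)
        current = catalogue.get(current.broader) if current.broader else None
    return names


# Hypothetical sample terms, not taken from any real terminology service.
catalogue = {
    "T1": Term("T1", "temperature"),
    "T2": Term("T2", "water temperature", ["sea water temperature"], broader="T1"),
}
```

In this sketch, a parameter string such as "Water  Temperature" normalizes to an exact label match for `T2`, and annotating a data set with `T2` makes it retrievable under "temperature" as well, which is the drill-down behaviour a hierarchical umbrella vocabulary is meant to provide.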

KEYWORDS:

Data findability; Data interoperability; Data publishing; Semantics; Terminologies

PMID:
28743591
DOI:
10.1016/j.jbiotec.2017.07.016
[Indexed for MEDLINE]