Stud Health Technol Inform. 2017;243:190-194.

Analysis of Annotated Data Models for Improving Data Quality.

Author information

IT for Clinical Research, Lübeck (ITCR-L), University of Lübeck, Germany.
Institute of Medical Informatics, University of Lübeck, Germany.


The public Medical Data Models (MDM) portal, with more than 9,000 annotated forms from clinical trials and other sources, provides many research opportunities for the medical informatics community. It is mainly used to address the problem of heterogeneity by searching, mediating, reusing, and assessing data models, e.g., for the semi-interactive curation of core data records in a specific domain. Furthermore, it can serve as a benchmark for evaluating algorithms that create, transform, annotate, and analyse structured patient data. All data models in the MDM portal are syntactically represented in CDISC ODM, and UMLS CUIs have been added semi-automatically at several ODM levels, such as ItemGroupDef, ItemDef, and CodeList item. These annotations can improve the interpretability and processability of the received information, but only if the coded information is correct and reliable. This raises the question of how to ensure that semantically similar datasets are also processed and classified similarly. In this work, a (semi-)automatic approach to analysing and assessing items, questions, and data elements in clinical studies is described. The approach uses a hybrid evaluation process to rate and propose semantic annotations for under-specified trial items. The evaluation algorithm combines the commonly used NLM MetaMap, which provides UMLS support, with corpus-based proposal algorithms that link datasets from the provided CDISC ODM item pool.
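To make the corpus-based proposal idea more concrete, here is a minimal sketch, not the authors' algorithm. It assumes that UMLS CUIs are attached to an ODM `ItemDef` as `Alias` elements whose `Context` starts with "UMLS CUI" (an assumed convention), and it uses a hypothetical, heavily simplified ODM fragment with illustrative CUIs. MetaMap is not called. Instead, plain token-overlap (Jaccard) similarity between item texts stands in for the corpus-based step: an unannotated item receives the CUIs of sufficiently similar annotated items as proposals. The names `load_items`, `propose_cuis`, `STOPWORDS`, and the `threshold` value are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, simplified ODM fragment: I1 and I2 carry UMLS CUI aliases, I3 does not.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S1">
    <MetaDataVersion OID="MDV1" Name="demo">
      <ItemDef OID="I1" Name="Diabetes" DataType="boolean">
        <Question><TranslatedText xml:lang="en">Does the patient have diabetes mellitus?</TranslatedText></Question>
        <Alias Context="UMLS CUI [1,1]" Name="C0011849"/>
      </ItemDef>
      <ItemDef OID="I2" Name="Weight" DataType="float">
        <Question><TranslatedText xml:lang="en">Body weight of the patient</TranslatedText></Question>
        <Alias Context="UMLS CUI [1,1]" Name="C0005910"/>
      </ItemDef>
      <ItemDef OID="I3" Name="DM history" DataType="boolean">
        <Question><TranslatedText xml:lang="en">History of diabetes mellitus</TranslatedText></Question>
      </ItemDef>
    </MetaDataVersion>
  </Study>
</ODM>"""

# Hypothetical minimal stopword list; a real pipeline would use a proper NLP tokenizer.
STOPWORDS = {"the", "of", "does", "have", "a", "an", "patient", "is"}


def tokens(text):
    """Lowercase alphabetic tokens without stopwords, as a set."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def load_items(odm_xml):
    """Collect each ItemDef's OID, name plus question text, and attached UMLS CUIs."""
    root = ET.fromstring(odm_xml)
    items = []
    for item in root.iterfind(".//odm:ItemDef", ODM_NS):
        question = item.findtext(
            "odm:Question/odm:TranslatedText", default="", namespaces=ODM_NS
        )
        cuis = [
            alias.get("Name")
            for alias in item.findall("odm:Alias", ODM_NS)
            if alias.get("Context", "").startswith("UMLS CUI")
        ]
        items.append({"oid": item.get("OID"), "text": f"{item.get('Name')} {question}", "cuis": cuis})
    return items


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_cuis(items, threshold=0.3):
    """For every unannotated item, list (source OID, score, CUIs) of similar annotated items."""
    annotated = [i for i in items if i["cuis"]]
    proposals = {}
    for item in items:
        if item["cuis"]:
            continue  # already annotated; nothing to propose
        query = tokens(item["text"])
        scored = sorted(
            ((jaccard(query, tokens(a["text"])), a) for a in annotated),
            key=lambda pair: pair[0],
            reverse=True,
        )
        proposals[item["oid"]] = [
            (a["oid"], round(score, 2), a["cuis"]) for score, a in scored if score >= threshold
        ]
    return proposals


if __name__ == "__main__":
    print(propose_cuis(load_items(SAMPLE_ODM)))
```

Running the script prints `{'I3': [('I1', 0.5, ['C0011849'])]}`: the unannotated item I3 shares the tokens "diabetes" and "mellitus" with I1, so I1's CUI is proposed for review. In a real setting, such proposals would only be suggestions for a curator, and a concept recognizer like MetaMap would supply candidates that purely lexical overlap cannot find.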


Keywords: CDISC ODM; Natural Language Processing; Semantic Interoperability; UMLS

[Indexed for MEDLINE]
