AMIA Annu Symp Proc. 2017 Feb 10;2016:1717-1726. eCollection 2016.

SMASH: A Data-driven Informatics Method to Assist Experts in Characterizing Semantic Heterogeneity among Data Elements.

Author information

1. Department of Biomedical Informatics, Columbia University, New York, NY; HIV Center for Clinical and Behavioral Studies, NY State Psychiatric Institute & Columbia University, New York, NY.
2. Department of Biomedical Informatics, Columbia University, New York, NY.
3. Department of Biomedical Informatics, Columbia University, New York, NY; New York-Presbyterian Hospital Value Institute, New York, NY.
4. HIV Center for Clinical and Behavioral Studies, NY State Psychiatric Institute & Columbia University, New York, NY.
5. Department of Biomedical Informatics, Columbia University, New York, NY; School of Nursing, Columbia University, New York, NY.

Abstract

Semantic heterogeneity (SH) is detrimental to data interoperability and integration in healthcare. Assessing SH is difficult, yet fundamental to addressing the problem. Using expert-based and data-driven methods, we assessed SH among HIV-associated data elements (DEs). Using ClinicalTrials.gov, we identified and obtained eight data dictionaries and created a DE inventory. We vectorized DEs by study and developed a new method, String Metric-assisted Assessment of Semantic Heterogeneity (SMASH), to find DEs that were similar in An and Bn, unique to An, or unique to Bn. An HIV expert assessed pairs for semantic equivalence. Heterogeneous DEs were either semantically-equivalent/syntactically-different (HIV-positive/HIV+/Seropositive) or syntactically-equivalent/semantically-different ("Partner" [sexual]/"Partner" [relationship]). Context of usage was considered. SMASH aided identification of SH. Of 1,175 DEs from pairs, 1,048 (87%) were semantically heterogeneous and 127 (13%) were homogeneous. Most heterogeneous pairs (97%) were semantically-equivalent/syntactically-different. Expert-based and data-driven methods are complementary for assessing SH, especially among semantically-equivalent/syntactically-different DEs. Similar expert-based/data-driven solutions are recommended for resolving SH.
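
The comparison step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the string metric, so Python's `difflib.SequenceMatcher` ratio is used here as a stand-in, and the function names, the 0.8 threshold, and the two toy study vectors are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (stand-in string metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare_studies(des_a, des_b, threshold=0.8):
    """Split two studies' data elements into similar pairs and unique leftovers.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    similar = []
    matched_a, matched_b = set(), set()
    for a in des_a:
        for b in des_b:
            score = similarity(a, b)
            if score >= threshold:
                similar.append((a, b, round(score, 2)))
                matched_a.add(a)
                matched_b.add(b)
    unique_a = [a for a in des_a if a not in matched_a]
    unique_b = [b for b in des_b if b not in matched_b]
    return similar, unique_a, unique_b


# Toy data element vectors for two hypothetical studies.
study_a = ["HIV-positive", "Partner", "Age"]
study_b = ["HIV+", "Partner", "Seropositive"]

similar, unique_a, unique_b = compare_studies(study_a, study_b)
print("similar:", similar)
print("unique to A:", unique_a)
print("unique to B:", unique_b)
```

With these toy inputs, only the two "Partner" elements are paired, while "HIV-positive", "HIV+", and "Seropositive" fall into the unique lists. A string metric alone therefore cannot tell whether the paired "Partner" elements mean the same thing, nor that the three HIV serostatus terms are equivalent, which is where the expert review described in the abstract comes in.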

PMID: 28269930
PMCID: PMC5333258
[Indexed for MEDLINE]
Free PMC Article
