J Biomed Inform. 2019 Jun;94:103182. doi: 10.1016/j.jbi.2019.103182. Epub 2019 Apr 19.

Concept embedding to measure semantic relatedness for biomedical information ontologies.

Author information

1. Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea.
2. Milner Therapeutics Institute, University of Cambridge, Cambridge CB2 1TN, UK.
3. Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea. Electronic address: dhlee@kaist.ac.kr.

Abstract

Many attempts have been made to identify relationships among concepts that correspond to terms in biomedical information ontologies such as the Unified Medical Language System (UMLS). In particular, vector representations of such concepts, built from UMLS definition texts, are widely used to measure the relatedness between two biological concepts. However, conventional relatedness measures cover only a limited range of words, which limits their performance. In this paper, we propose a concept-embedding model for measuring UMLS semantic relatedness that overcomes the limitations of earlier models. For biological concepts that are not defined in UMLS, we obtained context texts by using Wikipedia as an external knowledge base. Concept vector representations were then derived from these context texts. The degree of relatedness between two concepts was defined as the cosine similarity between the corresponding concept vectors. Our validation showed that the method provides higher coverage and better performance than the conventional method.
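
The relatedness measure described in the abstract, cosine similarity between two concept vectors, can be illustrated with a minimal Python sketch. The concept names, the three-dimensional toy vectors, and the function names `cosine_similarity` and `relatedness` are assumptions made for illustration only; they are not taken from the paper, whose vectors are derived from context texts.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; real concept vectors would be learned
# from the context texts of each concept and have many more dimensions.
concept_vectors = {
    "concept_a": [1.0, 2.0, 0.0],
    "concept_b": [2.0, 4.0, 0.0],  # same direction as concept_a
    "concept_c": [0.0, 0.0, 3.0],  # orthogonal to concept_a
}


def relatedness(name_a, name_b, vectors):
    """Relatedness of two concepts as the cosine similarity of their vectors."""
    return cosine_similarity(vectors[name_a], vectors[name_b])
```

In this sketch, vectors that point in the same direction score close to 1, and orthogonal vectors score 0, regardless of vector length.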

KEYWORDS:

Embedding; NLP; Paragraph vector; Similarity; UMLS; Wikipedia

PMID: 31009761
DOI: 10.1016/j.jbi.2019.103182
Free full text
