J Biomed Inform. 2016 Dec;64:320-332. doi: 10.1016/j.jbi.2016.10.020. Epub 2016 Nov 2.

Can multilinguality improve Biomedical Word Sense Disambiguation?

Author information

1. NLP & IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid 28040, Spain. Electronic address: aduque@lsi.uned.es.
2. NLP & IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid 28040, Spain. Electronic address: juaner@lsi.uned.es.
3. NLP & IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid 28040, Spain. Electronic address: lurdes@lsi.uned.es.

Abstract

Ambiguity in the biomedical domain is a major issue when performing Natural Language Processing tasks over the huge amount of information available in the field. For this reason, Word Sense Disambiguation is critical for building accurate systems able to tackle complex tasks such as information extraction, summarization or document classification. In this work we explore whether multilinguality can help to solve the problem of ambiguity, and under which conditions a multilingual system improves on the results obtained by monolingual approaches. We also analyze the best ways to generate useful multilingual resources, and study different languages and sources of knowledge. The proposed system, based on co-occurrence graphs containing biomedical concepts and textual information, is evaluated on a test dataset frequently used in biomedicine. We conclude that multilingual resources provide a clear improvement of more than 7% over monolingual approaches for graphs built from a small number of documents. Empirical results also show that automatically translated resources are a useful source of information for this particular task.

KEYWORDS:

Biomedical Word Sense Disambiguation; Graph-based systems; Multilinguality; Parallel and comparable corpora; Unified Medical Language System; Unsupervised systems

PMID: 27815227
DOI: 10.1016/j.jbi.2016.10.020