Display Settings:

Format

Send to:

Choose Destination
See comment in PubMed Commons below
AMIA Annu Symp Proc. 2011;2011:979-86. Epub 2011 Oct 22.

Automated non-alphanumeric symbol resolution in clinical texts.

Author information

  • 1Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA.

Abstract

Although clinical texts contain many symbols, relatively little attention has been given to symbol resolution by medical natural language processing (NLP) researchers. Interpreting the meaning of symbols may be viewed as a special case of Word Sense Disambiguation (WSD). One thousand instances of four common non-alphanumeric symbols ('+', '-', '/', and '#') were randomly extracted from a clinical document repository and annotated by experts. The symbols and their surrounding context, in addition to bag-of-Words (BoW), and heuristic rules were evaluated as features for the following classifiers: Naïve Bayes, Support Vector Machine, and Decision Tree, using 10-fold cross-validation. Accuracies for '+', '-', '/', and '#' were 80.11%, 80.22%, 90.44%, and 95.00% respectively, with Naïve Bayes. While symbol context contributed the most, BoW was also helpful for disambiguation of some symbols. Symbol disambiguation with supervised techniques can be implemented with reasonable accuracy as a module for medical NLP systems.

PMID:
22195157
[PubMed - indexed for MEDLINE]
PMCID:
PMC3243158
Free PMC Article
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for PubMed Central
    Loading ...
    Write to the Help Desk