Format

Send to

Choose Destination
J Am Med Inform Assoc. 2017 Apr 1;24(e1):e79-e86. doi: 10.1093/jamia/ocw109.

A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD).

Author information

1
School of Biomedical Informatics, The University of Texas Health Science Center at Houston.
2
Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee.
3
Department of Medicine, Vanderbilt University School of Medicine.
4
Department of Medicine, University of Alabama at Birmingham, Birmingham.

Abstract

Objective:

The goal of this study was to develop a practical framework for recognizing and disambiguating clinical abbreviations, thereby improving current clinical natural language processing (NLP) systems' capability to handle abbreviations in clinical narratives.

Methods:

We developed an open-source framework for clinical abbreviation recognition and disambiguation (CARD) that leverages our previously developed methods, including: (1) machine learning based approaches to recognize abbreviations from a clinical corpus, (2) clustering-based semiautomated methods to generate possible senses of abbreviations, and (3) profile-based word sense disambiguation methods for clinical abbreviations. We applied CARD to clinical corpora from Vanderbilt University Medical Center (VUMC) and generated 2 comprehensive sense inventories for abbreviations in discharge summaries and clinic visit notes. Furthermore, we developed a wrapper that integrates CARD with MetaMap, a widely used general clinical NLP system.

Results and Conclusion:

CARD detected 27 317 and 107 303 distinct abbreviations from discharge summaries and clinic visit notes, respectively. Two sense inventories were constructed for the 1000 most frequent abbreviations in these 2 corpora. Using the sense inventories created from discharge summaries, CARD achieved an F1 score of 0.755 for identifying and disambiguating all abbreviations in a corpus from the VUMC discharge summaries, which is superior to MetaMap and Apache's clinical Text Analysis Knowledge Extraction System (cTAKES). Using additional external corpora, we also demonstrated that the MetaMap-CARD wrapper improved MetaMap's performance in recognizing disorder entities in clinical notes. The CARD framework, 2 sense inventories, and the wrapper for MetaMap are publicly available at https://sbmi.uth.edu/ccb/resources/abbreviation.htm . We believe the CARD framework can be a valuable resource for improving abbreviation identification in clinical NLP systems.

KEYWORDS:

clinical abbreviation; clinical natural language processing; machine learning; sense clustering

PMID:
27539197
DOI:
10.1093/jamia/ocw109
[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Silverchair Information Systems
Loading ...
Support Center