Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations

AMIA Annu Symp Proc. 2012;2012:1004-13. Epub 2012 Nov 3.

Abstract

Abbreviations are widely used in clinical notes and are often ambiguous. Word sense disambiguation (WSD) for clinical abbreviations is therefore a critical task for many clinical natural language processing (NLP) systems. Supervised machine learning-based WSD methods are known for their high performance. However, constructing annotated samples for supervised WSD approaches is time-consuming and costly, and these methods often ignore sense frequency information. In this study, we proposed a profile-based method that used dictated discharge summaries as an external source to automatically build sense profiles and applied them, via the vector space model, to disambiguate abbreviations in hospital admission notes. Our evaluation on a test set of 2,386 annotated instances of 13 ambiguous abbreviations in admission notes showed that the profile-based method outperformed two baseline methods and achieved a best average precision of 0.792. Furthermore, we developed a strategy to combine the profile-based method with sense frequency information estimated from a clustering analysis. The combined approach substantially improved performance, achieving a highest precision of 0.875 on the same test set, which indicates that integrating sense frequency information with local context is effective for clinical abbreviation disambiguation.
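
The general idea of profile-based disambiguation combined with sense frequency can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the term weighting, the similarity measure, or the combination rule, so the raw term counts, the cosine similarity, and the linear interpolation controlled by the hypothetical `weight` parameter are all assumptions. The function names, the example abbreviation, the toy contexts, and the frequency values are likewise invented for illustration.

```python
import math
from collections import Counter


def build_profile(contexts):
    """Build a sense profile by summing bag-of-words counts over contexts.

    Assumption: contexts for each sense have already been collected
    (the study drew them from dictated discharge summaries).
    """
    profile = Counter()
    for text in contexts:
        profile.update(text.lower().split())
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, profiles, sense_freq=None, weight=0.5):
    """Return the best-scoring sense for an abbreviation's context.

    If sense_freq is given, each similarity score is linearly interpolated
    with the estimated sense frequency. This combination rule is an
    assumption made for illustration only.
    """
    vec = Counter(context.lower().split())
    scores = {}
    for sense, profile in profiles.items():
        score = cosine(vec, profile)
        if sense_freq is not None:
            score = (1 - weight) * score + weight * sense_freq.get(sense, 0.0)
        scores[sense] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration (not taken from the study).
profiles = {
    "patient": build_profile([
        "the patient was admitted with chest pain",
        "patient denies fever",
    ]),
    "physical therapy": build_profile([
        "evaluated by physical therapy for gait training",
        "physical therapy and occupational therapy consulted",
    ]),
}
sense_freq = {"patient": 0.9, "physical therapy": 0.1}

context_only = disambiguate("pt consulted", profiles)
with_frequency = disambiguate("pt consulted", profiles, sense_freq)
```

In this toy setup, a context with only weak lexical evidence can be assigned a different sense once the frequency estimate is included, which mirrors the abstract's point that local context and sense frequency carry complementary information.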

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Abbreviations as Topic*
  • Electronic Health Records*
  • Humans
  • Natural Language Processing*
  • Pattern Recognition, Automated*