Assessing the similarity of surface linguistic features related to epilepsy across pediatric hospitals

J Am Med Inform Assoc. 2014 Sep-Oct;21(5):866-70. doi: 10.1136/amiajnl-2013-002601. Epub 2014 Apr 1.

Abstract

Objective: Steady progress in computational linguistic methods creates opportunities for discovering information in clinical text and enables clinical scientists to explore novel approaches to care. However, these new approaches need evaluation. We describe an automated system to compare descriptions of epilepsy patients at three different organizations: Cincinnati Children's Hospital, the Children's Hospital Colorado, and the Children's Hospital of Philadelphia. To our knowledge, there have been no similar previous studies.

Materials and methods: In this work, a support vector machine (SVM)-based natural language processing (NLP) algorithm is trained to classify epilepsy progress notes as belonging to a patient with a specific type of epilepsy from a particular hospital. The same SVM is then used to classify notes from another hospital. Our null hypothesis is that an NLP algorithm cannot be trained using epilepsy-specific notes from one hospital and subsequently used to classify notes from another hospital better than a random baseline classifier. The hypothesis is tested using epilepsy progress notes from the three hospitals.
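The train-at-one-hospital, test-at-another design described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy notes, the two class labels (+1 and -1), the whitespace bag-of-words features, and the Pegasos-style stochastic subgradient training of a linear SVM. The function names (featurize, train_linear_svm, predict, accuracy) are hypothetical.

```python
import random
from collections import Counter

def featurize(note):
    """Bag-of-words counts over lowercase whitespace tokens (assumed features)."""
    return Counter(note.lower().split())

def score(w, feats):
    """Linear decision value: dot product of weights and token counts."""
    return sum(w.get(tok, 0.0) * c for tok, c in feats.items())

def train_linear_svm(notes, labels, epochs=50, lam=0.01, seed=0):
    """Fit a linear SVM (hinge loss, L2 penalty) by Pegasos-style SGD.

    labels must be +1 or -1. Returns a dict mapping token -> weight.
    """
    rng = random.Random(seed)
    feats = [featurize(n) for n in notes]
    w, t = {}, 0
    order = list(range(len(notes)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = labels[i] * score(w, feats[i])
            for tok in w:                  # shrink step from the L2 penalty
                w[tok] *= 1.0 - eta * lam
            if margin < 1:                 # hinge loss is active: move toward the note
                for tok, c in feats[i].items():
                    w[tok] = w.get(tok, 0.0) + eta * labels[i] * c
    return w

def predict(w, note):
    """Return +1 if the decision value is positive, otherwise -1."""
    return 1 if score(w, featurize(note)) > 0 else -1

def accuracy(w, notes, labels):
    """Fraction of notes whose predicted label matches the given label."""
    hits = sum(predict(w, n) == y for n, y in zip(notes, labels))
    return hits / len(notes)

# Toy stand-ins for progress notes (invented text, not real clinical data).
hospital_a_notes = [
    "focal seizures arising from left temporal lobe",
    "focal onset with impaired awareness",
    "right frontal focal spikes on eeg",
    "generalized tonic clonic convulsions",
    "generalized spike wave absence episodes",
    "juvenile myoclonic generalized jerks",
]
hospital_a_labels = [1, 1, 1, -1, -1, -1]

hospital_b_notes = [
    "focal discharges over temporal region",
    "focal motor events noted",
    "generalized absence epilepsy",
    "generalized myoclonic events",
]
hospital_b_labels = [1, 1, -1, -1]

# Train on hospital A only, then evaluate on hospital B.
weights = train_linear_svm(hospital_a_notes, hospital_a_labels)
cross_site_accuracy = accuracy(weights, hospital_b_notes, hospital_b_labels)
print("cross-site accuracy:", cross_site_accuracy)
```

With two balanced classes, a random baseline would be expected to score about 0.5, so the cross-site accuracy is the quantity to compare against that level. Pooling notes from a second hospital into the training lists would be the analogous sketch of the multi-site training mentioned in the results.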

Results: We were able to reject the null hypothesis at the 95% confidence level. Classification also improved when notes from a second hospital were included in the SVM training sample.
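One way such a comparison against a random baseline could be tested is a one-sided exact binomial test, sketched below. The abstract does not state which statistical test the authors used, so the choice of test, the function name binomial_p_value, and the example counts are all assumptions for illustration.

```python
from math import comb

def binomial_p_value(correct, total, chance):
    """One-sided exact p-value: P(X >= correct) for X ~ Binomial(total, chance).

    `chance` is the accuracy expected from a random baseline classifier.
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )

# Hypothetical counts, not taken from the study: 60 of 100 notes correct,
# with a two-class random baseline expected to get half of them right.
p_value = binomial_p_value(correct=60, total=100, chance=0.5)
reject_null = p_value < 0.05  # 0.05 corresponds to the 95% confidence level
print(p_value, reject_null)
```

A p-value below 0.05 would correspond to rejecting, at the 95% confidence level, the null hypothesis that the classifier does no better than chance.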

Discussion and conclusion: With a reasonably uniform epilepsy vocabulary and an NLP-based algorithm able to use this uniformity to classify epilepsy progress notes across different hospitals, we can pursue automated comparisons of patient conditions, treatments, and diagnoses across different healthcare settings.

Keywords: Epilepsy; Linguistics; Multicenter; Support vector machines; Text classification.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Electronic Health Records*
  • Epilepsy*
  • Hospitals, Pediatric
  • Humans
  • Linguistics*
  • Natural Language Processing*
  • Support Vector Machine*
  • Terminology as Topic