Extracting Clinical Features From Dictated Ambulatory Consult Notes Using a Commercially Available Natural Language Processing Tool: Pilot, Retrospective, Cross-Sectional Validation Study

Jeremy Petch; Jane Batt; Joshua Murray; Muhammad Mamdani

doi:10.2196/12575

JMIR Med Inform. 2019 Nov 1;7(4):e12575. doi: 10.2196/12575.

Authors

Jeremy Petch^# ¹ ², Jane Batt^# ³ ⁴ ⁵, Joshua Murray^# ⁶ ⁷, Muhammad Mamdani^# ¹ ⁶ ⁸ ⁹

Affiliations

¹ Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.
² Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, ON, Canada.
³ Division of Respirology, Department of Medicine, University of Toronto, Toronto, ON, Canada.
⁴ Keenan Research Centre for Biomedical Science, St. Michael's Hospital, Toronto, ON, Canada.
⁵ Department of Medicine, St. Michael's Hospital, Toronto, ON, Canada.
⁶ Li Ka Shing Centre for Healthcare Analytics Research and Training, St. Michael's Hospital, Toronto, ON, Canada.
⁷ Department of Statistical Sciences, Faculty of Arts and Sciences, University of Toronto, Toronto, ON, Canada.
⁸ Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada.
⁹ Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, ON, Canada.

^# Contributed equally.

PMID: 31682579
PMCID: PMC6913750
DOI: 10.2196/12575

Abstract

Background: The increasing adoption of electronic health records (EHRs) in clinical practice holds the promise of improving care and advancing research by serving as a rich source of data, but most EHRs allow clinicians to enter data as largely unstructured free text. Natural language processing (NLP) may reduce reliance on manual abstraction of these data by extracting clinical features directly from unstructured clinical text and converting them into structured data.

Objective: This study aimed to assess the performance of a commercially available NLP tool for extracting clinical features from free-text consult notes.

Methods: We conducted a pilot, retrospective, cross-sectional study of the accuracy of NLP applied to dictated consult notes from our tuberculosis clinic, with manual chart abstraction as the reference standard. Consult notes for 130 patients were extracted and processed using NLP. We extracted 15 clinical features from these consult notes and grouped them a priori into categories of simple, moderate, and complex for analysis.

Results: For the primary outcome of overall accuracy, NLP performed best for features classified as simple, achieving an overall accuracy of 96% (95% CI 94.3-97.6). Performance was slightly lower for features of moderate clinical and linguistic complexity at 93% (95% CI 91.1-94.4), and lowest for complex features at 91% (95% CI 87.3-93.1).
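As a rough illustration of how an overall accuracy and its 95% CI can be computed against a reference standard, the sketch below compares NLP output with manually abstracted values and applies a Wilson score interval. The abstract does not state which interval method the study used, so the Wilson interval is an assumption here, and the function names and the toy values are hypothetical rather than study data.

```python
import math


def accuracy(nlp_values, reference_values):
    """Compare NLP-extracted values with manually abstracted reference values.

    Returns (accuracy, n_correct, n_total).
    """
    if len(nlp_values) != len(reference_values):
        raise ValueError("both sequences must have the same length")
    if not reference_values:
        raise ValueError("at least one observation is required")
    n_total = len(reference_values)
    n_correct = sum(a == b for a, b in zip(nlp_values, reference_values))
    return n_correct / n_total, n_correct, n_total


def wilson_interval(n_correct, n_total, z=1.96):
    """Wilson score interval for a proportion.

    Assumed method: the source does not name the interval it reports.
    z=1.96 corresponds to a two-sided 95% interval.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_correct / n_total
    denom = 1 + z**2 / n_total
    centre = (p + z**2 / (2 * n_total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / n_total + z**2 / (4 * n_total**2)) / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical toy example (not study data): one binary feature for 10 notes.
nlp_output = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
manual_abstraction = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

acc, n_correct, n_total = accuracy(nlp_output, manual_abstraction)
lo, hi = wilson_interval(n_correct, n_total)
print(f"accuracy = {acc:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

In a study like this one, the same calculation would be repeated separately for the pooled observations in each a priori complexity group (simple, moderate, complex).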

Conclusions: The findings of this study support the use of NLP for extracting clinical features from dictated consult notes in the setting of a tuberculosis clinic. Further research is needed to fully establish the validity of NLP for this and other purposes.

Keywords: electronic health record; natural language processing; tuberculosis.

©Jeremy Petch, Jane Batt, Joshua Murray, Muhammad Mamdani. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.11.2019.