Use of natural language processing to improve predictive models for imaging utilization in children presenting to the emergency department

BMC Med Inform Decis Mak. 2019 Dec 30;19(1):287. doi: 10.1186/s12911-019-1006-6.

Abstract

Objective: To examine the association between medical imaging utilization and patients' socioeconomic, demographic and clinical factors during pediatric emergency department (ED) visits, and to develop predictive models that use these factors, including natural language elements, to predict medical imaging utilization in the pediatric ED.

Methods: Data on pediatric patients from the 2012-2016 United States National Hospital Ambulatory Medical Care Survey were used to build models predicting the use of imaging in children presenting to the ED. Multivariable logistic regression models were built with structured variables, such as temperature, heart rate and age; with unstructured variables, such as the reason for visit and free-text nursing notes; and with the combination of both types of data available at triage. Natural language processing (NLP) techniques were used to extract information from the unstructured data.
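The abstract does not specify which NLP techniques were used. As a minimal sketch, assuming a binary bag-of-words representation of the reason-for-visit text, the following concatenates text indicators with scaled structured variables and fits a multivariable logistic regression by plain gradient descent. The records, the scaling (age/18, heart rate/200) and every function name are hypothetical illustrations, not the authors' pipeline or the NHAMCS field names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a free-text field and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts, min_count=1):
    """Map each token seen at least `min_count` times to a column index."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    vocab = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(vocab)}


def featurize(structured, text, vocab):
    """Concatenate structured variables with binary bag-of-words indicators."""
    bow = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:  # tokens unseen during training are ignored
            bow[vocab[tok]] = 1.0
    return list(structured) + bow


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit an unpenalized logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of imaging for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical triage records: (reason for visit, [age/18, heart rate/200], imaged?)
visits = [
    ("fall from bike with arm pain", [0.50, 0.60], 1),
    ("fever and cough", [0.20, 0.70], 0),
    ("head injury after fall", [0.40, 0.50], 1),
    ("rash on legs", [0.30, 0.50], 0),
    ("injury to ankle", [0.60, 0.50], 1),
    ("fever and ear pain", [0.10, 0.80], 0),
]
vocab = build_vocabulary(text for text, _, _ in visits)
X = [featurize(s, text, vocab) for text, s, _ in visits]
y = [label for _, _, label in visits]
w, b = fit_logistic(X, y)

# Same structured values, different free text: only the text features differ.
p_injury = predict_proba(w, b, featurize([0.40, 0.55], "fall injury", vocab))
p_fever = predict_proba(w, b, featurize([0.40, 0.55], "fever", vocab))
print(f"P(imaging | 'fall injury') = {p_injury:.2f}")
print(f"P(imaging | 'fever')       = {p_fever:.2f}")
```

Holding the structured values fixed isolates the contribution of the free text, which mirrors the comparison between structured-only and combined models. A real analysis would more likely use an established statistics or machine-learning library, richer text features and a held-out validation set.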

Results: Of the 27,665 pediatric ED visits included in the study, 8394 (30.3%) received medical imaging in the ED, including 6922 (25.0%) who had an X-ray and 1367 (4.9%) who had a computed tomography (CT) scan. In the predictive model including only structured variables, the c-statistic was 0.71 (95% CI: 0.70-0.71) for any imaging use, 0.69 (95% CI: 0.68-0.70) for X-ray, and 0.77 (95% CI: 0.76-0.78) for CT. Models including only unstructured information had c-statistics of 0.81 (95% CI: 0.81-0.82) for any imaging use, 0.82 (95% CI: 0.82-0.83) for X-ray, and 0.85 (95% CI: 0.83-0.86) for CT scans. When both structured variables and free text variables were included, the c-statistics reached 0.82 (95% CI: 0.82-0.83) for any imaging use, 0.83 (95% CI: 0.83-0.84) for X-ray, and 0.87 (95% CI: 0.86-0.88) for CT.
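Model discrimination is reported as c-statistics. For a binary outcome such as imaging use, the c-statistic equals the area under the ROC curve: the probability that a randomly chosen visit with imaging receives a higher predicted score than a randomly chosen visit without imaging. The sketch below computes it by direct pairwise comparison and adds a percentile bootstrap interval. The abstract does not state how its 95% CIs were derived, so the bootstrap is an assumption, and the scores and labels are hypothetical.

```python
import random


def c_statistic(scores, labels):
    """Concordance (c) statistic: the probability that a randomly chosen
    visit with imaging has a higher predicted score than a randomly chosen
    visit without imaging; tied scores count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative visit")
    concordant = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the c-statistic."""
    if len(set(labels)) < 2:
        raise ValueError("need both outcome classes")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(c_statistic([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * (n_boot - 1))]
    hi = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


# Hypothetical predicted probabilities and observed imaging use.
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]
print(c_statistic(scores, labels))  # 0.75: 3 of 4 positive-negative pairs concordant
print(bootstrap_ci(scores, labels))
```

The pairwise loop costs O(n_pos × n_neg) comparisons, which is clear but slow at the scale of 27,665 visits; an equivalent rank-based (Mann-Whitney) computation runs in O(n log n).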

Conclusions: Both CT and X-ray are commonly used in the pediatric ED, with nearly one third of visits receiving at least one imaging study. Patients' socioeconomic, demographic and clinical factors available at ED triage were associated with medical imaging utilization. Predictive models combining structured and unstructured variables available at triage performed better than models using either type of variable alone, suggesting potential for the use of NLP in determining resource utilization.

Keywords: Medical imaging utilization; Natural language processing; Pediatric emergency department; Predictive model.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Child
  • Child, Preschool
  • Emergency Service, Hospital* / statistics & numerical data
  • Female
  • Health Care Surveys
  • Humans
  • Infant
  • Logistic Models
  • Male
  • Natural Language Processing*
  • Patient Acceptance of Health Care / statistics & numerical data
  • Radiography / statistics & numerical data*
  • Socioeconomic Factors
  • Tomography, X-Ray Computed / statistics & numerical data*
  • Triage
  • United States