BMC Med Inform Decis Mak. 2019 Dec 30;19(1):287. doi: 10.1186/s12911-019-1006-6.

Use of natural language processing to improve predictive models for imaging utilization in children presenting to the emergency department.

Author information

Department of Systems, Populations and Leadership, University of Michigan School of Nursing, Ann Arbor, USA.
Department of Emergency Medicine, Mayo Clinic, Rochester, USA.
Department of Anatomy and Medical Imaging, University of Auckland, Auckland, New Zealand.
Oxford Centre for Clinical Magnetic Resonance Research, University of Oxford, Oxford, UK.
Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China.
Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, USA.
Department of Emergency Medicine, University of Michigan School of Medicine, Ann Arbor, USA.



To examine the association between medical imaging utilization and patients' socioeconomic, demographic, and clinical factors during ED visits, and to develop predictive models that use these factors, including natural language elements, to predict medical imaging utilization in the pediatric ED.


Data on pediatric patients from the 2012-2016 United States National Hospital Ambulatory Medical Care Survey were used to build models predicting the use of imaging in children presenting to the ED. Multivariable logistic regression models were built with structured variables (such as temperature, heart rate, and age), with unstructured variables (such as reason for visit and free-text nursing notes), and with the combined data available at triage. NLP techniques were used to extract information from the unstructured data.
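As a rough illustration of how structured and unstructured triage data can feed one model, the sketch below builds a combined feature row per visit. The abstract does not say which NLP technique was used, so the binary bag-of-words encoding here is an assumption, as are the field names (`temperature_c`, `heart_rate`, `age_years`, `reason_for_visit`) and the two example visits, which are invented and not taken from the survey.

```python
import re

def tokenize(text):
    """Lowercase a free-text field and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(texts):
    """Map each distinct token to a column index (sorted so the order is stable)."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}

def featurize(visit, vocabulary):
    """Concatenate structured values with binary token indicators for one visit."""
    # Hypothetical structured variables; the study lists temperature, heart rate, and age as examples.
    structured = [visit["temperature_c"], visit["heart_rate"], visit["age_years"]]
    indicators = [0] * len(vocabulary)
    for tok in tokenize(visit["reason_for_visit"]):
        if tok in vocabulary:
            indicators[vocabulary[tok]] = 1
    return structured + indicators

# Invented example visits, for illustration only.
visits = [
    {"temperature_c": 37.0, "heart_rate": 110, "age_years": 6,
     "reason_for_visit": "Fall, wrist pain"},
    {"temperature_c": 39.1, "heart_rate": 140, "age_years": 2,
     "reason_for_visit": "Fever and cough"},
]

vocabulary = build_vocabulary(v["reason_for_visit"] for v in visits)
rows = [featurize(v, vocabulary) for v in visits]
```

Rows built this way could then be passed to any logistic regression implementation. A model using only the first three columns would correspond to the structured-only setting, and one using only the indicator columns to the unstructured-only setting.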


Of the 27,665 pediatric ED visits included in the study, 8394 (30.3%) received medical imaging in the ED, including 6922 (25.0%) who had an X-ray and 1367 (4.9%) who had a computed tomography (CT) scan. In the predictive model including only structured variables, the c-statistic was 0.71 (95% CI: 0.70-0.71) for any imaging use, 0.69 (95% CI: 0.68-0.70) for X-ray, and 0.77 (95% CI: 0.76-0.78) for CT. Models including only unstructured information had c-statistics of 0.81 (95% CI: 0.81-0.82) for any imaging use, 0.82 (95% CI: 0.82-0.83) for X-ray, and 0.85 (95% CI: 0.83-0.86) for CT scans. When both structured variables and free text variables were included, the c-statistics reached 0.82 (95% CI: 0.82-0.83) for any imaging use, 0.83 (95% CI: 0.83-0.84) for X-ray, and 0.87 (95% CI: 0.86-0.88) for CT.
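For a binary outcome such as "received imaging", the c-statistic is the probability that a randomly chosen visit with imaging gets a higher predicted score than a randomly chosen visit without it, which equals the area under the ROC curve. A minimal sketch of that pairwise definition follows. The function name `c_statistic` and the toy labels and scores are illustrative assumptions, not the study's data, and the abstract does not state how its confidence intervals were computed.

```python
def c_statistic(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; tied scores count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))

# Toy example: 1 = imaging performed, 0 = no imaging (invented values).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.35]
result = c_statistic(labels, scores)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. This quadratic-time loop is meant for reading, not for data sets of the size reported here.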


Both CT and X-ray are commonly used in the pediatric ED, with nearly one third of visits receiving at least one imaging study. Patients' socioeconomic, demographic, and clinical factors available at ED triage were associated with medical imaging utilization. Predictive models combining structured and unstructured variables available at triage performed better than models using structured or unstructured variables alone, suggesting the potential for use of NLP in determining resource utilization.


Medical imaging utilization; Natural language processing; Pediatric emergency department; Predictive model
