Enhancing early autism prediction based on electronic records using clinical narratives

J Biomed Inform. 2023 Aug:144:104390. doi: 10.1016/j.jbi.2023.104390. Epub 2023 May 12.

Abstract

Recent work has shown that predictive models can be applied to structured electronic health record (EHR) data to stratify autism likelihood from an early age (<1 year). Integrating clinical narratives (or notes) with structured data has been shown to improve prediction performance in other clinical applications, but the added predictive value of this information in early autism prediction has not yet been explored. In this study, we aimed to enhance the performance of early autism prediction by using both structured EHR data and clinical narratives. We built models based on structured data and clinical narratives separately, and then an ensemble model that integrated both sources of data. We assessed the predictive value of these models using data from the Duke University Health System collected over a 14-year span, evaluating ensemble models that predict later autism diagnosis (by age 4 years) from data collected between ages 30 and 360 days. Our sample included 11,750 children above age 3 years (385 meeting autism diagnostic criteria). The ensemble model for autism prediction showed superior performance: at age 30 days it achieved 46.8% sensitivity (95% confidence interval, CI: 22.0%, 52.9%) and 28.0% positive predictive value (PPV) at high (90%) specificity (CI: 2.0%, 33.1%), with AUC4 (with at least 4-year follow-up for controls) reaching 0.769 (CI: 0.715, 0.811). Prediction by 360 days achieved 44.5% sensitivity (CI: 23.6%, 62.9%) and 13.7% PPV at high (90%) specificity (CI: 9.6%, 18.9%), with AUC4 reaching 0.797 (CI: 0.746, 0.840). Results show that incorporating clinical narratives in early autism prediction achieved promising accuracy by age 30 days, outperforming models based on structured data only. Furthermore, findings suggest that additional features learned from clinician narratives might be hypothesis-generating for understanding early development in autism.
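
The abstract reports sensitivity and PPV at a fixed 90% specificity for a model that combines two score sources. The sketch below illustrates how such figures could be computed. It is not the authors' method: the abstract does not say how the ensemble combines its component models or how the operating threshold is chosen, so the unweighted averaging, the threshold rule (the lowest control score that meets the target specificity), and the function names `average_ensemble` and `metrics_at_specificity` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def average_ensemble(structured: Sequence[float],
                     narrative: Sequence[float]) -> List[float]:
    """Hypothetical ensemble: unweighted mean of two models' risk scores.

    Assumption for illustration only; the abstract does not describe
    how the structured-data and narrative models are combined.
    """
    if len(structured) != len(narrative):
        raise ValueError("score lists must have the same length")
    return [(s + n) / 2 for s, n in zip(structured, narrative)]


def metrics_at_specificity(scores: Sequence[float],
                           labels: Sequence[int],
                           target_specificity: float = 0.90) -> Dict[str, float]:
    """Sensitivity and PPV at a threshold that meets the target specificity.

    labels: 1 = later autism diagnosis (case), 0 = control.
    A child is flagged as high likelihood when score > threshold.
    """
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    n_cases = sum(1 for y in labels if y == 1)
    if not controls or n_cases == 0:
        raise ValueError("need at least one case and one control")

    # Lowest control score such that the share of controls at or below it
    # reaches the target specificity (assumed threshold rule).
    threshold = controls[-1]
    for rank, candidate in enumerate(controls, start=1):
        if rank / len(controls) >= target_specificity:
            threshold = candidate
            break

    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)

    return {
        "threshold": threshold,
        "specificity": 1 - fp / len(controls),
        "sensitivity": tp / n_cases,
        # PPV is undefined when no child is flagged.
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }
```

Because PPV depends on how common the outcome is in the sample as well as on sensitivity and specificity, a function like this would give different PPV values for the same model on cohorts with different autism prevalence.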

Keywords: Autism; EHR data; Ensemble model; Language models; Unstructured data.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Autistic Disorder* / diagnosis
  • Child
  • Child, Preschool
  • Electronic Health Records*
  • Electronics
  • Humans
  • Infant
  • Narration
  • Predictive Value of Tests