Predicting dementia with routine care EMR data

Artif Intell Med. 2020 Jan:102:101771. doi: 10.1016/j.artmed.2019.101771. Epub 2019 Dec 5.

Abstract

Our aim is to develop a machine learning (ML) model that can predict dementia in a general patient population from multiple health care institutions one year and three years prior to the onset of the disease, without any additional monitoring or screening. The purpose of the model is to automate cost-effective, non-invasive, digital pre-screening of patients at risk for dementia. To this end, routine care data, which are widely available through Electronic Medical Record (EMR) systems, are used as the data source. These data embody rich knowledge and make related medical applications easy to deploy at scale in a cost-effective manner. Specifically, the model is trained on structured and unstructured data from three EMR data sets: diagnoses, prescriptions, and medical notes. Each of the three data sets is used to construct an individual model, and a combined model is derived from all three. Human-interpretable data processing and ML techniques are selected to facilitate adoption of the proposed model by health care providers from multiple institutions. The results show that the combined model is generalizable across multiple institutions and is able to predict dementia within one year of its onset with an accuracy of nearly 80%, even though it was trained using routine care data. Moreover, analysis of the models identified important predictors for dementia. Some of these predictors (e.g., age and hypertensive disorders) are already confirmed by the literature, while others, especially those derived from the unstructured medical notes, require further clinical analysis.

Keywords: Dementia; EMR; Machine learning; Prediction; Random forest.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Age Factors
  • Aged
  • Aged, 80 and over
  • Cost-Benefit Analysis
  • Dementia / diagnosis*
  • Drug Prescriptions / statistics & numerical data
  • Electronic Health Records* / economics
  • Humans
  • Hypertension / complications
  • Machine Learning
  • Mass Screening
  • Middle Aged
  • Models, Theoretical
  • Neuropsychological Tests
  • Predictive Value of Tests
  • Reproducibility of Results
  • Risk Factors