A flexible data-driven comorbidity feature extraction framework

Comput Biol Med. 2016 Jun 1:73:165-72. doi: 10.1016/j.compbiomed.2016.04.014. Epub 2016 Apr 20.

Abstract

Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a clustering-based feature extraction framework. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then use these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS) of the Healthcare Cost and Utilization Project (HCUP), which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and on the UCI 130-US diabetes dataset, which includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with commonly used comorbidity frameworks, including Charlson's index, Elixhauser's comorbidities, and their variations. The proposed approach showed significant gains of 10.7-22.1% in predictive accuracy for CHF severity of condition prediction and 4.65-5.75% for diabetes readmission prediction.
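
The general idea of grouping diagnostic codes by co-occurrence and using the groups as features can be illustrated with a minimal sketch. The abstract does not state which similarity measure or clustering algorithm the authors used, so everything below is an assumption made for illustration: Jaccard similarity as the co-occurrence statistic, greedy average-linkage agglomerative clustering, binary cluster-membership features, the function names, and the toy records (which are invented, not drawn from NIS or any other dataset).

```python
from collections import Counter
from itertools import combinations


def cooccurrence_similarity(records):
    """Return sorted codes and Jaccard similarity for code pairs seen together.

    Jaccard similarity is an assumed choice of co-occurrence statistic.
    """
    code_counts = Counter()
    pair_counts = Counter()
    for codes in records:
        unique = sorted(set(codes))
        code_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # keys are (a, b) with a < b
    sim = {
        (a, b): n_ab / (code_counts[a] + code_counts[b] - n_ab)
        for (a, b), n_ab in pair_counts.items()
    }
    return sorted(code_counts), sim


def cluster_codes(records, n_clusters):
    """Greedy average-linkage agglomerative clustering of codes (assumed method)."""
    codes, sim = cooccurrence_similarity(records)

    def pair_sim(a, b):
        # Pairs never observed together have similarity 0.
        return sim.get((a, b) if a < b else (b, a), 0.0)

    def linkage(c1, c2):
        total = sum(pair_sim(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[c] for c in codes]
    while len(clusters) > n_clusters:
        # Merge the two clusters with the highest average similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def extract_features(record, clusters):
    """One binary feature per cluster: 1 if the record has any code in it."""
    present = set(record)
    return [int(any(code in present for code in cluster)) for cluster in clusters]


# Toy, invented discharge records; each is a list of ICD-9-CM-style codes.
records = [
    ["428.0", "585.9"],
    ["428.0", "585.9"],
    ["250.00", "272.4"],
    ["250.00", "272.4"],
    ["250.00", "428.0"],
]
clusters = cluster_codes(records, n_clusters=2)
features = extract_features(["428.0"], clusters)
```

In this sketch `n_clusters` is a free parameter; the abstract indicates that the number of clusters is optimized on the training set, which would correspond to selecting `n_clusters` by validation performance of the downstream predictor.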

Keywords: Clustering; Comorbidity; Knowledge discovery; Prediction.

MeSH terms

  • Algorithms*
  • Data Mining / methods*
  • Databases, Factual*
  • Diabetes Mellitus / diagnosis*
  • Electronic Health Records*
  • Female
  • Humans
  • Male