AMIA Annu Symp Proc. 2018 Dec 5;2018:740-749. eCollection 2018.

Scalable Electronic Phenotyping For Studying Patient Comorbidities.

Author information

Biomedical Informatics Training Program, Stanford University, Stanford, CA.
Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA.
Department of Biomedical Data Science, Stanford University, Stanford, CA.


Over 75 million Americans have multiple concurrent chronic conditions, and medical decision making for these patients is mostly based on retrospective cohort studies. Current methods to generate cohorts of patients with comorbidities are neither scalable nor generalizable. We propose a supervised machine learning algorithm for learning comorbidity phenotypes without requiring manually created training sets. First, we generated myocardial infarction (MI) and type 2 diabetes (T2DM) patient cohorts by training LASSO logistic regression models on samples imperfectly labeled using ICD9 codes. Second, we assessed the effects of training sample size, inclusion of physician input, and inclusion of clinical text features on model performance. Using ICD9 codes as our labeling heuristic, we achieved performance comparable to that of models created using keywords as the labeling heuristic. We found that expert input and larger training sample sizes could compensate for the lack of clinical text derived features. However, our best-performing model combined clinical text features with a large training sample size.
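
The core idea in the abstract, training a LASSO (L1-penalized) logistic regression on labels produced by an imperfect heuristic rather than by manual review, can be illustrated with a small sketch. Everything below is an assumption made for illustration and not the authors' implementation: the data are synthetic, the feature rates and 10% label-flip rate are invented, the function names are hypothetical, and the model is fitted with a plain proximal gradient loop because the abstract does not say which solver or software was used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, l1_penalty=0.05, step=0.4, n_iter=200):
    """Fit L1-penalized logistic regression by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        b -= step * grad_b  # the intercept is left unpenalized
        w = [soft_threshold(wj - step * gj, step * l1_penalty)
             for wj, gj in zip(w, grad_w)]
    return w, b


def make_synthetic_cohort(n_patients, n_features=8, label_noise=0.1, seed=0):
    """Hypothetical cohort: features 0 and 1 track the phenotype, the rest are noise.

    Returns features, noisy labels (a stand-in for an ICD9-code heuristic),
    and true labels (a stand-in for chart review, used only for evaluation).
    """
    rng = random.Random(seed)
    X, noisy, truth = [], [], []
    for _ in range(n_patients):
        has_condition = rng.random() < 0.5
        row = []
        for j in range(n_features):
            if j < 2:
                p = 0.9 if has_condition else 0.05  # invented rates
            else:
                p = 0.3
            row.append(1.0 if rng.random() < p else 0.0)
        flipped = rng.random() < label_noise
        X.append(row)
        truth.append(1 if has_condition else 0)
        noisy.append(1 - truth[-1] if flipped else truth[-1])
    return X, noisy, truth


def accuracy(w, b, X, y):
    """Fraction of rows whose prediction (probability >= 0.5) matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        hits += int((p >= 0.5) == (yi == 1))
    return hits / len(X)


# Train on noisy labels only; score against the held-out true labels.
X_train, y_noisy, _ = make_synthetic_cohort(400, seed=1)
weights, bias = fit_lasso_logistic(X_train, y_noisy)
X_test, _, y_true = make_synthetic_cohort(300, seed=2)
print("weights:", [round(v, 2) for v in weights])
print("accuracy vs. true labels:", round(accuracy(weights, bias, X_test, y_true), 3))
```

In this toy setting the L1 penalty tends to shrink the weights of the uninformative features toward zero while keeping the two informative ones, which is the property that makes LASSO a natural fit for high-dimensional clinical feature sets. Varying `n_patients` or `label_noise` gives a rough feel for the trade-off between training sample size and label quality that the abstract examines.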

[Indexed for MEDLINE]