Extracting Deep Phenotypes for Chronic Kidney Disease Using Electronic Health Records

Duc Thanh Anh Luong; Dinh Tran; Wilson D Pace; Miriam Dickinson; Joseph Vassalotti; Jennifer Carroll; Matthew Withiam-Leitch; Min Yang; Nikhil Satchidanand; Elizabeth Staton; Linda S Kahn; Varun Chandola; Chester H Fox

doi:10.5334/egems.226

EGEMS (Wash DC). 2017 Jun 12;5(1):9. doi: 10.5334/egems.226.

Affiliations

¹ University at Buffalo.
² University of Colorado, Denver.
³ Icahn School of Medicine at Mount Sinai.

Abstract

Introduction: Chronic kidney disease (CKD) is among the most prevalent chronic diseases in the world, and its rate of progression varies widely among patients. Identifying its phenotypic subtypes is therefore important for improving risk stratification and for providing more targeted therapy to patients whose disease follows different trajectories.

Problem definition and data: The rapid growth and adoption of electronic health record (EHR) technology has created a unique opportunity to use the abundant clinical data stored in EHRs to find meaningful phenotypic subtypes for CKD. In this study, we focus on extracting disease severity profiles for CKD while accounting for confounding factors.

Probabilistic subtyping model: We employ a probabilistic model to identify precise phenotypes from the EHR data of patients who have chronic kidney disease. Under this model, a patient's eGFR trajectory is decomposed into a combination of four components: a disease subtype effect, a covariate effect, an individual long-term effect, and an individual short-term effect.
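
The four-component decomposition described above can be illustrated with a minimal sketch. It assumes that the components combine additively and that each one has a simple closed form (linear trends, a weighted covariate sum, Gaussian-shaped transient dips). These forms, along with every function name, parameter name, and numeric value below, are hypothetical placeholders chosen for illustration; the abstract does not specify the functional forms used in the study.

```python
import math

# Illustrative sketch only: the additive structure and every functional form
# below are assumptions, not the model specification from the study.

def subtype_effect(t, intercept, slope):
    """Trend shared by all patients in one disease subtype (assumed linear)."""
    return intercept + slope * t

def covariate_effect(covariates, weights):
    """Shift explained by patient covariates (assumed weighted sum)."""
    return sum(weights[name] * value for name, value in covariates.items())

def long_term_effect(t, offset, drift):
    """Persistent patient-specific deviation (assumed offset plus slow drift)."""
    return offset + drift * t

def short_term_effect(t, episodes):
    """Transient deviations, assumed Gaussian-shaped: (center, width, depth)."""
    return sum(depth * math.exp(-0.5 * ((t - center) / width) ** 2)
               for center, width, depth in episodes)

def expected_egfr(t, subtype, covariates, weights, long_term, episodes):
    """Sum the four components to get the expected eGFR at time t (years)."""
    return (subtype_effect(t, **subtype)
            + covariate_effect(covariates, weights)
            + long_term_effect(t, **long_term)
            + short_term_effect(t, episodes))

# Hypothetical patient; all values are made up for illustration.
subtype = {"intercept": 60.0, "slope": -2.0}         # subtype declines 2 units/year
covariates = {"age_decades": 7.0, "diabetes": 1.0}
weights = {"age_decades": -1.0, "diabetes": -3.0}
long_term = {"offset": 4.0, "drift": 0.5}            # patient declines slower than subtype
episodes = [(2.0, 0.25, -8.0)]                       # one transient dip around year 2

trajectory = [expected_egfr(t, subtype, covariates, weights, long_term, episodes)
              for t in range(5)]
```

Keeping the components as separate functions mirrors the idea that the subtype effect is shared across patients while the long-term and short-term effects are specific to one individual. In an actual probabilistic model each component would be a latent quantity inferred from the observed eGFR measurements rather than fixed by hand.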

Experimental results: The disease subtypes discovered by the Probabilistic Subtyping Model for CKD are presented, and their clinical relevance is analyzed.

Discussion: Several clinical health markers found to be associated with the disease subtypes are presented, with suggestions for further investigation of their use as risk predictors. Several assumptions made in the study are also clarified and discussed.

Conclusion: Large EHR datasets can be used to identify deep phenotypes retrospectively. Directions for further expansion of the model are also discussed.

Grants and funding

KL2 TR001413/TR/NCATS NIH HHS/United States