A Computable Phenotype Improves Cohort Ascertainment in a Pediatric Pulmonary Hypertension Registry

J Pediatr. 2017 Sep:188:224-231.e5. doi: 10.1016/j.jpeds.2017.05.037. Epub 2017 Jun 16.

Abstract

Objectives: To compare registry and electronic health record (EHR) data mining approaches for cohort ascertainment in pediatric pulmonary hypertension (PH), with the aim of overcoming limitations of registry enrollment alone in identifying patients with particular disease phenotypes.

Study design: This study was a single-center retrospective analysis of EHR and registry data at Boston Children's Hospital. The local Informatics for Integrating Biology and the Bedside (i2b2) data warehouse was queried for billing codes, prescriptions, and narrative data related to pediatric PH. Computable phenotype algorithms were developed by fitting penalized logistic regression models to a physician-annotated training set. Algorithms were applied to a candidate patient cohort, and performance was evaluated using a separate set of 136 records and 179 registry patients. We compared the clinical and demographic characteristics of patients identified by the computable phenotype with those of patients enrolled in the registry.
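
As an illustration of the modeling step described above, the following is a minimal sketch of a penalized logistic regression fitted to an annotated training set. It is not the authors' implementation: the abstract does not state the penalty type, so an L2 (ridge) penalty is assumed here, and the feature names, toy data, and the helper functions `fit_penalized_logistic` and `predict_proba` are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_penalized_logistic(X, y, lam=0.1, lr=0.05, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    ASSUMPTION: an L2 (ridge) penalty with strength `lam` is used for
    illustration; the abstract only says "penalized".
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # Mean log-loss gradient plus the penalty gradient (intercept unpenalized).
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    # Estimated probability that a record is a true PH case.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# HYPOTHETICAL toy features per patient (not study data):
# [count of PH billing codes, PH drug prescribed (0/1), PH mentions in notes]
X_train = [
    [5, 1, 8], [3, 1, 4], [4, 0, 6], [6, 1, 9],   # annotated as PH
    [0, 0, 1], [1, 0, 0], [0, 0, 0], [1, 0, 2],   # annotated as not PH
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_penalized_logistic(X_train, y_train)
scores = [predict_proba(w, b, x) for x in X_train]
```

In practice a fitted model of this kind would be applied to each candidate record, and a probability threshold would decide which patients enter the cohort. The penalty shrinks coefficients toward zero, which helps when the annotated training set is small relative to the number of EHR-derived features.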

Results: The computable phenotype had an area under the receiver operating characteristic curve of 90% (95% CI, 85%-95%) and a positive predictive value of 85% (95% CI, 77%-93%), and it identified 413 patients (an additional 231%) with pediatric PH who were not enrolled in the registry. Patients identified by the computable phenotype were clinically distinct from registry patients, with a greater prevalence of diagnoses related to perinatal distress and left heart disease.
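
The arithmetic behind two of the reported figures can be sketched as follows. This assumes that the "additional 231%" is expressed relative to the 179 registry patients mentioned in the study design, which the abstract does not state explicitly. The counts passed to `positive_predictive_value` are hypothetical and chosen only to show the definition; they are not the study's confusion matrix.

```python
def percent_increase(additional, baseline):
    # Additional patients expressed as a percentage of the baseline cohort.
    return 100.0 * additional / baseline

def positive_predictive_value(true_pos, false_pos):
    # PPV = TP / (TP + FP): share of flagged records that are true cases.
    return true_pos / (true_pos + false_pos)

# 413 newly identified patients relative to 179 registry patients
# (ASSUMPTION: 179 is the baseline behind the reported percentage).
increase = percent_increase(413, 179)
print(round(increase))  # 231

# HYPOTHETICAL counts, for illustration of the formula only.
ppv = positive_predictive_value(true_pos=85, false_pos=15)
print(ppv)  # 0.85
```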

Conclusions: Mining of EHRs using computable phenotypes identified a large cohort of patients not recruited using a classic registry. Fusion of EHR and registry data can improve cohort ascertainment for the study of rare diseases.

Trial registration: ClinicalTrials.gov: NCT02249923.

Keywords: bioinformatics; computer-based model; pediatrics; pulmonary hypertension; registry.

Publication types

  • Observational Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Child
  • Data Mining*
  • Electronic Health Records*
  • Humans
  • Hypertension, Pulmonary / diagnosis*
  • Hypertension, Pulmonary / epidemiology
  • Phenotype
  • Predictive Value of Tests
  • Registries*
  • Retrospective Studies
  • Sensitivity and Specificity
  • United States / epidemiology

Associated data

  • ClinicalTrials.gov/NCT02249923