J Biomed Inform. 2019 Jul 29;97:103258. doi: 10.1016/j.jbi.2019.103258. [Epub ahead of print]

PheValuator: Development and evaluation of a phenotype algorithm evaluator.

Author information

1. Janssen Research & Development, 920 Route 202, Raritan, NJ 08869, USA; OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), 622 West 168th Street, PH-20, New York, NY 10032, USA. Electronic address: jswerdel@its.jns.com.
2. OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), 622 West 168th Street, PH-20, New York, NY 10032, USA; Columbia University, 622 West 168th Street, PH20, New York, NY 10032, USA.
3. Janssen Research & Development, 920 Route 202, Raritan, NJ 08869, USA; OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), 622 West 168th Street, PH-20, New York, NY 10032, USA; Columbia University, 622 West 168th Street, PH20, New York, NY 10032, USA.

Abstract

BACKGROUND:

The primary approach for defining disease in observational healthcare databases is to construct phenotype algorithms (PAs), rule-based heuristics predicated on the presence, absence, and temporal logic of clinical observations. However, a complete evaluation of PAs, i.e., determining sensitivity, specificity, and positive predictive value (PPV), is rarely performed. In this study, we propose a tool (PheValuator) to efficiently estimate a complete PA evaluation.

METHODS:

We used 4 administrative claims datasets: OptumInsight's de-identified Clinformatics™ Datamart (Eden Prairie, MN); IBM MarketScan Multi-State Medicaid; IBM MarketScan Medicare Supplemental Beneficiaries; and IBM MarketScan Commercial Claims and Encounters from 2000 to 2017. Using PheValuator involves (1) creating a diagnostic predictive model for the phenotype, (2) applying the model to a large set of randomly selected subjects, and (3) comparing each subject's predicted probability for the phenotype to that subject's inclusion in or exclusion from each PA. We used the predictions as a 'probabilistic gold standard' measure to classify positive/negative cases. We examined 4 phenotypes: myocardial infarction, cerebral infarction, chronic kidney disease, and atrial fibrillation. We examined several PAs for each phenotype, including 1-time (1X) occurrence of the diagnosis code in the subject's record and 1-time occurrence of the diagnosis in an inpatient setting with the diagnosis code as the primary reason for admission (1X-IP-1stPos).
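Step (3) above can be illustrated with a minimal sketch. The abstract does not state how predicted probabilities are turned into performance estimates, so the sketch assumes one plausible reading of a 'probabilistic gold standard': each subject contributes their predicted probability p to the expected count of true cases and 1 - p to the expected count of non-cases. The function name `evaluate_pa` and its inputs are hypothetical and are not part of the PheValuator package.

```python
import math
from typing import Dict, Sequence


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator > 0 else math.nan


def evaluate_pa(probabilities: Sequence[float],
                in_pa: Sequence[bool]) -> Dict[str, float]:
    """Estimate PA performance against a probabilistic gold standard.

    probabilities: model-predicted probability of the phenotype per subject.
    in_pa: whether the phenotype algorithm included each subject.

    Assumption (not taken from the abstract): expected confusion-matrix
    cells are accumulated by summing p and 1 - p across subjects.
    """
    if len(probabilities) != len(in_pa):
        raise ValueError("probabilities and in_pa must have the same length")

    tp = fp = fn = tn = 0.0
    for p, included in zip(probabilities, in_pa):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        if included:
            tp += p          # expected true positives
            fp += 1.0 - p    # expected false positives
        else:
            fn += p          # expected false negatives
            tn += 1.0 - p    # expected true negatives

    return {
        "sensitivity": _ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": _ratio(tn, tn + fp),  # TN / (TN + FP)
        "ppv": _ratio(tp, tp + fp),          # TP / (TP + FP)
    }


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not study data.
    toy_probabilities = [1.0, 1.0, 0.0, 0.0]
    toy_in_pa = [True, False, False, False]
    print(evaluate_pa(toy_probabilities, toy_in_pa))
```

Under this assumption, a strict PA such as 1X-IP-1stPos would include fewer subjects, which tends to lower the expected true positive count relative to all expected cases (sensitivity) while raising the share of included subjects who are expected cases (PPV).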

RESULTS:

Across phenotypes, the 1X PA showed the highest sensitivity and the lowest PPV among all PAs, whereas 1X-IP-1stPos yielded the highest PPV and the lowest sensitivity. Specificity was very high across algorithms. Results for each algorithm were similar across datasets.

CONCLUSION:

PheValuator appears to show promise as a tool to estimate PA performance characteristics.

KEYWORDS:

Diagnostic predictive modeling; Phenotype algorithms; Validation

PMID: 31369862
DOI: 10.1016/j.jbi.2019.103258
