Send to

Choose Destination
Pharmacoepidemiol Drug Saf. 2019 Feb;28(2):264-268. doi: 10.1002/pds.4680. Epub 2018 Oct 30.

Inflation of type I error rates due to differential misclassification in EHR-derived outcomes: Empirical illustration using breast cancer recurrence.

Author information

Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA, USA.



Many outcomes derived from electronic health records (EHR) not only are imperfect but also may suffer from exposure-dependent differential misclassification due to variability in the quality and availability of EHR data across exposure groups. The objective of this study was to quantify the inflation of type I error rates that can result from differential outcome misclassification.


We used data on gold-standard and EHR-derived second breast cancers in a cohort of women with a prior breast cancer diagnosis from 1993 to 2006 enrolled in Kaiser Permanente Washington. We simulated an exposure that was independent of the true outcome status. A surrogate outcome was then simulated with varying sensitivity and specificity according to exposure status. We estimated the type I error rate for a test of association relating this exposure to the surrogate outcome, while varying outcome sensitivity and specificity in exposed individuals.


Type I error rates were substantially inflated above the nominal level (5%) for even modest departures from nondifferential misclassification. Holding sensitivity in exposed and unexposed groups at 85%, a difference in specificity of 10% between the exposed and unexposed (80% vs 90%) resulted in a 36% type I error rate. Type I error was inflated more by differential specificity than sensitivity.


Differential outcome misclassification may induce spurious findings. Researchers using EHR-derived outcomes should use misclassification-adjusted methods whenever possible or conduct sensitivity analyses to investigate the possibility of false-positive findings, especially for exposures that may be related to the accuracy of outcome ascertainment.


electronic health record; misclassification; outcome; pharmacoepidemiology; phenotype; validation

Supplemental Content

Full text links

Icon for Wiley Icon for PubMed Central
Loading ...
Support Center