The effect of disease-prevalence adjustments on the accuracy of a logistic prediction model

Med Decis Making. 1996 Apr-Jun;16(2):133-42. doi: 10.1177/0272989X9601600205.

Abstract

The accuracy of a logistic prediction model is degraded when it is transported to populations with outcome prevalences different from that of the population used to derive the model. The resultant errors can have major clinical implications. Accordingly, the authors developed a logistic prediction model for the noninvasive diagnosis of coronary disease based on 1,824 patients who underwent exercise testing and coronary angiography, varied the prevalence of disease in various "test" populations by random sampling of the original "derivation" population, and determined the accuracy of the logistic prediction model before and after the application of a mathematical algorithm designed to adjust only for these differences in prevalence. The accuracy of each prediction model was quantified in terms of receiver operating characteristic (ROC) curve area (discrimination) and chi-square goodness-of-fit (calibration). As the prevalence of the test population diverged from the prevalence of the derivation population, discrimination improved (ROC-curve areas increased from 0.82 ± 0.02 to 0.87 ± 0.03; p < 0.05), and calibration deteriorated (chi-square goodness-of-fit statistics increased from 9 to 154; p < 0.05). Following adjustment of the logistic intercept for differences in prevalence, discrimination was unchanged and calibration improved (maximum chi-square goodness-of-fit fell from 154 to 16). When the adjusted algorithm was applied to three geographically remote populations with prevalences that differed from that of the derivation population, calibration improved by 87%, while discrimination fell by 1%. Thus, prevalence differences produce statistically significant and potentially clinically important errors in the predictions of logistic models. These errors can potentially be mitigated by use of a relatively simple mathematical correction algorithm.
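
The abstract does not state the correction formula itself. As an illustration only, the sketch below uses a commonly cited form of intercept adjustment, in which the intercept is shifted by the difference between the log-odds of the target prevalence and the log-odds of the derivation prevalence. The function names (`adjust_intercept`, `adjust_probability`) and the choice of this particular formula are assumptions made here, not the authors' published algorithm.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def adjust_intercept(intercept: float,
                     derivation_prevalence: float,
                     target_prevalence: float) -> float:
    """Shift a logistic intercept by the difference in prevalence log-odds.

    Assumed correction (hypothetical helper, not taken from the abstract):
        new_intercept = intercept + logit(target) - logit(derivation)
    The slope coefficients are left untouched.
    """
    return intercept + logit(target_prevalence) - logit(derivation_prevalence)


def adjust_probability(p: float,
                       derivation_prevalence: float,
                       target_prevalence: float) -> float:
    """Apply the same shift directly to a predicted probability."""
    shift = logit(target_prevalence) - logit(derivation_prevalence)
    return sigmoid(logit(p) + shift)
```

Under this assumed formula, a patient assigned a probability of 0.50 in a derivation population with 50% prevalence would be assigned 0.20 in a target population with 20% prevalence. Because the shift is the same for every patient and is monotone, it does not change the ranking of patients within a population, which is consistent with the reported finding that discrimination was unchanged while calibration improved.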

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adult
  • Aged
  • Algorithms
  • Coronary Angiography / statistics & numerical data
  • Coronary Disease / diagnosis
  • Coronary Disease / epidemiology*
  • Cross-Sectional Studies
  • Exercise Test / statistics & numerical data
  • Female
  • Humans
  • Logistic Models*
  • Male
  • Mass Screening / statistics & numerical data*
  • Middle Aged
  • Predictive Value of Tests
  • ROC Curve
  • Regression Analysis
  • Sampling Studies
  • United States / epidemiology