J Biomed Inform. 2005 Oct;38(5):367-75. Epub 2005 Mar 26.

Discrimination and calibration of mortality risk prediction models in interventional cardiology.


Decision Systems Group, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.



Using a local percutaneous coronary intervention (PCI) data repository, we sought to compare the performance of a number of local and well-known mortality models with respect to discrimination and calibration.


Accurate risk prediction is important for several reasons, including physician decision support, quality-of-care assessment, and patient education. Current evidence on the value of applying PCI risk models to individual cases drawn from a different population is controversial.


Data were collected from January 1, 2002 to September 30, 2004 on 5216 consecutive percutaneous coronary interventions at Brigham and Women's Hospital (Boston, MA). Logistic regression was used to create a local risk model for in-hospital mortality in these procedures, and a number of statistical methods were used to compare the discrimination and calibration of the new and old local risk models, as well as the Northern New England Cooperative Group, New York State (1992 and 1997), University of Michigan consortium, American College of Cardiology-National Cardiovascular Data Registry, and Cleveland Clinic Foundation risk prediction models. Discrimination was evaluated with the area under the receiver operating characteristic (ROC) curve (AUC), and calibration, that is, the applicability of each model to individual cases, was assessed with the Hosmer-Lemeshow (HL) goodness-of-fit test and calibration curves.
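
The two evaluation criteria above can be illustrated with a minimal sketch in plain Python; it is not the study's implementation, which the abstract does not describe. It assumes each model yields a predicted in-hospital mortality probability strictly between 0 and 1 for every procedure. The AUC is computed as the Mann-Whitney probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, and the HL statistic is computed over risk-ordered groups (deciles by default, a common convention assumed here) with groups - 2 degrees of freedom. The function names are hypothetical, and the closed-form chi-square tail used for the p value holds only for an even number of degrees of freedom, which covers the usual 10-group (8 df) case.

```python
import math


def auc_mann_whitney(y, p):
    """AUC: probability that a random death outranks a random survivor (ties count 1/2)."""
    deaths = [pi for yi, pi in zip(y, p) if yi == 1]
    survivors = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(y, p, groups=10):
    """HL statistic over risk-ordered groups of near-equal size; returns (stat, p value)."""
    n = len(p)
    if n < groups:
        raise ValueError("need at least one case per group")
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        # (O - E)^2 / (n_g * pbar * (1 - pbar)), written with E = n_g * pbar
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / n_g))
    return stat, chi2_sf_even_df(stat, groups - 2)


# Toy usage with made-up numbers (not study data).
y = [0, 0, 0, 1, 0, 1]
p = [0.02, 0.05, 0.10, 0.40, 0.50, 0.80]
print(auc_mann_whitney(y, p))  # 0.875: 7 of 8 death/survivor pairs ranked correctly
```

A high HL p value means observed and predicted deaths did not differ significantly across risk groups; a very small one, as reported for the external models, signals miscalibration even when the AUC is high.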


Multivariate risk factors included in the newly constructed local model were age, prior intervention, diabetes, unstable angina, salvage versus elective procedure, cardiogenic shock, acute myocardial infarction (AMI), and left anterior descending artery intervention. The AUC was 0.929 (SE = 0.017), and the p value for the HL goodness-of-fit test was 0.473, indicating good discrimination and calibration. Bootstrap resampling indicated that the AUC was stable. The external models had AUCs ranging from 0.82 to 0.90, indicating good discrimination across all models, but poor calibration (HL p ≤ 0.0001).
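
The abstract does not describe the resampling scheme behind the AUC stability result. As one plausible reading, assumed here, the sketch below keeps a model's predictions fixed, resamples procedures with replacement, and recomputes the AUC for each replicate, reporting the bootstrap mean, standard deviation, and a simple percentile interval; a narrow spread is what "AUC stability" would look like under this scheme. Refitting the model inside each replicate (for optimism correction) is a different, more expensive variant not shown. The names and defaults (`bootstrap_auc`, `n_boot=200`, `seed=0`) are hypothetical.

```python
import math
import random


def auc(y, p):
    """Mann-Whitney AUC: share of death-survivor pairs ranked correctly (ties count 1/2)."""
    deaths = [pi for yi, pi in zip(y, p) if yi == 1]
    survivors = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if d > s else 0.5 if d == s else 0.0
               for d in deaths for s in survivors)
    return wins / (len(deaths) * len(survivors))


def bootstrap_auc(y, p, n_boot=200, seed=0):
    """Resample cases with replacement; return (mean AUC, SD, approx. 95% percentile interval)."""
    n = len(y)
    if not 0 < sum(y) < n:
        raise ValueError("need at least one death and one survivor")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # skip replicates that lack deaths or survivors
            stats.append(auc(yb, [p[i] for i in idx]))
    stats.sort()
    mean = sum(stats) / n_boot
    sd = math.sqrt(sum((s - mean) ** 2 for s in stats) / (n_boot - 1))
    return mean, sd, (stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1])
```

Because deaths are rare in PCI data, the all-pairs AUC stays cheap here: its cost scales with deaths times survivors per replicate.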


The consistently good AUC values across all models suggest that certain risk factors have remained important over the last decade. However, the lack of calibration suggests that small changes in patient populations and data collection methods quickly reduce the accuracy of patient-level estimates over time. Possible solutions are to recalibrate existing models using local data or to develop new local models.
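
Of the two remedies proposed, recalibration is the lighter one. One common form, assumed here because the abstract does not specify a method, is logistic recalibration: keep an external model's risk score and refit only an intercept a and a slope b on local outcomes, so that logit(p_local) = a + b * logit(p_external). A fitted a near 0 and b near 1 indicate the external model was already well calibrated locally. The sketch below fits (a, b) by Newton-Raphson maximum likelihood in plain Python; `fit_recalibration` and `recalibrate` are hypothetical names, predicted probabilities must lie strictly between 0 and 1, and the external predictions must take at least two distinct values.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_recalibration(y, p_ext, iters=50, tol=1e-10):
    """Fit logit(P(death)) = a + b * logit(p_ext) by Newton-Raphson maximum likelihood."""
    x = [logit(pi) for pi in p_ext]
    a, b = 0.0, 1.0  # start from "no recalibration"
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g_a += yi - mu          # score for the intercept
            g_b += (yi - mu) * xi   # score for the slope
            h_aa += w               # Fisher information entries
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (a, b) += I^{-1} * score, using the closed-form 2x2 inverse
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


def recalibrate(p_ext, a, b):
    """Apply the fitted intercept and slope to external predictions."""
    return [1.0 / (1.0 + math.exp(-(a + b * logit(pi)))) for pi in p_ext]
```

Recalibration preserves the ranking of patients whenever b > 0, so it leaves a model's AUC unchanged while correcting its absolute risk estimates, which matches the pattern reported above: good discrimination but poor calibration.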

[Indexed for MEDLINE]