NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Matheny M, McPheeters ML, Glasser A, et al. Systematic Review of Cardiovascular Disease Risk Assessment Tools [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2011 May. (Evidence Syntheses/Technology Assessments, No. 85.)


Systematic Review of Cardiovascular Disease Risk Assessment Tools [Internet].


Executive Summary

Introduction

Background

Cardiovascular disease (CVD) is the leading cause of death in the United States and costs the U.S. health care system an estimated $531 billion in direct and indirect costs.1, 2 Because of the high incidence and cost of this disease, clinical practice guidelines target primary prevention and recommend that providers evaluate patients for cardiac risk factors that may warrant medical treatment.3-7 However, previous research has shown that providers do not accurately estimate the risk of CVD events on their own.8-13 A number of multivariate risk prediction equations, derived from large prospective cohort studies or randomized trials, have been developed to estimate CVD risk over time intervals ranging from 4 to 12 years.14-21 To make them more usable for busy clinicians, many of these risk models require only information from a patient's medical history and readily available laboratory tests, and they have been adapted for interpretation through simplified charts or tables in paper or computer-based formats.3, 22

The most commonly used CVD risk prediction models in the United States are those based on the Framingham cohort, a large prospective cohort of U.S. men and women aged 30 to 74 years. These models have since been validated in multiple diverse populations.17, 20, 23-26 However, controversy remains regarding which variables are most important for risk prediction, which outcomes are most generalizable across populations, and whether the models need to be recalibrated or remodeled for use in populations other than the source cohort.

Studies showing that patients with diabetes had significantly elevated risk for cardiovascular outcomes prompted the Adult Treatment Panel III (ATP-III) guidelines, which include a risk calculator that excludes patients with diabetes and which direct clinicians to treat those patients as already having CVD for the purposes of medical management.3, 27, 28 However, other studies have questioned this assertion from both risk modeling and disease management standpoints.29 In addition, a growing literature suggests that patients with diabetes are themselves a heterogeneous group who require diabetes-specific risk factors to adequately characterize their cardiovascular risk.23, 30

The aim of this systematic review was to summarize the current state of CVD risk models, with a focus on the U.S. patient population. We also assessed the performance of each available model in populations other than its source cohort, summarized which models use which risk factors, and examined the impact that recalibration and reclassification have had on these models over the last few decades. Finally, we sought evidence on which models are best suited for predicting cardiovascular risk among patients with diabetes, and on whether treating diabetes as a CHD outcome equivalent is appropriate.

Key Questions

The key questions for this report were:

  • KQ1: Do any of the currently available tools for the prediction of cardiovascular risk in a North American population offer clear advantages in discriminatory power over the others in predicting incident coronary heart disease (CHD), cerebrovascular stroke (stratified by thrombotic or hemorrhagic type), or a combination of these two?
  • KQ2a: Do tools that treat diabetes as a CHD outcome equivalent have different performance characteristics than those that use diabetes as an independent risk factor for those outcomes?
  • KQ2b: Is the appropriateness of using diabetes as a coronary risk equivalent modified by the number of other cardiac risk factors that the individual has?

Methods

Literature Search

For this review, we included studies of adults asymptomatic for CVD, in any setting and country and with any study design, in which a clinical risk prediction model was developed or validated for predicting CVD risk. We excluded studies that 1) were not published in English; 2) did not report information pertinent to the key questions; 3) had fewer than 200 participants at enrollment; 4) were not original studies; or 5) did not perform any internal or external validation of the model. We developed each search component with input from previous systematic reviews and refined it iteratively, using a pool of approximately 50 previously identified relevant articles as a quasi-validation set to assess the recall of each search iteration (i.e., whether our searches retrieved or missed known items of interest).31-33 In addition to the studies identified through the MEDLINE search, we hand-searched the reference lists of all included articles for additional articles.

Once we had identified articles through the electronic database searches, review articles, and bibliographies, we examined their abstracts to determine whether the studies met our criteria. Two reviewers independently evaluated each abstract for inclusion or exclusion; if either reviewer concluded that an article could be eligible for the review, we retained it. Of the 3,499 articles screened, 636 required full-text review. At the full-text stage, two reviewers read each article and decided whether it met our inclusion criteria.
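The recall check and the inclusive abstract-screening rule described above are simple enough to state as code. The following Python sketch is an illustration only: the function names and article identifiers are hypothetical, and it assumes each article is represented by a unique ID string.

```python
from typing import Iterable, Set


def search_recall(retrieved_ids: Iterable[str], known_relevant_ids: Set[str]) -> float:
    """Share of the quasi-validation set that a search iteration retrieved."""
    if not known_relevant_ids:
        raise ValueError("the quasi-validation set must not be empty")
    return len(known_relevant_ids.intersection(retrieved_ids)) / len(known_relevant_ids)


def missed_items(retrieved_ids: Iterable[str], known_relevant_ids: Set[str]) -> Set[str]:
    """Known relevant articles the search failed to retrieve; these guide the next refinement."""
    return known_relevant_ids.difference(retrieved_ids)


def retain_abstract(reviewer_a_eligible: bool, reviewer_b_eligible: bool) -> bool:
    """Inclusive rule: keep the article for full-text review if either reviewer says it could be eligible."""
    return reviewer_a_eligible or reviewer_b_eligible


# Hypothetical IDs; the review's actual validation set had roughly 50 articles.
validation_set = {"A1", "A2", "A3", "A4"}
iteration_1 = ["A1", "A2", "X9"]
print(search_recall(iteration_1, validation_set))          # → 0.5
print(sorted(missed_items(iteration_1, validation_set)))   # → ['A3', 'A4']
```

The "either reviewer" rule deliberately trades extra full-text work for a lower chance of wrongly discarding an eligible study at the abstract stage.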

Data Abstraction

The data for this project were abstracted into a database designed to capture study information such as cohort characteristics, risk model characteristics, model performance statistics, and quality review elements. We collected information about the study populations so that results could be stratified by variables such as sex and geographic area.

The team was trained to abstract by pulling relevant data from several articles into the database and then reconvening as a group to discuss the utility of the table design. We repeated this process through several iterations. The content lead reviewed each abstraction to ensure accuracy and completeness.

In addition to assessing the studies and models presented in the literature, we searched for all available online tools and documented their locations and the models on which they purported to be based. We then used the online tools to calculate risk for five test cases to identify any variation in estimated risk.

We assessed the quality of individual studies across multiple dimensions, using assessment questions that addressed how fully each study characterized the population in which a model was developed, as well as the prevalence of missing data and loss to follow-up. We also assessed the evaluation methods and performance measures used in each study. We did not assign quality scores to individual studies or to the literature as a whole, but instead chose to present patterns of quality.

Results

Key Question 1

The literature search identified 3,499 potentially relevant articles on the development or validation of primary CVD risk models. Most studies were excluded at the abstract stage because either the study was not relevant to the topic or the study population was not asymptomatic for CVD. At the full-text review stage, most of the remaining studies were excluded because the evaluation did not involve a risk prediction tool, the study population was not asymptomatic for CVD, or no model performance measures were reported.

A total of 84 articles29, 34-82, 14, 19, 23-25, 83-110, 111 were included in this review, representing 102 risk prediction models. To develop the models, the authors used a total of 100 variations of 73 identifiable patient cohorts. These cohorts provided data on CHD outcomes (52 cohorts), CVD outcomes (31 cohorts), and cerebrovascular accident (CVA) outcomes (12 cohorts). Of the 102 models identified, only 17 were externally validated in a population other than the one in which the model was developed, and all of these were developed from the following nine primary patient cohorts:

  • Scottish Heart Health Extended Cohort (SHHEC)
  • Diabetes Audit and Research in Tayside, Scotland (DARTS)
  • FINRISK
  • Framingham Study (FRS)
  • Framingham Offspring Study (FRS-O)
  • Prospective Cardiovascular Münster Study (PROCAM)
  • QRESEARCH Database
  • Systematic Coronary Risk Evaluation (SCORE)
  • United Kingdom Prospective Diabetes Study (UKPDS)

Information on these cohorts is available in Appendix G/Summary Table 4. The most commonly externally validated risk models were:

  • 1991 FRS model for CVD (with 26 evaluations)
  • 1998 FRS model for total CHD (with 24 evaluations)
  • FRS ATP-III model for hard CHD (i.e., sudden CHD death or myocardial infarction, with or without cardiac procedures) (with 16 evaluations)
  • PROCAM model for hard CHD (with 11 evaluations)
  • SCORE model for CVD mortality (with 11 evaluations)

These models are typically considered general-population, first-outcome incidence calculators, meaning that they are intended to calculate individual risk for any patient within a certain age range. However, the FRS ATP-III model excludes patients with diabetes, the PROCAM model excludes women, the DARTS and UKPDS models exclude patients without diabetes, and the Scottish ASSIGN model (derived from SHHEC) includes a non-traditional social deprivation index as a risk factor. These models therefore may not be fully applicable to every general population.

The majority of models (87 of 102) identified through our search were not validated in an external data set.14, 24, 25, 34-39, 41, 47, 49-52, 55, 57, 60-62, 71, 75, 76, 80, 81, 83, 84, 87, 91, 93, 100, 101, 103, 107, 109, 111-115 Some studies published models that had yet to be externally validated but were directly intended for individual risk prediction.54, 59, 60 Some of these models were developed for specific groups, such as patients with atrial fibrillation,52 chronic kidney disease,45 or renal transplants,82 younger adults,49 or older adults,64, 69, 93, 102 or were based only on patient-provided information.51, 105 Other studies were conducted primarily to evaluate whether non-traditional risk factors improved prediction performance. These non-traditional risk factors include body mass index,34 hemoglobin A1c,36, 54 coronary calcification,40, 70, 87, 100, 104, 108, 116 echocardiography characteristics,53, 60 C-reactive protein,54, 60, 70, 86, 103 apolipoproteins,54, 80 socioeconomic factors,59, 66 family history,59, 76, 80, 103 carotid ultrasonography,72 metabolic syndrome,65, 75, 79 exercise testing parameters,84 and genetic polymorphisms.101 A recent review of non-traditional risk factors in CHD risk prediction concluded that the evidence was insufficient to assess the balance of benefits and harms of using these risk factors in risk prediction.117, 118

There was significant heterogeneity among outcome definitions, both across all of the studies and among models used for comparison within individual studies. Frequently, cohort outcome data were collected in order to match a particular risk model, but other models with different outcomes were used as comparisons. Nonetheless, since all of the outcomes were variations of CVD, stable relative risk performance was frequently found even when outcomes were mismatched.

Evaluating the absolute risk prediction of a model whose outcome does not match the cohort's outcome has severe limitations, because the baseline outcome event rates differ from the outset. Some interpretation is possible if the prediction error runs opposite to the expected direction: if the cohort outcome is more restrictive than the model outcome, one would expect the model to overpredict, so if it instead underpredicts, the result can be safely interpreted as poor absolute risk prediction. However, no such conclusion can be drawn when absolute risk prediction appears adequate for mismatched outcomes.
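The reasoning above amounts to a small decision rule on the observed-to-expected (O/E) event ratio. The Python sketch below encodes it under stated assumptions: the `Scope` labels and the 20 percent tolerance for calling calibration "adequate" are hypothetical choices for illustration, not thresholds taken from the review.

```python
from enum import Enum


class Scope(Enum):
    """How the cohort's outcome definition compares with the model's outcome definition."""
    NARROWER = "cohort outcome more restrictive than model outcome"
    MATCHED = "same outcome definition"
    BROADER = "cohort outcome less restrictive than model outcome"


def interpret_calibration(observed: int, expected: float, cohort_scope: Scope,
                          tolerance: float = 0.2) -> str:
    """Return 'adequate', 'poor', or 'inconclusive' for an observed/expected event comparison."""
    oe = observed / expected
    within_tolerance = abs(oe - 1.0) <= tolerance   # hypothetical cut-off for "adequate"
    if cohort_scope is Scope.MATCHED:
        return "adequate" if within_tolerance else "poor"
    # A narrower cohort outcome should make the model overpredict (O/E < 1);
    # a broader one should make it underpredict (O/E > 1).
    wrong_direction = (oe > 1.0) if cohort_scope is Scope.NARROWER else (oe < 1.0)
    if wrong_direction and not within_tolerance:
        # The error already runs against what the mismatch would cause, so
        # matching the outcomes could only enlarge it: calibration is poor.
        return "poor"
    # Apparent adequacy, or error in the expected direction, may be an artifact of the mismatch.
    return "inconclusive"


print(interpret_calibration(130, 100.0, Scope.NARROWER))   # → poor
print(interpret_calibration(100, 100.0, Scope.NARROWER))   # → inconclusive
```

Only the "wrong direction" case yields a firm verdict for mismatched outcomes, which is the asymmetry the paragraph describes.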

Some of the tools reported thresholds for low- and high-risk patients in order to recommend tailored management of those patients.14, 19, 35, 37-39, 41, 43, 46, 47, 51, 54, 56, 58, 61, 70, 73, 84, 88, 97, 99, 103 In addition, some studies evaluated reclassification across risk strata, either between different models or after adding variables to an existing model.35, 36, 46, 54, 61, 88, 99, 101-103 Results of reclassification evaluations were variably reported, sometimes in tabular format and sometimes as reclassification indices. There was a clear correlation between absolute risk prediction performance and classification performance, and some reclassification evaluations showed significantly improved performance. Matching cohort and model outcomes also matters for classification, because low- and high-risk cut-off points are set using the development data (i.e., the matched outcome). Separate cut-off points must be established in order to appropriately use such tools to risk-stratify patients for outcomes other than those for which they were developed.
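Reclassification indices summarize how patients move between risk categories when one model replaces another. As one example, the sketch below computes the categorical net reclassification improvement (NRI) in the form introduced by Pencina and colleagues; the review does not state which index each study used, so treat this choice as an illustrative assumption. Risk categories are assumed to be ordered integers (e.g., 0 = low, 1 = intermediate, 2 = high).

```python
from typing import Sequence


def categorical_nri(old_cat: Sequence[int], new_cat: Sequence[int], event: Sequence[bool]) -> float:
    """Categorical NRI: net upward movement among events plus net downward movement among non-events."""
    if not (len(old_cat) == len(new_cat) == len(event)):
        raise ValueError("inputs must have the same length")
    up_events = down_events = n_events = 0
    up_nonevents = down_nonevents = n_nonevents = 0
    for old, new, had_event in zip(old_cat, new_cat, event):
        if had_event:
            n_events += 1
            up_events += int(new > old)
            down_events += int(new < old)
        else:
            n_nonevents += 1
            up_nonevents += int(new > old)
            down_nonevents += int(new < old)
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_events - down_events) / n_events + (down_nonevents - up_nonevents) / n_nonevents


# Hypothetical example: six patients, two of whom had an event during follow-up.
old = [0, 1, 1, 2, 0, 1]
new = [1, 2, 1, 1, 0, 0]
had_event = [True, True, False, False, False, False]
print(categorical_nri(old, new, had_event))   # → 1.5
```

Because the categories are defined by fixed cut-off points, the same patients can reclassify quite differently if those cut-offs were set for a different outcome than the one observed in the cohort.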

Almost all models had good relative and absolute risk prediction in the cohort in which they were developed. This is not surprising, but it highlights the limitations of relying on models that have only been internally validated. The strongest external validation evidence came from North American and European cohorts in which the validation study used the same outcome measure as the development study. Model evaluations in Asian cohorts had limited generalizability to U.S. populations, because Asian cohorts have been shown to have significantly different outcome event rates for CHD and cerebrovascular disease.
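The distinction between relative risk prediction (discrimination) and absolute risk prediction (calibration) can be made concrete with two common summary measures: the c-statistic and the observed-to-expected event ratio. The sketch below is a simplified illustration that ignores censoring and uses a quadratic pairwise loop; cohort studies with censored follow-up generally need survival-based versions of these measures. The data are hypothetical.

```python
from typing import Sequence


def c_statistic(risk: Sequence[float], event: Sequence[bool]) -> float:
    """Share of (case, non-case) pairs in which the case has the higher predicted risk; ties count 1/2."""
    cases = [r for r, e in zip(risk, event) if e]
    non_cases = [r for r, e in zip(risk, event) if not e]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for rc in cases:
        for rn in non_cases:
            if rc > rn:
                score += 1.0
            elif rc == rn:
                score += 0.5
    return score / (len(cases) * len(non_cases))


def observed_to_expected(risk: Sequence[float], event: Sequence[bool]) -> float:
    """Observed event count divided by the sum of predicted risks (expected count)."""
    return sum(1 for e in event if e) / sum(risk)


# Hypothetical predicted risks and outcomes for four patients.
risk = [0.1, 0.2, 0.3, 0.4]
event = [False, True, False, True]
halved = [r / 2 for r in risk]   # same ranking, half the predicted risk
print(c_statistic(risk, event), c_statistic(halved, event))   # → 0.75 0.75
print(observed_to_expected(risk, event), observed_to_expected(halved, event))
```

Scaling every prediction by a constant preserves the ranking, so the c-statistic is unchanged while the O/E ratio doubles. This is how a model can keep good relative risk performance in a new cohort while its absolute risk prediction fails.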

External validation of U.S. models developed in other U.S. cohorts found that most retained good relative and absolute risk prediction performance among white and black populations, but absolute risk prediction was poor among minority populations, such as Hispanics and Asian Americans.23, 97, 100 A few evaluations using higher- or lower-risk cohorts, such as siblings of patients with early coronary artery disease or young adults, predictably had poor absolute risk prediction performance.42, 49 In all cases, overall model relative risk performance (risk separation) was better for women than men.23, 42, 49, 97, 100 Generally, these risk models are most likely to perform accurately in patients representative of the source population in which they were developed.

External validation of U.S. risk models among European cohorts in which the outcomes were matched was more mixed. A few studies with matched outcomes reported acceptable risk model performance, but the European cohorts were generally at higher risk than the source population, including all-diabetic or elderly cohorts.48, 89 A few studies reported that the risk models underpredicted the outcomes, but these were almost entirely high-risk patient cohorts, such as patients with diabetes, organ transplants, advanced age, poorly controlled hypertension, or poor access to health care.56, 77, 82, 85, 89 Most of the evaluations among European cohorts found that the U.S. risk models overpredicted risk.14, 48, 56, 80, 88, 92, 94, 110 This was frequently due to a difference in underlying outcome event rates between the model cohort and the evaluation cohort. In some studies, significant differences between relative risk factor contributions were also found.30

A number of U.S. cohorts that engaged in recalibration or remodeling reported poor absolute risk performance for the original FRS models; however, most of these evaluations had an outcome mismatch between the cohort and the model.54, 61, 101 Studies that remodeled the FRS risk variables in the local cohort reported retained or improved relative risk prediction and adequate absolute risk prediction.54, 61, 101 This is not surprising: remodeling refits the model to the local cohort's own outcome, so the outcomes match by definition and performance would be expected to improve. For example, one study evaluated matched outcomes between the cohort and the original model and found that the model predicted poorly in minority populations; remodeling then produced adequate performance in all of the cohorts.23 Two other studies with matched outcomes and inadequate original model performance reported adequate absolute risk prediction after remodeling.45 In contrast, recalibration methods (which adjust the baseline outcome event rate intercept in the model but not the risk variable coefficients) performed more variably, with both adequate and inadequate absolute risk prediction results.42, 45 In the one study that performed both recalibration and remodeling, recalibration was sufficient for women but not for men, whereas remodeling resulted in adequate absolute risk prediction for both.45
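Recalibration as described above can be sketched for a Cox-type model, assuming absolute risk takes the form 1 - S0(t) ** exp(sum of beta * (x - mean)). One common recalibration approach keeps the coefficients (beta) and substitutes the new cohort's mean risk-factor values and baseline survival S0(t). Remodeling would instead re-estimate the coefficients themselves in the local data, which requires a survival-model fit and is not shown. All coefficients, means, and survival values below are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CoxRiskModel:
    """Absolute risk = 1 - S0(t) ** exp(sum_k beta_k * (x_k - mean_k))."""
    coefficients: Dict[str, float]   # log hazard ratios; recalibration leaves these unchanged
    mean_values: Dict[str, float]    # risk-factor means in the cohort the model is calibrated to
    baseline_survival: float         # S0(t): event-free survival at the horizon for an "average" patient

    def linear_predictor(self, x: Dict[str, float]) -> float:
        return sum(b * (x[k] - self.mean_values[k]) for k, b in self.coefficients.items())

    def risk(self, x: Dict[str, float]) -> float:
        return 1.0 - self.baseline_survival ** math.exp(self.linear_predictor(x))

    def recalibrate(self, local_means: Dict[str, float], local_baseline_survival: float) -> "CoxRiskModel":
        """Keep the coefficients; substitute the local cohort's means and baseline survival."""
        return CoxRiskModel(dict(self.coefficients), dict(local_means), local_baseline_survival)


# Hypothetical source model and a lower-risk local cohort with the same risk-factor means.
source = CoxRiskModel({"age": 0.05, "smoker": 0.6}, {"age": 50.0, "smoker": 0.3}, baseline_survival=0.90)
local = source.recalibrate({"age": 50.0, "smoker": 0.3}, local_baseline_survival=0.95)
patient = {"age": 60.0, "smoker": 1.0}
print(round(source.risk(patient), 3), round(local.risk(patient), 3))
```

Because the coefficients are untouched, recalibration fixes differences in baseline event rates but cannot correct systematic differences in how much each risk factor contributes, which may explain why it succeeded in some cohorts and not in others.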

Key Question 2

There were six diabetic cohorts that were used to develop risk prediction models and 11 diabetic cohorts that were used in external validation of diabetes-specific risk models for CVD, CHD, or stroke outcomes.38, 40, 57, 63, 72, 73, 78, 85, 96, 107, 108, 119 There were 13 non-diabetic cohorts used in either primary model development or external validation of risk models that excluded diabetes or of general-purpose models.

The UKPDS risk model119 was the most frequently validated type 2 diabetes model.38, 40, 73, 78, 108 However, three of the five studies were from U.K. cohorts, and there were no U.S. validations of this model. Even among the U.K. external validation studies, absolute risk prediction performance was variable, interpretation was complicated by outcome mismatches, and there was no matched outcome external validation of the model.73, 78, 108 In contrast, there was clear evidence that the UKPDS outperformed general cardiovascular risk models when they were directly compared among diabetic populations.73, 78 Another externally validated type 2 diabetes cardiovascular risk model is the DARTS model, which was developed in a different British cohort. A third type 2 diabetes model that was only internally validated was developed in Chinese patients.38 In all three models, diabetes-specific risk factors were included.

The contribution of diabetes to the risk of developing cardiovascular outcomes was evaluated in two studies, one consisting only of U.S. cohorts and the other including both U.S. and European cohorts.23, 30 The U.S. cohort comparison found that cohorts composed of non-white or Hispanic populations had significantly different relative risks for the risk factors examined than the Framingham cohort. However, the risk of CVD among patients with diabetes differed significantly from that in the FRS population only in a Native American cohort. A similar comparison that also included European studies found that CVD risk in the European cohort differed from that in the FRS cohort.

These studies also showed the effect of including or excluding variables in a multivariate analysis, since both evaluated some of the same cohorts, but the U.S./European study did not include as many of the traditional risk factors as the U.S.-only study.23, 30 Some additional risk was attributed to diabetes when there were fewer variables in the multivariate analysis. This was most likely due to a correlation between diabetes and the variables that were omitted, and reinforces the concept that any risk estimate for a variable includes residual confounding from unmeasured covariates.
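The residual-confounding point can be illustrated with a small simulation. The sketch below is a hypothetical linear example, not a survival model from the review: it generates a covariate `z` correlated with diabetes, then compares the ordinary least-squares diabetes coefficient with and without `z` in the model. In the simulated data, omitting `z` inflates the diabetes coefficient well above its true value of 0.5, while including `z` recovers a value close to 0.5.

```python
import random
from typing import List, Sequence, Tuple


def simulate_cohort(n: int = 20000, seed: int = 0) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical data: z is a covariate correlated with diabetes d; y is a continuous risk score."""
    rng = random.Random(seed)
    d: List[float] = []
    z: List[float] = []
    y: List[float] = []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        # Diabetes is more common when z is high, so d and z are positively correlated.
        di = 1.0 if zi + rng.gauss(0.0, 1.0) > 1.0 else 0.0
        # True data-generating effects: 0.5 for diabetes, 0.8 for z.
        yi = 0.5 * di + 0.8 * zi + rng.gauss(0.0, 1.0)
        d.append(di)
        z.append(zi)
        y.append(yi)
    return d, z, y


def cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def diabetes_coef_short(d: Sequence[float], y: Sequence[float]) -> float:
    """OLS slope of y on d alone (z omitted)."""
    return cov(d, y) / cov(d, d)


def diabetes_coef_full(d: Sequence[float], z: Sequence[float], y: Sequence[float]) -> float:
    """OLS coefficient on d when y is regressed on both d and z (two-predictor normal equations)."""
    s_dd, s_zz, s_dz = cov(d, d), cov(z, z), cov(d, z)
    s_dy, s_zy = cov(d, y), cov(z, y)
    return (s_dy * s_zz - s_zy * s_dz) / (s_dd * s_zz - s_dz ** 2)


d, z, y = simulate_cohort()
print(f"diabetes coefficient, z omitted:  {diabetes_coef_short(d, y):.2f}")
print(f"diabetes coefficient, z included: {diabetes_coef_full(d, z, y):.2f}")
```

The short-model coefficient absorbs the effect of `z` in proportion to how strongly `z` differs between patients with and without diabetes, which is the mechanism the paragraph describes.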

Most matched-outcome external validations in diabetic cohorts of cardiovascular risk models that included diabetes as a risk factor found that the models significantly underpredicted the number of outcomes experienced in the cohort, suggesting that developing predictive models in cohorts that combine patients with and without diabetes may be less than ideal.73, 78, 85 A few studies showed acceptable observed-to-expected ratios, but their cohort outcomes were more restrictive than the model outcomes.40, 72 The elevated CVD risk in diabetic populations cannot be captured simply by adding a diabetes variable to a general model; a single variable is insufficient to reflect the full range of risk experienced by patients with diabetes. More descriptive variables with confounding or effect-modifying effects are likely necessary in these populations, including diabetes control, duration of diabetes, and whether the patient has already experienced end-organ damage.
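The limitation of a single binary diabetes term can be shown with a toy linear predictor. In the hypothetical sketch below, all variable names and coefficients are invented for illustration and are not taken from UKPDS, DARTS, or any other published model. A binary-only model assigns the same relative hazard to every patient with diabetes, whereas a model that adds glycemic control, duration, and end-organ damage terms separates them.

```python
import math
from typing import Dict

# Hypothetical log hazard ratios, for illustration only; not taken from any published model.
BINARY_MODEL: Dict[str, float] = {"diabetes": 0.6}
DIABETES_SPECIFIC_MODEL: Dict[str, float] = {
    "diabetes": 0.3,
    "hba1c_points_above_7": 0.15,    # glycemic control
    "years_since_diagnosis": 0.03,   # duration of diabetes
    "end_organ_damage": 0.5,         # e.g., nephropathy already present (0/1)
}


def relative_hazard(coefs: Dict[str, float], patient: Dict[str, float]) -> float:
    """exp(linear predictor), relative to a reference patient with every term equal to zero."""
    return math.exp(sum(b * patient.get(name, 0.0) for name, b in coefs.items()))


recent_well_controlled = {"diabetes": 1.0, "hba1c_points_above_7": 0.0,
                          "years_since_diagnosis": 1.0, "end_organ_damage": 0.0}
longstanding_poorly_controlled = {"diabetes": 1.0, "hba1c_points_above_7": 2.0,
                                  "years_since_diagnosis": 15.0, "end_organ_damage": 1.0}

for label, p in [("recent, well controlled", recent_well_controlled),
                 ("long-standing, poorly controlled", longstanding_poorly_controlled)]:
    print(label, round(relative_hazard(BINARY_MODEL, p), 2),
          round(relative_hazard(DIABETES_SPECIFIC_MODEL, p), 2))
```

Under the binary model, a patient without diabetes gets a relative hazard of 1.0 from the diabetes term, so any miscalibration in non-diabetic cohorts has to come from the rest of the model.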

A few studies evaluated risk models that included diabetes as a risk factor in non-diabetic cohorts. For example, the 1998 FRS model was evaluated with matched outcomes in Czech patients without diabetes and overpredicted outcomes.56 The Norwegian Counties Study evaluated the SCORE risk model, which does not include a diabetes risk factor but does include patients with diabetes in its source cohort, in patients without diabetes and also found that the model overestimated the number of outcomes.44 The QRISK internal validation cohort, which excluded patients with diabetes, was also used to externally validate the 1991 FRS general risk model.46 Again, the 1991 FRS model significantly overpredicted the outcome, although there was a small outcome mismatch. In theory, models that include diabetes as a binary variable, in which patients without diabetes are assigned a value of zero, should perform well in non-diabetic populations, because no individual in such a population receives any risk from that term. That they do not strongly suggests that a dichotomous diabetes predictor does not account for all of the cardiovascular risk associated with having diabetes.

In several studies, a risk model with diabetes as a risk factor was directly compared to a diabetes-excluded model. The Women's Health Study, in which 2.9 percent of patients had diabetes, evaluated the FRS ATP-III and 1998 models, but the outcomes were very mismatched in the ATP-III (CVD vs. hard CHD) and 1998 models (total vs. hard CHD), and absolute risk prediction was poor in both.54 The Chicago Heart Association Detection Project in Industry study evaluated young men without diabetes for matched outcomes in the ATP-III model and unmatched outcomes in the 1998 model, but absolute risk performance was poor in both because of the young population.49

Remodeling efforts among diabetes and diabetes-excluded risk models followed the larger trend of general cardiovascular risk prediction models. Recalibration methods were successful in some cases but inadequate in others.38 However, remodeling methods were almost always successful in producing a model that performed well in the local cohort.38 Among non-diabetic cohorts and general risk models, remodeling was successful in improving performance, although it should be noted that diabetes as a risk factor was dropped from the models.56 Among a large U.S. female non-diabetic cohort, remodeling of the FRS ATP-III risk variables did not result in a well-calibrated model.61

Remodeling of established risk models for use in other cohorts also serves to illuminate systematic relative risk differences between risk factors. For example, although absolute risk prediction was very poor when the UKPDS model was applied to the Hong Kong Diabetes Registry, a direct comparison of the hazard ratios of the same risk variables between the two cohorts did not show significant differences.38 Thus, both the baseline outcome incidence and the relative risk contributions from individual risk factors are relevant to absolute risk performance.

Discussion

Limitations of the Literature

Summarizing this literature is complicated by the tremendous heterogeneity of outcomes among model evaluation studies. In many cases, only limited comparison was possible between cohorts and models with different outcomes. Minor mismatches were more common than large categorical differences, but even minor mismatches could substantially affect a model's absolute risk prediction performance, as shown by the large differences in outcomes within cohorts that reported multiple similar outcomes.40

External validation studies showed fair performance when FRS models were applied to U.S. populations that were similar to the source FRS cohort, but failed when applied to some minority populations. European general risk models have not been widely validated in U.S. populations, and U.S. risk models tended to perform poorly in European and Asian cohorts. This suggests, but does not confirm, that European models would likely perform poorly in U.S. cohorts.

Changes in baseline outcome event rates and relative contributions to risk from different risk factors present in either the source model or the application cohort, but not both, clearly led to poor performance in some models. Remodeling, and to a lesser extent, recalibration, have been shown to be successful methods for improving model performance in a variety of cohorts.

However, methodological issues remain, including lack of empirical evidence for the appropriate frequency at which remodeling should occur and the optimal sample sizes for these analyses.

Summary and Interpretation

Overall, the FRS models performed fairly well in U.S. populations, but there were absolute risk prediction problems when they were applied to populations substantially different from the source cohort. In some cases, this was due to particularly low or high baseline risk in the destination cohort, and in others it was due to systematic differences in risk attributable to specific risk factors. Although all of the FRS risk models were developed from a cohort that was not entirely representative of the U.S. population, the 2001 ATP-III version offered several advantages over the older FRS models, including a focus on a hard CHD outcome, exclusion of patients with diabetes, and incorporation of more current FRS data than the 1991 version. A 2008 CVD model was recently published but has not yet been externally validated.120

Recalibration and, to a greater extent, remodeling proved effective for improving performance in cohorts whose outcome incidence or risk factor prevalence differed substantially from the source cohort. Questions remain regarding the sample size needed to perform these methods and how frequently they should be applied.

Development of risk models for cohorts with risk profiles that are systematically divergent from the general population can also be a successful strategy. However, in many cases, studies taking this approach were more or less remodeling exercises using traditional risk variables in the most common models. Sample size requirements for developing stable risk models are even less clear for these cohorts, and some of these studies had fewer than 1,000 participants. A growing body of literature suggests that specific cohort risk models are likely to be most successful when there are risk factors unique to that population that inform cardiovascular risk.

Even among U.S. cohorts, there was evidence that some ethnically diverse or minority populations had significantly different risk factor contributions to outcomes, even when the baseline prevalence was similar.23, 30 Our review did not exclude studies from any geographic area, but in analyzing the data it became clear that there were systematic differences in risk factor prevalence and outcome event rates between Asian cohorts (which were mostly Chinese or Korean) and North American and European cohorts.121 This makes use of Asian models in a general U.S. population ill-advised.

Diabetes-specific process measurement variables are significantly related to cardiovascular outcome risk among patients with diabetes, and risk models that incorporated these factors outperformed general risk prediction models when applied to these patients. Analysis also suggests that models excluding patients with diabetes outperformed general risk prediction models that included these patients in their development when applied to non-diabetic cohorts. Unfortunately, external validation of diabetes-specific risk models is lacking, particularly among U.S. cohorts. No U.S. diabetes risk model has been externally validated.

Problems with absolute risk prediction were improved or resolved by recalibration and remodeling methods, supporting the need in this literature for periodic recalibration or remodeling for either general or specific populations. However, empirical evidence for determining what time interval is reasonable or for detecting when a population is “significantly” different from the reference population does not yet exist.
