NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Balion C, Don-Wauchope A, Hill S, et al. Use of Natriuretic Peptide Measurement in the Management of Heart Failure [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Nov. (Comparative Effectiveness Reviews, No. 126.)

  • This publication is provided for historical reference only and the information may be out of date.


Appendix E. Assessment of Risk of Bias

Assessment of Risk of Bias: Prognosis Studies

Risk of bias of prognosis studies was assessed using a modified version of the guidelines proposed by Hayden, et al.

To make Hayden's guidelines better suited to this review, we made several modifications before beginning the assessment of risk of bias. We added a seventh area of bias (study design), for which we asked whether the included studies were designed to test the prognostic value of BNP or NT-proBNP (i.e., the studies were not secondary analyses of data collected for other purposes).

The tool was further modified by revising or adding several domains to the areas of bias, as described in Table 1 and expanded on below:

  • Prognostic factor measurement:
    • For the ‘other prognostic factors measured appropriately’ domain, we decided that this domain would apply only if the study in question compared BNP or NT-proBNP to some other prognostic indicator.
    • We added a new domain: “For BNP/NT-proBNP, the extent of and reasons for indeterminate test results or missing data were reported”
    • We added a second new domain: “For other prognostic factors, the extent of and reasons for indeterminate test results or missing data were reported (applicable when a study in question compares BNP/NT-proBNP to other prognostic indicators)”
  • Outcome measurement:
    • We added a new domain: “The study avoided the use of a composite outcome”.

Hayden, et al. provide a guide to using their tool, which we also adapted for use with our modified version. These modifications involved eliminating several signaling questions that we felt were not relevant to the review:

  • Study Attrition:
    • We dropped “Attempts to collect information on participants who dropped out of the study are described” because authors of included studies would largely be unable to accomplish this task;
  • Prognostic Factor Measurement:
    • We eliminated “Adequate proportion of the study sample has complete data for prognostic factors” and “Appropriate methods are used if imputation is used for missing prognostic factor data” because BNP or NT-proBNP is routinely collected and imputation, even if performed, would not likely be reported by study authors;
  • Measuring and Accounting for Confounding:
    • We dropped “The method and setting of confounding measurement are the same for all study participants” because BNP or NT-proBNP is the focus of most studies and detailed reporting of confounders would not be of the highest priority for study authors;
  • Analysis:
    • We eliminated “The strategy for model building (i.e., inclusion of variables) is appropriate and is based on a conceptual framework or model” and “There is no selective reporting of results” because we felt that commenting on model building and selective reporting would not be suitable for this review, especially given the diversity of strategies that BNP or NT-proBNP researchers may employ in their work.

The final modification involved a change in the response options used to express the degree of bias. Hayden, et al. originally suggested ‘yes’, ‘no’, ‘partly’, and ‘unsure’ as possible responses to each domain and potential bias item. However, in accordance with the methodology checklist for prognostic studies adapted from Hayden's work and developed by the National Institute for Health and Clinical Excellence (NICE), we opted for the simplified response categories of ‘yes’, ‘no’, and ‘unclear’. Within each domain, an answer of ‘no’ corresponds to a high risk of bias, ‘unclear’ corresponds to a possible or unclear risk of bias, and ‘yes’ corresponds to a low risk of bias.
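The correspondence between response categories and risk of bias described above can be written as a simple lookup. The following is a minimal sketch, not part of the original assessment tool: the function name `risk_of_bias` and the example ratings are hypothetical, and only the three-way mapping itself comes from the text.

```python
# Mapping stated in the text: 'yes' -> low, 'unclear' -> unclear, 'no' -> high.
RISK_BY_RESPONSE = {
    "yes": "low",
    "unclear": "unclear",
    "no": "high",
}


def risk_of_bias(response: str) -> str:
    """Translate a domain response into a risk-of-bias label (hypothetical helper)."""
    key = response.strip().lower()
    if key not in RISK_BY_RESPONSE:
        raise ValueError(f"unexpected response: {response!r}")
    return RISK_BY_RESPONSE[key]


# Hypothetical ratings for a single study; the area names come from the text,
# the answers are invented for illustration only.
ratings = {
    "study attrition": "unclear",
    "prognostic factor measurement": "yes",
    "outcome measurement": "no",
}
summary = {area: risk_of_bias(answer) for area, answer in ratings.items()}
```

Rejecting any value outside the three categories makes it harder for a legacy ‘partly’ or ‘unsure’ rating to slip through unnoticed.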

Assessment of Risk of Bias: Diagnosis Studies

The QUADAS-2 was used to assess the risk of bias of diagnostic studies. The investigators tailored the QUADAS-2 to this review by discussing whether some of the tool's signaling questions should be removed from consideration. The signaling questions are intended to help researchers judge the risk of bias in each of the four domains of the QUADAS-2. This discussion took place prior to the start of quality assessment. We decided to omit the following signaling questions from “Flow and Timing”:

  • “Was there an appropriate interval between index test(s) and reference standard?”: Since the inclusion criteria for the diagnosis questions required that all blood samples be collected on admission or discharge, we deemed this question to be irrelevant.
  • “Did all patients receive a reference standard?”: This question was not relevant to the review because the inclusion criteria specified the use of standard diagnostic criteria for determining HF in included studies (the criteria had to be applied independent of BNP or NT-proBNP test values). This question would be applicable only if there were specific reference standards to determine whether these test values were used, or if the screening question was not specific enough to exclude studies where only some patients were diagnosed with HF.

Each signaling question requires a ‘yes’, ‘no’, or ‘unclear’ response. We developed decision rules to consolidate ‘yes’, ‘no’, or ‘unclear’ responses to the signaling questions into a single ‘yes’, ‘no’, or ‘unclear’ response for each risk of bias question (one risk of bias question per domain).

The decision rules are shown in Table E-1 below.

Table E-1. Decision rules to consolidate responses to QUADAS-2 signaling questions into responses to QUADAS-2 risk of bias questions.

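The contents of Table E-1 are not reproduced in this text, so the authors' actual decision rules cannot be shown here. Purely to illustrate what consolidating signaling-question responses into one response per domain might look like, the sketch below assumes a conservative rule: any ‘no’ yields ‘no’, otherwise any ‘unclear’ yields ‘unclear’, otherwise ‘yes’. Both this rule and the function name `consolidate` are assumptions, not the rules used in the review.

```python
from typing import Iterable

VALID_RESPONSES = {"yes", "no", "unclear"}


def consolidate(responses: Iterable[str]) -> str:
    """Collapse signaling-question responses for one domain into one response.

    ASSUMED rule (not taken from Table E-1): the least favorable
    response among the signaling questions determines the result.
    """
    answers = [r.strip().lower() for r in responses]
    if not answers:
        raise ValueError("at least one signaling-question response is required")
    unexpected = set(answers) - VALID_RESPONSES
    if unexpected:
        raise ValueError(f"unexpected responses: {sorted(unexpected)}")
    if "no" in answers:
        return "no"
    if "unclear" in answers:
        return "unclear"
    return "yes"
```

A different table of rules (for example, one that weights particular signaling questions more heavily) would change only the body of the function, not how it is called.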

QUADAS-2 requires an assessment of the extent to which each included study is applicable to the review. Applicability is rated ‘high’, ‘low’, or ‘unclear’ and is assessed separately for three questions. We assessed applicability for each question as described below.

  • “Are there concerns that the included patients and setting do not match the review question?”: Studies excluding patients with chest trauma, hemodialysis, asthma, COPD, and dyspnea clearly due to causes other than HF (e.g., pneumothorax, coronary ischemia, myocardial infarction) were not seen as a threat to applicability (rated high in applicability). Studies excluding patients with any other diagnosis or comorbidity besides the ones mentioned above raised concerns about applicability. For example, studies excluding patients because of increased body mass index may have unclear or low applicability because BNP and NT-proBNP decrease as body mass increases. Studies that excluded patients on certain medications, excluded difficult-to-diagnose patients or included only easy-to-diagnose patients, restricted the sample to males or females only, or restricted the sample to certain age groups were regarded as having unclear or low applicability.

  • “Are there concerns that the index test, its conduct, or its interpretation differ from the review question?”: Since we included studies that utilized FDA-approved assay methods, we determined a priori that concerns about applicability were unlikely to exist in this domain (most studies would be rated high in applicability).

  • “Is there concern that the target condition as defined by the reference standard does not match the review question?”: We employed a broad list of acceptable reference tests in this review, so we determined a priori that concerns about applicability were unlikely to exist for most studies in this domain. However, since HF is typically diagnosed using a battery of tests and criteria, applicability was classified as unclear or low in the case of included studies that employed an unusually large number of tests or criteria. Similarly, applicability was classified as unclear or low in studies that employed a single reference test or criterion.

Investigators with previous experience conducting systematic reviews assessed the quality of all RCTs and cross-sectional studies. For quality assessments conducted using the NOS, Hayden, et al. criteria, and QUADAS-2, investigators trained a pool of raters. Training included a description of the background and objectives of the systematic review, an examination of the quality assessment instruments to explain the meaning of questions and develop a standardized approach to answering the questions, and pilot rating phases to test the instruments and resolve inconsistencies in interpreting and answering questions.

Grading the Strength of the Body of Evidence

In principle, a body of evidence from randomized trials starts with a presumed high strength of evidence and is downgraded across the domains when there is important overall risk of bias in the contributing studies, inconsistency in the direction of the intervention effect, indirectness of the outcome of interest (e.g., a surrogate outcome rather than a clinical health outcome), or imprecision. For nonrandomized studies, the body of evidence starts with a presumed low strength of evidence but may be upgraded across certain domains. The strength of a body of evidence is graded based on the following four domains: overall risk of bias by outcome, consistency, directness, and precision. A methodologist and a content expert graded the strength of the body of evidence as “High,” “Moderate,” “Low,” or “Insufficient” (Table E-2). A third methodologist with a clinical background adjudicated disagreements.
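The starting points and the direction of adjustment described above can be sketched in code. In the review, grading was a matter of expert judgment, so the arithmetic below is only an illustration under a stated assumption: each domain with an important concern lowers the grade by one level, and each upgrade raises it by one level. That one-level step, the function name `grade_strength`, and the `upgrades` parameter are all hypothetical; only the four grades, the four domains, and the two starting points come from the text.

```python
from typing import Mapping

# Grades from the text, ordered from weakest to strongest.
GRADES = ["Insufficient", "Low", "Moderate", "High"]
# The four required domains named in the text.
DOMAINS = ("risk of bias", "consistency", "directness", "precision")


def grade_strength(randomized: bool,
                   concerns: Mapping[str, bool],
                   upgrades: int = 0) -> str:
    """Illustrative grading: start High (randomized) or Low (nonrandomized).

    ASSUMPTION: one level down per domain with an important concern and
    one level up per upgrade; the review itself relied on judgment.
    """
    unknown = set(concerns) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    level = GRADES.index("High" if randomized else "Low")
    level -= sum(1 for domain in DOMAINS if concerns.get(domain, False))
    level += upgrades
    # Keep the result within the defined range of grades.
    level = max(0, min(level, len(GRADES) - 1))
    return GRADES[level]
```

For instance, under these assumptions a body of randomized evidence with a concern only about precision would come out as “Moderate”.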

Table E-2. Strength of evidence grades and definitions.


Given the results we found, optional domains such as dose-response association and existence of confounders were not applicable in this comparative effectiveness review. Given the uncertainties involved in interpreting asymmetry tests for publication bias, particularly in the presence of heterogeneity in effect estimates, we did not plan to investigate publication bias in this review.

The strength of evidence was graded insufficient when any of the following occurred: there was no evidence for an outcome, the direction of estimates was inconsistent between studies without an identifiable cause, or the body of evidence from the contributing study or studies was underpowered for the outcome of interest (imprecise estimate). That is, when the confidence interval (CI) around the effect estimate was not only non-significant but also wide enough that the clinical action would differ if the upper versus the lower boundary of the CI represented the truth, we rated the estimate as imprecise. An effect estimate rated as imprecise reflects our uncertainty about clinically important benefits, harms, or clinically unimportant differences in effect estimates between the contrasting interventions. Customarily, only a subset of important outcomes is chosen to grade the strength of evidence: those that are most meaningful for decision-making given a specific Key Question.
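The imprecision criterion above combines two conditions: the CI is non-significant, and the clinical action implied by its lower boundary differs from the one implied by its upper boundary. A minimal sketch of that check follows. The function names, the use of a relative risk with a null value of 1.0, and the thresholds of 0.80 and 1.25 in `example_action` are all hypothetical choices made for illustration; the review does not specify numeric thresholds.

```python
from typing import Callable


def is_imprecise(ci_low: float,
                 ci_high: float,
                 action: Callable[[float], str],
                 null_value: float = 1.0) -> bool:
    """Return True when the CI is non-significant AND the clinical action
    would differ between its lower and upper boundaries."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    non_significant = ci_low <= null_value <= ci_high
    action_differs = action(ci_low) != action(ci_high)
    return non_significant and action_differs


def example_action(relative_risk: float) -> str:
    """Hypothetical decision rule; the thresholds are assumptions."""
    if relative_risk <= 0.80:
        return "adopt intervention"
    if relative_risk >= 1.25:
        return "avoid intervention"
    return "no change in practice"
```

With this hypothetical rule, a relative risk CI of 0.60 to 1.40 would be rated imprecise, because it spans the null value and its two boundaries imply opposite actions, whereas a CI of 0.90 to 1.10 would not, because both boundaries imply the same action.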

