NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Balion C, Santaguida PL, Hill S, et al. Testing for BNP and NT-proBNP in the Diagnosis and Prognosis of Heart Failure. Rockville (MD): Agency for Healthcare Research and Quality (US); 2006 Sep. (Evidence Reports/Technology Assessments, No. 142.)

  • This publication is provided for historical reference only and the information may be out of date.



Testing for BNP and NT-proBNP in the Diagnosis and Prognosis of Heart Failure.



Analytic Framework

An analytic framework is a schematic representation of the strategy for organizing topics for review and guiding literature searches. Figure 1 illustrates the inter-relationship among the questions being asked in this systematic review. The key areas addressed were diagnosis of heart failure (HF) using B-type natriuretic peptide tests, the prognostic value of B-type natriuretic peptide levels, and guiding treatment of HF patients using B-type natriuretic peptide measurements. The B-type natriuretic peptides included BNP and NT-proBNP, and in the figure they are illustrated as the central component for the key areas. Four settings were chosen to evaluate the diagnostic ability of B-type natriuretic peptides for HF: the emergency department, primary care, outpatient clinics and long-term care. Patients with coronary artery disease (CAD) risk factors, diagnosed CAD or HF were chosen to evaluate whether B-type natriuretic peptide levels are useful prognostic indicators. In addition, the general population was used to determine whether B-type natriuretic peptides could be used for screening. Monitoring of B-type natriuretic peptides with respect to outcome was used to assess the effect of therapy in patients with HF. Furthermore, determinants that affect B-type natriuretic peptide levels independent of HF were extracted for each of the key areas, but are not shown as part of the analytic framework.

Figure 1. Analytic Framework.



The methodological chapter has been divided into two sections: (1) General Methods and (2) Question Specific Methods. The first section will describe methods that were general in nature and were applicable to almost all of the research questions in this review. The second section will describe the specific methodological decisions that were relevant to each research question.

General Methods

Refinement of the Topic and the Research Questions

The first step during the topic assessment and refinement process was to organize a teleconference with partner organizations. The Task Order Officer (TOO) invited topic experts and the McMaster multidisciplinary research team to define the magnitude of the topic to be addressed and to refine/clarify the preliminary research questions for this evidence report. An international Technical Expert Panel (TEP) was assembled to provide high-level content expertise on this topic (Appendix E) and to participate in conference calls on an as-needed basis throughout the data refinement and extraction phase.

Search Strategy

Two search strategies were undertaken, one for the main report (Appendix A) and a second one for the review of reviews (Appendix A) for Question 2b. The bibliographic databases searched included MEDLINE®, EMBASE, CINAHL, Cochrane Central and AMED (Allied and Complementary Medicine) from 1989 to February 2005. Hand searching was not undertaken for this systematic review.

For Question 2b, which compared other diagnostic tests relative to BNP and NT-proBNP, a review of reviews was undertaken in MEDLINE® and EMBASE from January 2000 to September 2005. The start date of 2000 was chosen in order to identify only the most recent reviews.

Eligibility Criteria

A list of eligibility criteria was developed in Systematic Review Software (SRS) for the purposes of this systematic review. Details of the eligibility criteria can be found in Appendix B.


Criteria for publication inclusion. Language: Only English language studies were eligible. The number of non-English studies that were excluded equaled approximately 6 percent of all possible citations (268/4342). Publication Date: 1989 to February 2005. Our search started in 1989, as this was the first year an assay for BNP was reported.

Criteria for publication exclusion. Narrative and systematic reviews (except for Question 2b), editorials, letters, comments, opinions, abstracts and unpublished studies were excluded.

Assay method

Measurement of BNP or NT-proBNP. This systematic review included only those studies that measured BNP by methods that were available commercially for diagnostic use in a clinical setting up to February 2005 (Table 1). However, for NT-proBNP, three methods were included that were not commercially available for diagnostic use in clinical settings (see Table 2). One of these was an early-generation assay (ELISA method) that preceded the Roche NT-proBNP method. The other two methods (Biomedica and Christchurch) were included because of their frequent use and because comparison studies have been done with the Roche NT-proBNP method.147, 164 The purpose of these restrictions was to ensure that results from this systematic review were not unduly affected by the test method used. The goal was to reduce the variability, and thus the uncertainty, in the analysis of our results and to make the results directly applicable to clinical use (since these are the methods clinical laboratories will use). One limitation of this approach is the possible exclusion of studies with important information not available in any of the included studies. Also, the strength of some findings may be weakened due to a smaller number of studies reporting similar findings but using different test methods. Tables 1 and 2 provide the details of the assays used in this review to measure BNP or NT-proBNP.

Table 1. Details of BNP test characteristics.


Table 2. Details of NT-proBNP test characteristics.


Number of measurements of BNP or NT-proBNP. For Question 4, BNP or NT-proBNP was to be measured at a minimum of 3 time points. This restriction was not applied to any other question in this review.


Criteria for population inclusion. Any population that included subjects aged 18 years or older.

Criteria for population exclusion. All studies conducted on animals or on human samples other than blood (e.g., urine) or cell cultures were excluded from this review.

Study designs

Criteria for study designs inclusion. All study designs (randomized controlled trials (RCTs), observational, case control, cohort studies) for primary data were included. In addition, systematic reviews were included to address Question 2b.

Data Collection and Reliability of Study Selection

A team of trained research assistants carried out title, abstract and full-text screening. Standardized forms and a guide explaining the criteria were developed. Two reviewers were required to achieve consensus on the identification, selection, validity and abstraction of articles and information. Disagreements that were not resolved by consensus were settled by one or more members of the local expert team.

Quality Assessment of Included Studies

To assess the quality of primary studies we utilized standardized rating scales with acceptable reliability and validity. The specific scale to be used was dependent on the study design and the research question. The Quality Assessment of Diagnostic Accuracy Studies (QUADAS)165 was selected to evaluate studies chosen for the research question addressing diagnostic accuracy of the BNP or NT-proBNP test. The QUADAS was developed specifically to take into account biases unique to the design of diagnostic studies. Quality items were considered individually rather than as a composite score as recommended by the developers of this tool.166 The Jadad scale was used for studies that were RCTs.167 For non-randomized study designs the only two criteria selected for evaluation were consecutive sampling and blinding to the outcome.168 For quality assessment of systematic reviews, the Screening and Test Evaluation Program (STEP) checklist was used.169 Appendix B shows the instruments used to evaluate quality.

Summarizing Our Findings: Descriptive and Analytic Approaches

Both descriptive and quantitative approaches were used to summarize study characteristics and outcomes. Multiple publications on the same study cohort were grouped together and treated as a single study for statistical analysis. Standardized summary tables explicating important study population and BNP or NT-proBNP test characteristics, as well as study results, were created. Results for BNP and NT-proBNP measurements were reported using the units pg/mL. Conversions were made to pg/mL, using the factor 1 pmol/L = 3.46 pg/mL for BNP and 1 pmol/L = 8.457 pg/mL for NT-proBNP.
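The unit conversion described above can be sketched in a few lines. This is only an illustration: the two factors are the ones quoted in the text, while the function and dictionary names are hypothetical and not part of the report.

```python
# Conversion factors quoted in the text, in pg/mL per pmol/L.
# The names below are illustrative assumptions, not taken from the report.
PG_ML_PER_PMOL_L = {"BNP": 3.46, "NT-proBNP": 8.457}


def to_pg_per_ml(value_pmol_per_l, peptide):
    """Convert a concentration in pmol/L to pg/mL for the given peptide."""
    return value_pmol_per_l * PG_ML_PER_PMOL_L[peptide]


if __name__ == "__main__":
    # A BNP level reported as 100 pmol/L, expressed in pg/mL.
    print(to_pg_per_ml(100, "BNP"))
    # An NT-proBNP level reported as 10 pmol/L, expressed in pg/mL.
    print(to_pg_per_ml(10, "NT-proBNP"))
```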

Meta-analysis was only carried out for Question 2a. Meta-analysis for the remaining questions was not considered for several reasons including lack of data, too few studies and significant clinical heterogeneity. Quality scores were not used for weighting data in any of the analyses; rather, the inverse of the variance was used to weight studies.

For each primary study included in Question 2a, we calculated the following measures of test accuracy: sensitivities, specificities, likelihood ratios (positive LR+ and negative LR-) and diagnostic odds ratios (DOR). For those papers where the actual numbers of true and false positive and negative results (TP, FP, TN, FN) were presented, or where enough information was given to allow us to calculate or estimate these numbers, we recalculated the sensitivities and specificities and calculated the LR+, LR- and DOR with the accompanying 95 percent confidence intervals (CI).
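As an illustration of these calculations, the sketch below derives the accuracy measures from a single 2x2 table. It is not the authors' Stata code. The confidence intervals for the likelihood ratios use the common log-scale approximation, which is an assumption here because the report does not state which interval method was used; the function name and the example counts are hypothetical.

```python
import math


def accuracy_measures(tp, fp, fn, tn, z=1.96):
    """Sensitivity, specificity, LR+, LR- and DOR from one 2x2 table.

    Assumes all four cells are non-zero. The LR confidence intervals use
    the log-scale approximation (an assumption, not stated in the report).
    """
    diseased = tp + fn
    healthy = fp + tn
    sens = tp / diseased
    spec = tn / healthy
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    dor = lr_pos / lr_neg

    # Standard errors of log(LR+) and log(LR-).
    se_log_lr_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_log_lr_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)

    def log_ci(estimate, se):
        return (estimate * math.exp(-z * se), estimate * math.exp(z * se))

    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_pos_ci": log_ci(lr_pos, se_log_lr_pos),
        "lr_neg": lr_neg,
        "lr_neg_ci": log_ci(lr_neg, se_log_lr_neg),
        "dor": dor,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 patients with HF, 100 without.
    for name, value in accuracy_measures(tp=90, fp=20, fn=10, tn=80).items():
        print(name, value)
```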

These measures were calculated across different cut points and by study setting (emergency, outpatient, primary care and long term care settings) for BNP and NT-proBNP separately. Overall estimates of the diagnostic accuracy of the test were obtained by pooling the sensitivities, specificities and LRs obtained from each primary study. These different analyses were assessed for publication bias (graphical as well as statistical). We used sensitivity analysis to examine the influence of one study at a time and Galbraith plots for assessing heterogeneity across studies.

Our initial analyses considered the level of heterogeneity across the individual studies that were included in the meta-analysis. Cochran's Q test was used as a measure of heterogeneity in all the meta-analyses and the I² statistic as a measure of inconsistency. We observed some heterogeneity in many of our meta-analyses and, as a result, random effects models were selected. Subgroup analysis and stratification were carried out to further explore the causes. As part of these, meta-regression methods were employed to study the effects of a few covariates on the respective diagnostic test measures. Due to the number of studies available, we were only able to carry out univariate meta-regressions in most cases. We also assessed the correlation between sensitivities and specificities. However, no significant correlations were observed. All statistical analyses were carried out using Stata/SE 8.0 for Windows (Stata Corporation) and Meta Package.
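The heterogeneity statistics and random effects pooling described above can be sketched as follows. The report names Cochran's Q, I² and random effects models but does not say which between-study variance estimator was used; the DerSimonian-Laird estimator below is therefore an assumption, as are the function name and the example inputs. Estimates are expected on a scale where inverse-variance weighting is appropriate (for example, log DOR).

```python
import math


def random_effects_pool(estimates, variances):
    """Cochran's Q, I-squared and a DerSimonian-Laird random effects pool.

    estimates: per-study effect estimates (e.g. log DOR).
    variances: the corresponding within-study variances.
    The DerSimonian-Laird estimator is an assumption, not the report's
    stated method.
    """
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1

    # I-squared: percentage of variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance (tau-squared), truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum_re
    return {
        "q": q,
        "df": df,
        "i2": i2,
        "tau2": tau2,
        "pooled": pooled,
        "se": math.sqrt(1 / sum_re),
    }


if __name__ == "__main__":
    # Hypothetical log DORs and variances from three studies.
    print(random_effects_pool([3.0, 3.6, 2.4], [0.10, 0.20, 0.15]))
```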

Pooled estimates were also calculated for DORs and summary receiver operator characteristic (SROC) curves were created in our analyses to assess the effect of different cut points. A DOR is a simple measure used when combining sensitivities and specificities from different studies. It is easy to calculate and less sensitive to diagnostic thresholds.170 The DOR makes use of the sensitivity and specificity pair by comparing the odds of one to the other. It compares the odds of positive test results in the trial participants with the outcome of interest, to the odds for positive test results for those without the outcome of interest (Equation 1).

Equation 1: DOR

$$\mathrm{DOR} = \frac{\text{sensitivity}/(1-\text{sensitivity})}{(1-\text{specificity})/\text{specificity}}$$

The standard error of the log DOR is approximately given by:

$$\mathrm{SE}(\log \mathrm{DOR}) = \sqrt{\frac{1}{TP}+\frac{1}{FP}+\frac{1}{TN}+\frac{1}{FN}}$$

Where TP is true positive, FP is false positive, TN is true negative and FN is false negative. Appropriate adjustments are made in cases of zero counts.
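A minimal sketch of this calculation is given below, using the usual approximation in which the variance of the log DOR is the sum of the reciprocals of the four cell counts. The report does not specify its zero-count adjustment; adding 0.5 to every cell when any cell is zero is a common convention and is assumed here. Function names and example counts are hypothetical.

```python
import math


def log_dor_with_se(tp, fp, fn, tn, correction=0.5):
    """Return (log DOR, standard error) for one 2x2 table.

    If any cell is zero, `correction` is added to all four cells. This
    continuity correction is an assumed convention, not the report's
    documented adjustment.
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return log_dor, se


def dor_with_ci(tp, fp, fn, tn, z=1.96):
    """Return (DOR, lower, upper), with the interval built on the log scale."""
    log_dor, se = log_dor_with_se(tp, fp, fn, tn)
    return (
        math.exp(log_dor),
        math.exp(log_dor - z * se),
        math.exp(log_dor + z * se),
    )


if __name__ == "__main__":
    # Hypothetical counts; the second table has a zero cell.
    print(dor_with_ci(tp=90, fp=20, fn=10, tn=80))
    print(dor_with_ci(tp=10, fp=0, fn=5, tn=20))
```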

An alternative formulation of the DOR is given in Equation 2:

Equation 2: Alternative calculation for DOR.

$$\mathrm{DOR} = \frac{\text{sensitivity}/(1-\text{specificity})}{(1-\text{sensitivity})/\text{specificity}} = \frac{\mathrm{LR}+}{\mathrm{LR}-}$$

Where the LRs are the positive and negative likelihood ratios.

Using this definition, a DOR is a measure of the spread between the two LRs. The SROC curve mimics the receiver operator characteristic (ROC) curve and is a way to measure the diagnostic accuracy across different studies. It is based on a logit transformation of the data: D, the difference between the logit of the true-positive rate (TPR, sensitivity) and the logit of the false-positive rate (FPR, 1 - specificity), is plotted on the y axis against their sum S on the x axis, i.e., D = logit TPR - logit FPR against S = logit TPR + logit FPR. The y axis (D) is equivalent to the log (DOR), and the x axis (S) is a way to measure how the test characteristics vary with respect to the thresholds of the diagnostic tests. A regression equation (D = α + β * S) derived from the SROC curve analysis can be used to assess the heterogeneity among study results.171 It is possible to get spurious SROC plots from the regression analysis when individual studies have homogeneous results, since regression with little variation in both the independent and dependent variables can be misleading.
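The regression D = α + β * S can be sketched with an unweighted least-squares fit. This is an illustrative assumption: the report does not state whether its fit was weighted, and the function names and example data are hypothetical. Solving D = α + β * S for logit TPR gives logit TPR = (α + (1 + β) * logit FPR) / (1 - β), which the second function uses to trace the curve.

```python
import math


def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def sroc_fit(pairs):
    """Fit D = alpha + beta * S by ordinary (unweighted) least squares.

    pairs: list of (sensitivity, specificity), each strictly between 0
    and 1. The unweighted fit is an assumption, not the report's method.
    """
    d = [logit(sens) - logit(1 - spec) for sens, spec in pairs]
    s = [logit(sens) + logit(1 - spec) for sens, spec in pairs]
    n = len(pairs)
    mean_s = sum(s) / n
    mean_d = sum(d) / n
    sxx = sum((x - mean_s) ** 2 for x in s)
    if sxx == 0:
        raise ValueError("S does not vary across studies; slope is undefined")
    beta = sum((x - mean_s) * (y - mean_d) for x, y in zip(s, d)) / sxx
    alpha = mean_d - beta * mean_s
    return alpha, beta


def sroc_sensitivity(alpha, beta, fpr):
    """Sensitivity on the fitted SROC curve at a given false-positive rate."""
    logit_tpr = (alpha + (1 + beta) * logit(fpr)) / (1 - beta)
    return 1 / (1 + math.exp(-logit_tpr))


if __name__ == "__main__":
    # Hypothetical (sensitivity, specificity) pairs from four studies.
    studies = [(0.95, 0.60), (0.90, 0.75), (0.85, 0.80), (0.70, 0.92)]
    alpha, beta = sroc_fit(studies)
    print(alpha, beta, sroc_sensitivity(alpha, beta, fpr=0.2))
```

A slope β near zero suggests that the DOR does not change much with the threshold, in which case exp(α) summarizes the pooled DOR.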

Question Specific Methods

Population Criteria for Each Question

Question 1: criteria for population inclusion. All studies that were eligible for Questions 2, 3 and 4 were considered for Question 1. For Question 1, all determinants associated with B-type natriuretic peptides were abstracted except for the well-known relationship to systolic HF or severity of HF, and echocardiographic parameters associated with systolic dysfunction. Both categorical determinants (e.g., gender, disease status, drug therapy) and determinants on a continuous scale (e.g., creatinine, weight, left ventricular mass) were included; however, determinants were excluded if the continuous scale was converted into a categorical variable (i.e., above and below a cut point value). Drug therapy data were included if the therapy was compared to baseline or a placebo group. Data on all determinants that were analyzed using univariate or multivariate regression approaches were abstracted; however, if both analyses were available, the multivariate analysis took primacy in the results. If data were given for multiple time points, the admission time was chosen unless otherwise specified in the evidence tables (Evidence Table 1, Appendix C). Although these restrictions decreased the number of abstractable pieces of data, they also reduced the classification error.

Question 2a: criteria for population inclusion. A study was also eligible if it considered one of the following symptoms or signs as a marker for HF: anginal pain, anginal syndrome, ankle swelling, bilateral leg edema, breathlessness, cardiac dysfunction, cardiac insufficiency, cardiomegaly on chest x-ray, diastolic distensibility, diastolic dysfunction, diastolic dysfunction on cardiac catheterization, diastolic stiffness, dyspnea, ejection fraction (EF), elevated jugular venous pressure, fatigue, fluid retention, hepatomegaly, left ventricular (LV) relaxation, filling, LV systolic function (or dysfunction), nocturnal cough, orthopnea, palpitation, paroxysmal nocturnal dyspnea, peripheral edema, pleural effusion, pulmonary congestion, pulmonary rales, tachycardia (heart rate ≥ 120 beats/min), third heart sound, ventricular dysfunction, weight loss.

Question 2a: criteria for population exclusion. For emergency or primary care settings only, studies were excluded if the population had subjects with known HF, as were samples that included only subjects with any of the following: heart transplantation, obesity clinic patients, hypertrophic cardiomyopathy, mitral valve regurgitation patients. Inpatient hospital or community settings were excluded.

Question 2b: criteria for population inclusion. Primary studies with traditional diagnostic tests of HF included the following: chest x-ray, echocardiography, myocardial radionuclide angiogram (MRNA), dobutamine echo, cardiac catheter, magnetic resonance imaging (MRI), computerized tomography (CT), and pulmonary/vascular measures.

Question 3a: criteria for population inclusion. All patients: i) at risk of CAD; ii) with diagnosed CAD; iii) with diagnosed HF. The citation was required to use at least one of the following terms to indicate HF: i) HF; ii) congestive HF; iii) New York Heart Association (NYHA) criteria, NYHA functional class, American College of Cardiology (ACC), American Heart Association (AHA), Canadian Cardiovascular Society (CCS), Modified Framingham Clinical Criteria for diagnosis of Heart Failure, and European Study Group on Diastolic Heart Failure; iv) cardiac dysfunction.

Question 3a: criteria for population exclusion. Studies were excluded if the population had any of the following health conditions: heart transplant, stenosis, renal disease, pulmonary embolism, cardiomyopathy, tumour, amyloid, leukemia, atrial fibrillation after pacemaker implant, respiratory disease, pulmonary hypertension, ischemic stroke, sepsis, perimyocarditis, intensive care unit patients.

Question 3b: criteria for population inclusion. General populations with no known cardiac dysfunction.

Question 4: criteria for population inclusion. Studies evaluating treatments for HF had to have identified the subjects using one of the following criteria: ACC / AHA, NYHA, CCS, Modified Framingham Clinical Criteria for the Diagnosis of Heart Failure, European Study Group on Diastolic Heart Failure.

Question 4: criteria for population exclusion. Studies were excluded if the patients' HF was not stable.

Intervention for Each Question

Selection of interventions was not relevant for research Questions 1, 2 and 3.

Question 4: criteria for intervention inclusion. Treatments for HF could include any of the following:

  • Medications: angiotensin converting enzyme (ACE) inhibitors; angiotensin receptor blocker therapy; beta blockers; cardiac glycosides; diuretics; nitrates; spironolactone.
  • Surgeries, Procedures and Medical Devices: balloon valvuloplasty catheter; enhanced counterpulsation; heart valve replacement surgery; automatic implantable cardiac defibrillator; cardiac resynchronization therapy; intra-aortic balloon pump insertion; prosthetic heart valve; ventricular assist device; valvuloplasty (balloon or surgical).
  • Healthy Lifestyles: exercise; maintain a healthy weight; eat a healthy diet; control blood pressure; control blood cholesterol; prevent and manage diabetes mellitus; quit smoking; manage stress.

Outcome Criteria for Each Question

Question 2a and 2b outcomes criteria for outcomes inclusion. Any measure of the degree or presence of HF was accepted, including: clinical diagnosis, left ventricular ejection fraction (LVEF), change in NYHA class, left ventricular end-diastolic pressure, left ventricular end-diastolic dimension, left ventricular end-systolic dimension, end-diastolic thicknesses of the inter-ventricular septum.

Question 3a and 3b outcomes criteria for outcomes inclusion. Admission to hospital for any of the following outcomes: angina requiring a minimum 24 hour hospitalization (acute coronary syndrome), angiographic percutaneous coronary interventions (including terms angioplasty, bypass surgery, coronary artery bypass graft, cardiac revascularization, percutaneous transluminal coronary angioplasty, stent), atrial fibrillation (arrhythmias), cerebrovascular event (e.g., stroke), composite endpoint, congestive heart failure (CHF), isolated diastolic ventricular dysfunction, mortality (all cause), myocardial infarction (MI).

Question 4 outcomes criteria for outcomes inclusion. No a priori outcomes were identified for inclusion.

Criteria for outcomes exclusion. Non-cardiac events.

Peer Review Process

A list of potential peer reviewers was assembled at the outset of the study from a number of sources including our technical expert panel (TEP), our partners, the McMaster research team, and the AHRQ. During the course of the project, additional names were added to this list by the McMaster Center and AHRQ. Thirteen content experts have reviewed this report (see Appendix E) and their comments and suggestions have been incorporated where possible.
