This publication is provided for historical reference only and the information may be out of date.
The search yielded 25,864 records identified from six bibliographic databases (Figure 2). An additional 35 records were identified from three grey literature sources: regulatory agency Web sites, clinical trial databases, and conference sources. After duplicates were removed, a total of 16,893 records were screened at title and abstract level; a total of 3,616 citations moved on to be screened at full text. Following the application of full text screening criteria, there were 310 eligible papers for all research questions in this review. See Appendix G for a list of all excluded articles.
A total of 104 papers were allocated for diagnostic accuracy, and from these 76 articles were evaluated for Key Question (KQ) 1, and 28 for KQ2. For KQ3, KQ4, and KQ5, 190 articles were eligible to address the research questions related to prognosis; from these 183 were eligible for KQ3, 22 for KQ4, and seven publications for KQ5. A total of nine articles were evaluated for treatment guided by BNP or NT-proBNP for KQ6, and seven articles for KQ7 focusing on biological variation.
Key Question 1. In patients presenting to the emergency department or urgent care facilities with signs or symptoms suggestive of heart failure (HF):
- What is the test performance of BNP and NT-proBNP for HF?
- What are the optimal decision cutpoints for BNP and NT-proBNP to diagnose and exclude HF?
- What determinants affect the test performance of BNP and NT-proBNP (e.g., age, gender, comorbidity)?
Sample and Design Characteristics of Papers Assessing BNP
There were 51 publications that met the criteria for KQ1 and examined cutpoints for BNP. Thirty-seven examined BNP only3,72-107 and 14 examined both BNP and NT-proBNP.108-121 See Appendix H KQ1 Evidence Set.
Prospective study designs included two randomized controlled trials (RCTs)97,102 and nine cohort studies.92,98,106,116-121 The remaining papers (n=40) used a cross-sectional design. The selected articles were published between 2001 and 2011 and were conducted in a wide range of regions: nine in North America,72,82,83,90,101,105,107,117,120 twenty-two in Europe,74,79,85,87,88,94,96,100,102-104,106,109-115,118,119,121 two in Asia,86,95 one in South America,78 two in Australia,89,97 and one in New Zealand.108 Thirteen papers were conducted in multinational sites,3,73,75-77,80,81,84,91-93,98,99 and one was unclear as to region of conduct.116
Most articles, with the exception of ten,74,84,87,90,91,96,109,115,119,120 provided diagnostic information on the overall study sample. Some papers provided diagnostic information on populations grouped according to age,73,74,85,89,101,111,113,119 sex,73,74 and ethnicity.73
Some papers presented diagnostic information according to body mass index (BMI) status,91,101,102 diabetes status,84 previous history of heart failure (HF),72,89,96 permanent/paroxysmal atrial fibrillation (AF),92 renal function/estimated glomerular filtration rate (eGFR),101,109,113,114,120 history of hypertension or blood pressure elevation on admission,99 and left ventricular ejection fraction (LVEF).116 Three papers included information on HF populations.76,100,102
In all papers, study patients presented to emergency departments with shortness of breath and were 18 years of age and older. Seventeen articles had a patient population with mean or median ages from 60 to 69 years72,73,75-77,79-81,83,84,91,93,99,104,118,121,122 and 14 articles74,78,87,89,95-97,101,102,106-109,114 had populations with mean or median ages between 70 and 79 years. Four studies had a mean or median patient age over 80 years85,94,105,112 and ten did not report the age of the study population.3,82,86,88,90,98,116,117,119,120 Six articles reported ages in the following ranges: 65 to 100,111 43 to 90,113 67 to 82,123 58 to 82,110 68 to 82,100 and 30 to 95 years.103
The percentage of males enrolled in each study ranged from 5.6 percent84 to 100 percent72 (mean=66.2%; median=66.2%). Sample sizes (including subpopulations) ranged from 9 (reference 89) to 1,614 (reference 3), with a mean of 404 and a median of 251. The prevalence of HF in the study populations ranged from 8.3 percent100 to 84 percent96 (mean=45.1%; median=46.6%).
Of the 51 selected papers, 11 used data from the Breathing Not Properly Multinational Study,73,75-77,80,81,84,91-93,99 three used data from the B-type Natriuretic Peptide for Acute Shortness of Breath Evaluation (BASEL) study,96,102,106 one from the Biomarkers in Acute Heart Failure (BACH) study,3 and one from the BNP in Shortness of Breath study.97 One article used data from the Heart Failure and Audicor technology for Rapid Diagnosis and Initial Treatment (HEARD-IT) study,98 and one was from the epidemiological study of acute dyspnea in elderly patients (EPIDASA) study.85 One set of authors published results on the same data sets114,119 and the remaining articles (n=31) were independent papers, publishing results on unique data sets.
Seven articles used the Abbott AxSYM® B-Type Natriuretic Peptide (BNP) Microparticle Enzyme Immunoassay (MEIA),95,97,98,100,106,110,115 five used the TRIAGE-B-Type Natriuretic Peptide (BNP) test for the Beckman Coulter Immunoassay Systems,3,103,116,118,121 two used the I-STAT BNP test,101,107 two used the ADVIA-Centaur® BNP Assay, Bayer Diagnostics ACS:180® BNP Assay,98,113 and two used the ADVIA-Centaur® B-Type Natriuretic Peptide (BNP) Assay.88,98 The remaining papers (n=35) used the TRIAGE-B-Type Natriuretic Peptide (BNP) test.
Diagnosis of Heart Failure in Papers
The majority of articles (n=45) based the diagnostic reference standard on clinical judgment.3,72-81,83-85,87,89-99,101-104,106-109,111-121 Of these 45 articles, most (n=34) had a reference standard agreed upon by at least two physicians (mostly cardiologists), ten based the final diagnosis on the opinion of a single cardiologist or other type of clinician,72,78,89,96,102,107,109,118-120 and one article did not indicate this information.121 The adjudication physicians each arrived at a diagnosis of HF based on their interpretation of all available clinical data; this often included echocardiography results. One article106 included BNP in the data used for adjudication. Of the 45 papers using clinical judgment to make the final diagnosis, the Framingham criteria were used in 15, and the National Health and Nutrition Examination Survey (NHANES) was used in 10.
Of the remaining articles (n=6), three based the final diagnosis of HF both on clinical judgment and results of echocardiography,82,88,100 one based it on echocardiography results alone,86 one reported that the definitive diagnosis was based on the Framingham criteria,110 and one reported that the HF status was based on discharge diagnosis.105
BNP: Test Performance and Optimal Cutpoints in Emergency Department
Diagnostic Properties in BNP
The 51 papers evaluating BNP in the emergency department used several cutpoints ranging from 12.5 to 983.5 pg/mL or ng/L (both extremes from reference 86; mean=213.1; median=162). One study measured BNP in pmol/L and had cutpoints ranging from 20 to 100.108 These were converted to pg/mL for analysis. Reported sensitivities ranged from 36 percent92 to 100 percent74,78,86,89,113 (mean=82.4%; median=86%), specificities from 14 percent76 to 99 percent96 (mean=75.4%; median=79.5%), and areas under the curve (AUC) from 0.08 (reference 92) to 0.99 (references 78 and 82) (mean=0.84; median=0.89). Of the 51 papers looking at BNP, 14 also looked at NT-proBNP.88,108-120 Appendix H Tables H-1 and H-2 present summary tables of these studies.
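The unit conversion mentioned above can be illustrated with a minimal sketch. The report does not state the conversion factor that was applied; the factor below (1 pg/mL of BNP is approximately 0.289 pmol/L, based on a molecular weight of about 3,464 g/mol) is a commonly cited value and is an assumption here, as is the function name.

```python
# Assumed conversion factor for BNP (not stated in the report):
# 1 pg/mL is approximately 0.289 pmol/L.
BNP_PMOL_PER_PG_ML = 0.289


def bnp_pmol_to_pg_ml(pmol_per_l: float) -> float:
    """Convert a BNP concentration from pmol/L to pg/mL (hypothetical helper)."""
    return pmol_per_l / BNP_PMOL_PER_PG_ML


# Cutpoints of 20 and 100 pmol/L expressed in pg/mL under the assumed factor.
converted = [round(bnp_pmol_to_pg_ml(c), 1) for c in (20, 100)]
```

Under this assumed factor, cutpoints of 20 and 100 pmol/L correspond to roughly 69 and 346 pg/mL.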
The majority of papers reported on the Triage BNP Point-of-Care test. Two papers reported on the Triage BNP test licensed to Beckman Coulter for use on their laboratory instruments.3,103 Four papers reported using the Abbott AxSYM,97,100,101,110 and one reported using the ADVIA-Centaur system.88 Gorissen et al.113 reported on two systems (ADVIA-Centaur and Triage).
Data were extracted, 2x2 tables prepared, and forest plots of sensitivities, specificities, positive and negative likelihood ratios (LRs), diagnostic odds ratios (DORs), and summary receiver operator characteristic (ROC) curves are presented (see Appendix H Figures H-1 to H-12). Three cutpoints were selected: lowest presented, manufacturers' suggested, and the optimal cutpoint as chosen by the authors.
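As an illustration of how the measures listed above follow from a single 2x2 table, the sketch below computes them using their standard definitions. It is not the pooling method used in this review (summary estimates require a meta-analytic model); the class and function names and the example counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # HF present, test above cutpoint
    fp: int  # HF absent, test above cutpoint
    fn: int  # HF present, test at or below cutpoint
    tn: int  # HF absent, test at or below cutpoint


def diagnostic_metrics(t: TwoByTwo) -> dict[str, float]:
    """Standard diagnostic accuracy measures from one 2x2 table."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    # Guard against division by zero when a cell is empty.
    lr_pos = sens / (1 - spec) if spec < 1 else math.inf
    lr_neg = (1 - sens) / spec if spec > 0 else math.inf
    dor = (t.tp * t.tn) / (t.fp * t.fn) if t.fp * t.fn > 0 else math.inf
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
example = diagnostic_metrics(TwoByTwo(tp=95, fp=33, fn=5, tn=67))
```

With these hypothetical counts the sensitivity is 0.95 and the specificity is 0.67, giving an LR+ of about 2.88 and an LR- of about 0.075.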
If the lowest cutpoint presented by the authors is chosen, all papers except four111,113,119,120 return sensitivities greater than 90 percent (summary estimate 95 percent (95% confidence interval (CI), 93 to 97 percent)). Negative LRs (LR-) were all less than 0.20 for this group. Overall, specificity was lower and much more variable, ranging from 27 to 88 percent (summary estimate 67 percent (95% CI, 58 to 75 percent)).
Among papers that reported a sensitivity less than 90 percent, Ray et al.111 and Chenevier-Gobeaux et al.119 enrolled patients older than 65 years. Both papers used higher cutpoints than most other papers (Ray: 250 pg/mL; Chenevier-Gobeaux: 270 pg/mL for ages 65 to 84 years and 290 pg/mL for ages 85 years and older). deFilippi et al.120 enrolled a population with a high prevalence (47 percent) of subjects with eGFR <60 mL/min/1.73 m2. Gorissen et al.113 reported using the ADVIA-Centaur and Triage assay systems. They also selected a high cutpoint (225 pg/mL) and reported sensitivities of 65 percent and 73 percent, below all other papers.
Using package inserts, 510(k) submission forms, and product brochures, we determined the manufacturers' recommended cutpoints. In all cases the manufacturer suggested a cutpoint of 100 pg/mL to rule out the diagnosis of HF. Twenty-one papers reported results for this cutpoint. Sensitivities ranged from 86 to 100 percent (summary estimate 95 percent (95% CI, 93 to 96 percent)), and specificities ranged from 31 to 97 percent (summary estimate 66 percent (95% CI, 56 to 74 percent)).3,74,79,81-83,85,86,88,89,93,95-97,101,104,107,108,110,112,114
Twenty-eight papers3,74,77-79,81-83,85,86,89,91,93-98,100,104,108,110-114,119,120 examined an optimal cutpoint. The majority (n=19) of the studies determined a cutpoint that maximized accuracy, either using an ROC curve or by examining several arbitrary cutpoints.74,77-79,81-83,85,86,94,96,97,108,110-113,119,120 Three studies maximized sensitivity,89,93,104 three others used the manufacturers' suggested cutpoint or other accepted threshold,3,91,114 one study used multiple logistic regression,95 one set the sensitivity at 90 percent and determined specificity,100 and one set the sensitivity at 96 percent in all subgroups and determined specificity.98 Sensitivities ranged from 65 percent to 100 percent (summary estimate 91 percent (95% CI, 88 to 94 percent)), and specificities ranged from 34 percent to 97 percent (summary estimate 80 percent (95% CI, 74 to 85 percent)). Using the optimal cutpoint resulted in a higher overall estimate of the positive LR (LR+; 4.61 (95% CI, 3.49 to 6.09)) compared to either the lowest cutpoint (2.85 (95% CI, 2.23 to 3.65)) or the manufacturer cutpoint (2.76 (95% CI, 2.12 to 3.59)). The LR- was not significantly different (p>0.05).
Choosing the lowest, manufacturer, or the optimal cutpoint had little effect on the diagnostic performance of the test. The test displayed high sensitivity and a low LR-, but a low specificity and LR+.
BNP: Determinants of Test Performance in Emergency Department
The effects of various determinants on the diagnostic performance of BNP for the diagnosis of HF were examined.
Eight articles73,74,85,89,101,111,113,119 examined the relationship between age and BNP. In all cases, increasing age was associated with an increase in BNP concentration, but the correlation of age with the diagnostic performance of the test was not clear in the papers.
Four papers73,101,111,119 examined different decision cutpoints based upon age, each using different reasoning and criteria (Table 2). Maisel et al.73 suggested cutpoints no greater than 100 pg/mL for both age groups, above and below 70 years of age. These decision points maximized sensitivity, with specificity being the second concern. Their reasoning was that a false negative result was less desirable than a false positive in terms of cost to the patient.
Rogers et al.,101 using the manufacturers' suggested cutpoint of 100 pg/mL, established the sensitivity of the entire cohort at 91 percent. To achieve 91 percent sensitivity in those 75 years of age and older, the decision point was set at 184 pg/mL. The specificity at this point was 54 percent.
Chenevier-Gobeaux119 examined the very elderly, 85 years of age and older, compared with those aged 65 to 84. For the younger group, the optimal cutpoint was 270 pg/mL (sensitivity 73%, specificity 83%), whereas for the very elderly the optimal cutpoint was 290 pg/mL (sensitivity 80%, specificity 69%).
For those aged 65 and older, Ray et al.111 established an optimal cutpoint of 250 pg/mL (sensitivity 73%, specificity 91%). In an earlier paper,85 these authors also established an optimal cutpoint of 250 pg/mL (sensitivity 78%, specificity 90%). It is not clear if these publications used independent study populations.
Gorissen et al.113 examined two different BNP assays and divided their population into three age groups. For the Triage assay, the optimal cutpoint for those less than 65 years was 91 pg/mL (sensitivity 55%, specificity 100%), for those 65 to 75 years of age it was 260 pg/mL (sensitivity 83%, specificity 82%), and for those greater than 75 years the optimal cutpoint was 309 pg/mL (sensitivity 71%, specificity 68%). Similarly, for the Siemens Centaur assay the cutpoints were 91 pg/mL (sensitivity 55%, specificity 100%), 188 pg/mL (sensitivity 83%, specificity 73%), and 247 pg/mL (sensitivity 77%, specificity 68%), respectively.
All authors reported that the optimal BNP threshold for diagnosis of HF increases with age, but there is no consensus on how to set the threshold.
Two papers examined sex and BNP73,74 (Table 3). Maisel et al.73 reported that the difference in BNP concentrations between men and women was not significant. Knudsen et al.74 noted differences in sensitivity between males and females using 100 pg/mL as the decision point (males: sensitivity 94.3%, specificity 54.9%; females: sensitivity 90.0%, specificity 55.2%).
One study examined the effect of ethnicity on the diagnostic properties of BNP. Maisel et al.73 reported that the prevalence of HF in their population was significantly greater among whites than among African Americans. Similarly, the concentration of BNP in the white population was significantly greater than in the African American population (200 vs. 117 pg/mL, p<0.001). The AUC is shown in Table 4.
Obesity/Body Mass Index
Three papers91,101,102 examined the effect of obesity on the diagnostic properties of BNP. All three showed that increasing BMI was associated with reduced BNP concentrations. This was true if BMI and BNP were examined in the whole population,101,102 or if the population was examined in two groups: those with and without HF.91
Daniels et al.91 examined the diagnostic properties using a fixed decision point of 100 pg/mL. The sensitivity decreased, but the specificity increased, as the BMI increased. In this study the decision points needed to achieve 90 percent sensitivity were 170 pg/mL for BMI less than 25 kg/m2, 110 pg/mL for BMI 25 to 35 kg/m2, and 54 pg/mL for BMI greater than 35 kg/m2. Specificity was greater than 70 percent in all three subgroups. Rogers et al.101 also adjusted the decision point of the BMI greater than 35 kg/m2 group to achieve the same sensitivity (91%) as the entire cohort (100 pg/mL). This decision point (25 pg/mL) resulted in a reduced specificity. Noveanu et al.102 examined the diagnostic properties at two decision points, 100 and 500 pg/mL. Table 5 displays the diagnostic properties of these papers.
Five papers101,109,113,114,120 examined the relationship between renal function and the diagnostic properties of BNP. Four109,113,114,120 examined eGFR (Table 6) and one101 examined serum creatinine concentration. Three papers109,114,120 optimized the decision point based on eGFR: two109,114 maximized sensitivity and one120 maximized accuracy.
The BNP concentration was inversely related to renal function: as the eGFR decreased or creatinine concentration increased, the BNP concentration increased.
Using the recommended cutpoint of 100 pg/mL, Rogers et al.101 reported a sensitivity of 100 percent and a specificity of 30 percent for those subjects with serum creatinine ≥2 mg/dL. They then adjusted the decision point for those subjects with serum creatinine ≥2 mg/dL to equal the sensitivity of the entire cohort using the recommended decision point of 100 pg/mL (sensitivity 91%, specificity 54%). This resulted in a cutpoint of 449 pg/mL (specificity 78%).
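The adjustment described above, in which a subgroup cutpoint is moved until the subgroup sensitivity matches that of the whole cohort, can be sketched as follows. This is a simplified illustration and not the procedure used by Rogers et al.; it assumes a test is positive when the value is at or above the cutpoint, and the function names and BNP values are hypothetical.

```python
def cutpoint_for_sensitivity(hf_values: list[float], target: float) -> float:
    """Largest observed value usable as a cutpoint while keeping
    sensitivity (share of HF patients with value >= cutpoint) >= target."""
    ordered = sorted(hf_values)
    # Number of HF patients that may fall below the cutpoint.
    allowed_misses = int(len(ordered) * (1 - target) + 1e-9)
    return ordered[allowed_misses]


def specificity_at(non_hf_values: list[float], cutpoint: float) -> float:
    """Share of non-HF patients whose value falls below the cutpoint."""
    return sum(v < cutpoint for v in non_hf_values) / len(non_hf_values)


# Hypothetical BNP values (pg/mL) for one subgroup.
hf = [50, 120, 200, 300, 450, 600, 800, 1000, 1500, 2000]
non_hf = [20, 40, 60, 90, 110, 130, 250, 400]

cut = cutpoint_for_sensitivity(hf, target=0.90)
spec = specificity_at(non_hf, cut)
```

Raising the cutpoint in a subgroup with generally higher BNP concentrations trades some sensitivity for specificity, which is the pattern reported for patients with elevated serum creatinine.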
While these authors recognized that sex, ethnicity, obesity, and renal function have significant effects upon concentration of BNP and potentially on the diagnostic performance of BNP in the diagnosis of HF in the emergency department, all also recognized the difficulty in establishing multiple decision points.
One study84 examined the effect of diabetes mellitus on the use of BNP for the diagnosis of HF. This study reported a nonsignificant difference in the AUC of 0.888 (95% CI, 0.860 to 0.912) for nondiabetics versus 0.878 (95% CI, 0.837 to 0.913) for diabetics.
Sample and Design Characteristics of Papers Assessing NT-proBNP
Eleven papers were prospective cohort studies,116-122,135,136,139,143 one was case-control131 and in two papers, the study design could not be determined.132,141 The remaining papers (n=25) used a cross-sectional design. The selected articles were published between 2003 and 2011. Thirteen were conducted in North America,1,117,120,125,127,128,130,132-134,136,138,139 18 in Europe,26,88,109-115,118,119,121,124,129,131,135,142,143 one in New Zealand,108 two in Asia,122,137 and one in Australia.141 Two papers were conducted in multinational sites2,126 and two were unclear as to region of conduct.116,140
Most papers, with the exception of ten,109,119,120,126,127,130,137,139,140,142 provided diagnostic information on the overall study sample presenting to the emergency department with dyspnea. Some papers provided diagnostic information on populations grouped according to age,2,113,119,122,129,133 sex,127 and ethnicity.127 Some presented diagnostic information according to BMI status,126 renal function,113,130 chronic obstructive pulmonary disease (COPD) status/HF history,128 clinical certainty/uncertainty,139 normal/abnormal chest radiograph,134 with/without diabetes mellitus,140 and NT-proBNP versus usual care.125 Papers examined groups by eGFR readings,109,113,114,120 LVEF readings,116 and red cell distribution width.137
In all papers, patients presented to the emergency department with shortness of breath and were 18 years of age and over. Twelve papers had a patient population with mean or median ages from 60 to 69 years2,26,118,120-122,127,128,134,137,139,142 and 19 had mean or median ages between 70 and 79 years.1,88,108-110,113-115,124-126,130-133,136,138,141,143 Five had mean populations aged 80 and over111,112,119,129,135 and one had a population with a mean age under 60 years.117 Two papers did not report age.116,140
The percentage of males enrolled in each study ranged from 39.0 percent114 to 93.2 percent110 (mean=53.3%; median=51.0%). Sample sizes ranged from 68 (reference 141) to 1,256 (reference 2), with a mean of 377 and a median of 378. The prevalence of HF in the study populations ranged from 8.3 percent144 to 63.5 percent128 (mean=37.9%; median=34.9%).
Of the 39 selected papers, ten were from the N-terminal Pro-BNP Investigation of Dyspnea in the Emergency Department (PRIDE) study,1,2,127,128,130,132-134,139,140 two were from the Mannheim NT-proBNP Study (MANPRO),26,142 one was from the International Collaborative of NT-proBNP (ICON) data set,126 one was from the BACH study,125 one was from the Improved Management of Patients with Congestive Heart Failure (IMPROVE CHF) trial,136 and one came from the epidemiological study of acute respiratory failure in elderly patients (EPIDASA) study.111 The remaining (n=23) were independent papers, publishing results on unique data sets.
The majority of papers (n=35) used the ELECSYS® proBNP Immunoassay. Of the remaining papers, three used the DIMENSION-EXL™ N-terminal Pro-Brain Natriuretic Peptide (NTP) Flex® Reagent Cartridge (RF623)26,142,145 and, in the case of one study, the assay used was not stated.143
Diagnosis of Heart Failure in Papers
The majority of papers (n=35) based the diagnostic reference standard on clinical judgment. Most of these (n=31) had a reference standard agreed upon by at least two physicians (mostly cardiologists) and five based the final diagnosis on the opinion of a single cardiologist or other type of clinician.1,26,110,118,141 One study did not indicate the number or qualifications of the adjudicators.122 The adjudication physicians each arrived at a diagnosis of HF based on their interpretation of all available clinical data; this often included echocardiography results. Of the papers judging final diagnosis using clinical judgment (n=34), three used the Framingham criteria,110,136,143 two used the Boston Criteria,135,143 one used the European Society of Cardiology guideline,142 and one used the NHANES criteria.136
Of the remaining papers (n=2), one based the final diagnosis of HF both on clinical judgment and echocardiography results137 and one based it solely on the European Society of Cardiology guidelines.131
NT-proBNP: Test Performance and Optimal Cutpoints in Emergency Department
Diagnostic Properties in NT-proBNP
The 39 papers evaluating NT-proBNP in the emergency department used several cutpoints ranging from 100 (reference 26) to 6,550 (reference 109) pg/mL or ng/L. Reported sensitivities ranged from 53 percent112 to 100 percent88,112,114,127 (mean=85.1%; median=88%), specificities from 5 percent112 to 100 percent113 (mean=70.9%; median=73.2%), LR+ from 1.05 (reference 112) to 115.03 (reference 88), LR- from 0.02 (references 88 and 114) to 0.35 (reference 119), and AUC from 0.6 (reference 116) to 0.99 (reference 2) (mean=0.88; median=0.89). Most of the papers (n=32) looked at NT-proBNP alone, with the exception of 15 that examined both BNP and NT-proBNP.88,108-121 Appendix H Table H-4 presents summary data for those papers that examined NT-proBNP.
Of the 19 papers with diagnostic performance data,2,26,88,108,110-115,119,122,124,129,131,135,138,141,143 17 reported on data from the Roche NT-proBNP assay system. One26 used the Dimension EXL system, and one143 used the Roche Cardiac Reader point-of-care test.
Data were extracted, 2×2 tables prepared, and forest plots of sensitivities, specificities, LR+ and LR-, DOR, and summary ROC curves are presented (Appendix H Figures H-13 to H-24). Two cutpoints were selected: lowest presented, and the optimal cutpoint, as chosen by the authors to examine in greater detail.
The diagnostic performance was examined using the lowest cutpoint presented by each author in order to maximize the test sensitivity.
Nineteen papers used an optimal cutpoint in their analysis.2,26,88,108,110-115,119,122,124,129,131,135,138,141,143 Eleven papers used a cutpoint to maximize accuracy, either using an ROC curve or with several arbitrary cutpoints. These points ranged from 825 to 2,000 pg/mL. Two studies122,129 used two decision points; one at 300 or 1,200 pg/mL, respectively, to maximize sensitivity, and one at 900 or 4,500 pg/mL, respectively, to maximize specificity. Two papers chose 300 pg/mL, one26 to maximize sensitivity, and one114 chose this value as the “accepted” threshold. One study143 used the Roche Cardiac Reader point-of-care assay and chose the cutpoint of 1,000 pg/mL but did not provide a reason.
NT-proBNP: Determinants of Test Performance in Emergency Department
The effect of various determinants on the diagnostic performance of NT-proBNP for the diagnosis of HF was examined across the 39 papers assessing NT-proBNP.
Januzzi et al.2 determined two cutpoints to separate the population into three age groups. For those less than 50 years of age, 450 pg/mL was determined as the best cutpoint to rule out HF (maximum sensitivity). For those 50 to 74 years of age, they chose 900 pg/mL as the best combination of sensitivity and specificity to maximize test accuracy, and for those 75 years of age or older, 1,800 pg/mL provided the maximum specificity in order to rule in HF. Two other papers138,141 adopted this protocol as the optimal cutpoints. Using this approach did not appear to result in significantly improved diagnostic performance compared with the overall estimate. Table 7 shows the diagnostic performance of these papers compared to the overall estimate of the entire group of NT-proBNP papers.
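The age-stratified scheme described above can be written as a simple lookup. The sketch below encodes only the three cutpoints and age bands stated in the text; the function name is hypothetical, and the code is an illustration of the scheme rather than a clinical tool.

```python
def ntprobnp_age_cutpoint(age_years: int) -> int:
    """NT-proBNP cutpoint (pg/mL) by age band, as described for Januzzi et al."""
    if age_years < 50:
        return 450
    if age_years < 75:  # 50 to 74 years
        return 900
    return 1800  # 75 years or older


def exceeds_age_cutpoint(ntprobnp_pg_ml: float, age_years: int) -> bool:
    """True when the measured value is above the cutpoint for the patient's age."""
    return ntprobnp_pg_ml > ntprobnp_age_cutpoint(age_years)
```

For example, a value of 1,000 pg/mL would exceed the cutpoint for a 60-year-old patient but not for an 80-year-old patient.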
Compared to the lowest cutpoint, the optimal cutpoint displayed a higher overall estimate of specificity and LR+, but was not significantly different in other performance indicators. These data are presented in Table 8.
One study122 used two cutpoints (900 pg/mL >50 years and 450 pg/mL <50 years) for rule in, and a single cutpoint (300 pg/mL) for rule out.
Berdagué et al.129 examined subjects 70 years of age and older, and proposed the use of two decision points for this population: a lower decision point of 1,200 pg/mL to maximize sensitivity (97%) and an upper point of 4,500 pg/mL to maximize specificity (86%). Patients with values in the intermediate “gray” zone required further investigation. A single decision point of 2,000 pg/mL resulted in a test accuracy of 80 percent, deemed unacceptable by the authors of this report.
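The two-cutpoint approach described above can be sketched as a three-way classification. The thresholds of 1,200 and 4,500 pg/mL come from the text; the handling of values exactly equal to a threshold, the category labels, and the function name are assumptions made for illustration.

```python
def classify_ntprobnp_elderly(
    value_pg_ml: float,
    lower: float = 1200.0,  # lower decision point (maximizes sensitivity)
    upper: float = 4500.0,  # upper decision point (maximizes specificity)
) -> str:
    """Three-way classification with an intermediate gray zone.

    Assumption: values exactly at a threshold fall in the gray zone.
    """
    if value_pg_ml < lower:
        return "HF unlikely"
    if value_pg_ml > upper:
        return "HF likely"
    return "gray zone: further investigation"
```

The design trades a definitive answer for some patients against fewer misclassifications at either extreme, which is why the intermediate group is referred for further investigation.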
Januzzi et al.133 examined decision points based on age to optimize rule in, the single cutpoint proposed by the manufacturer, as well as independently generated decision points to evaluate rule out capabilities of the test (Table 9). Januzzi et al.2 used data from the ICON study, an international collaboration that includes data from the PRIDE study,133 which reported separate, selected decision cutpoints that emphasized sensitivity for younger patients and specificity for older ones. They proposed three decision points for age groups under 50, 50 to 75, and older than 75 years to rule in the diagnosis and a single point to rule out. Shaikh et al.122 optimized rule-in cutpoints based on age <50 and >50, but used a single rule-out cutpoint regardless of age. Gorissen et al.113 also suggested that the decision points be increased as the age of the patient increases. Chenevier-Gobeaux et al.119 examined the very elderly (≥85 years of age) and proposed distinct decision points (2,800 pg/mL vs. 1,700 pg/mL) for those over and under 85 years of age (Table 9).
Sex and Ethnicity
Krauser et al.127 examined the influence of ethnicity and sex on the diagnostic properties of NT-proBNP. They reported that the AUC was not different for men versus women or for African Americans versus non-African Americans. There was no difference in the median NT-proBNP concentration between men and women. Similarly, there was no difference in the median concentration between African Americans and non-African Americans.
Obesity/Body Mass Index
A single paper126 examined the effect of obesity and BMI on NT-proBNP performance (Table 10). Using age-specific decision points previously identified, this substudy of the ICON study divided the population into three BMI groups and then calculated the LR+ for each group. Using the overall rule out decision point, they calculated LR-.
They commented that the age-adjusted decision points performed well over a wide variety of BMI. Despite lower sensitivity at the high range of BMI, the predictive values were unchanged.
Two papers113,130 examined the relationship between renal function, expressed as eGFR, and NT-proBNP for the diagnosis of HF (Table 11). Both papers noted an inverse relationship between renal function and NT-proBNP concentration. The relationship was less robust among those with HF than those without. Anwaruddin et al.,130 in a substudy of the PRIDE cohort, used the age-adjusted decision points from the main study to determine diagnostic parameters. Gorissen et al.113 used the ROC curve to establish the optimal decision points.
Assessment of Quality for Papers With Emergency Department Settings
The QUADAS-2 tool146 was used to assess quality in four key domains: patient selection, index test(s), reference standard, and flow and timing. The questions in each domain are rated in terms of risk of bias (low, high, unclear) and concerns regarding applicability (low, high, unclear), with associated signaling questions to help with bias and applicability judgments (Figures 3 and 4, and Appendix H Table H-5).
The potential for bias in the domain of patient selection was assessed on the basis of the enrollment of the study sample (consecutive, random, or convenience), the avoidance of case-control design, and the avoidance of inappropriate patient exclusions. For this domain, 25 percent of papers (n=13) were rated as low risk for bias and 20 percent (n=10) were rated as high risk. The remaining papers (n=28; 55%) were rated as unclear as to risk of bias. Papers were assessed as to patient population applicability to those targeted by the review question in terms of severity of the target condition, demographic features, presence of differential diagnosis or comorbid conditions, and setting of the study. Overall, 33 percent (n=17) of papers were assessed as high for concerns about applicability on this domain and 57 percent (n=29) were rated as low. The remaining 10 percent (n=5) were deemed unclear on the domain of applicability for patient selection.
The potential for bias in the domain of the index test was assessed according to whether results were interpreted without knowledge of the results of the reference standard and whether a prespecified threshold was used for BNP cutpoints. Seventy-one percent (n=36) of papers were rated as high risk, 20 percent were rated as low risk (n=10), and 9 percent were rated as unclear (n=5) on this domain. Papers were assessed on concerns of applicability on the basis of whether the index test methods varied from those specified in the review questions. Concerns about applicability on this domain were assessed as low for 76 percent (n=39) of papers, as high for 22 percent (n=11), and as unclear for 2 percent (n=1).
The potential for bias in the domain of the reference standard (i.e., the criteria used to confirm a diagnosis of HF) was judged on the basis of whether the reference standard was likely to correctly classify the target condition and whether the results were interpreted with knowledge of the BNP marker results. Ninety-four percent of papers (n=48) were rated as low risk, 4 percent (n=2) as high risk, and 2 percent (n=1) as unclear. Concerns about applicability were assessed as to whether the target condition, as defined by the reference standard, differed from the target condition specified in the review question. Seventy-eight percent (n=40) of papers were assessed as low and 22 percent (n=11) were assessed as high on this domain.
The potential for bias in the domain of flow and timing was assessed on the basis of inappropriate intervals between index test and reference standard, standardized administration of reference standard among patients, and equal inclusion of patients in the analysis. Sixty-nine percent of papers (n=35) were assessed as low risk of bias, 20 percent (n=10) as high, and 12 percent (n=6) as unclear.
For papers of diagnostic tests of NT-proBNP (KQ1), the QUADAS-2 tool146 was used to assess quality in four key domains: patient selection, index test(s), reference standard, and flow and timing. The questions in each domain are rated in terms of risk of bias (low, high, unclear) and concerns regarding applicability (low, high, unclear), with associated signaling questions to help with bias and applicability judgments (see Figures 5 and 6, and Appendix H Table H-6).
The potential for bias in the domain of patient selection was assessed on the basis of enrollment of study sample (consecutive, random, or convenience), the avoidance of a case-control design, and the avoidance of inappropriate patient exclusions. For this domain, 28 percent of papers (n=11) were rated as low risk for bias and 46 percent (n=18) were rated as high risk. The remaining papers (n=10; 26%) were rated as unclear as to risk of bias. Papers were assessed as to patient population applicability to those targeted by the review question in terms of severity of the target condition, demographic features, presence of differential diagnosis or comorbid conditions, and setting of the study. Overall, 33 percent (n=13) of papers were assessed as high for concerns about applicability on this domain, 64 percent (n=25) were rated as low, and five percent (n=2) were rated as unclear on concern.
The potential for bias in the domain of the index test was assessed according to whether results were interpreted without knowledge of the results of the reference standard and whether a prespecified threshold was used for NT-proBNP cutpoints. Slightly more than half of papers (n=22, 57%) were rated as high risk on this domain, 28 percent were rated as low (n=11), and 15 percent were rated as unclear (n=6). Papers were assessed on concerns of applicability on the basis of whether the index test methods varied from those specified in the review questions. Concerns about applicability on this domain were assessed as low for 72 percent (n=28) of papers, as high for 26 percent (n=10), and as unclear for two percent (n=1).
The potential for bias in the domain of the reference standard (i.e., the criteria used to confirm a diagnosis of HF) was judged on the basis of whether the reference standard was likely to correctly classify the target condition and whether the results were interpreted with knowledge of the NT-proBNP results. Sixty-two percent of papers (n=24) were rated as low risk, 23 percent (n=9) were rated as high, and 15 percent (n=6) were rated as unclear. Concerns about applicability were assessed as to whether the target condition, as defined by the reference standard, differed from the target condition specified in the review question. Seventy-two percent (n=28) of papers were assessed as low on this domain, 26 percent (n=10) were assessed as high, and 2 percent were rated as unclear (n=1).
The potential for bias in the domain of flow and timing was assessed on the basis of inappropriate intervals between index test and reference standard, standardized administration of reference standard among patients, and equal inclusion of patients in the analysis. The majority of papers (n=37, 95%) were assessed as low risk of bias on the domain of flow and timing, while 5 percent (n=2) were rated as unclear.
Strength of Evidence for Papers With Emergency Department Settings
To grade the strength of evidence (SOE) in this diagnosis section we chose to assess two primary outcomes: sensitivity and specificity. These are concepts that are well understood by clinical users of diagnostic tests. Other diagnostic performance indicators (positive (PPV) and negative (NPV) predictive values, LR+ and LR-, accuracy, and DOR) can be calculated from sensitivity and specificity if the prevalence of disease is known. As such, the conclusions regarding SOE for these performance indicators are unlikely to be different from those drawn for sensitivity and specificity.
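The dependence of the predictive values on prevalence can be illustrated with a minimal sketch that applies Bayes' theorem to sensitivity, specificity, and prevalence. The function name and the input values are hypothetical, chosen only for illustration, and are not taken from the included papers.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected fractions of the whole population in each cell of the 2x2 table
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical inputs, not values reported in this review
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.70, prevalence=0.40)
print(f"PPV={ppv:.2f}, NPV={npv:.2f}")
```

Rerunning the sketch with a lower prevalence and the same sensitivity and specificity lowers the PPV and raises the NPV, which is why the predictive values cannot be stated without the prevalence.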
For all papers that presented sensitivity and specificity data (BNP n=28;3,74,78,79,81-83,85,86,88,89,93-97,100,101,103,104,108,110-113,119,120,147 NT-proBNP n=182,26,88,108,110-115,119,124,129,131,135,138,141,143), we examined SOE using a variety of cutpoints. For BNP, we selected the lowest provided, manufacturers' suggested, and optimal as chosen by the author. For NT-proBNP we chose lowest and optimal. The papers in the manufacturers' suggested and optimal cutpoint groupings are subsets of the lowest cutpoint grouping.
Risk of Bias
Using the QUADAS-2 tool, the risk of bias in these studies for both sensitivity and specificity was rated (Figures 3 and 4). The tests for publication bias exposed no significant bias in the following conditions in our meta-analysis of BNP diagnostic use in the emergency department: (1) optimum cutpoint; (2) lowest cutpoint; and (3) manufacturer cutpoint (Appendix H Table H-8 and Figure H-25). In the four domains of patient selection, index test(s), reference standard, and flow and timing, the concern regarding bias was rated as low.
Both sensitivity and specificity are concepts that are well understood by clinicians and can inform them with regard to clinical practice. The related parameters of NPV, PPV, LR+, LR-, and DOR can also inform clinicians. We rate this domain as direct.
The CIs around the summary estimates of sensitivity are small (lowest: 0.93 to 0.96; manufacturers' suggested: 0.93 to 0.96; optimal: 0.88 to 0.92). The CIs around specificity are larger (lowest: 0.57 to 0.72; manufacturers' suggested: 0.57 to 0.71; optimal: 0.72 to 0.83). Because the statistical heterogeneity for all summary estimates is large, we rate this domain as imprecise (Table 12).
With respect to sensitivity, the range of estimates across papers is small. We rate this domain as consistent. With respect to specificity, the range of estimates across papers is larger, from 0.64 to 0.77. We rate this domain as inconsistent for specificity (see Table 12).
The SOE estimates were the same for both cutpoints evaluated. The outcome of sensitivity was rated as high for both cutpoints (optimal, lowest). The outcome of specificity was rated as moderate for both cutpoints due to inconsistency in the value of specificity among studies. Nevertheless, the summary SOE was rated as high. The complete table can be viewed in Appendix H Tables H-9a and H-9b.
Risk of Bias
Using the QUADAS-2 tool, we rated the risk of bias in these studies for both sensitivity and specificity (Figures 5 and 6). The tests for publication bias exposed no significant bias in the following conditions in our meta-analysis of NT-proBNP diagnostic use in the emergency department: (1) optimum cutpoint, (2) lowest cutpoint, and (3) manufacturer cutpoint (Appendix H Table H-8 and Figure H-25). In the four domains of patient selection, index test(s), reference standard, and flow and timing, the concern regarding bias was rated as low.
Both sensitivity and specificity are concepts that are well understood by clinicians and can inform them with regard to clinical practice. The related parameters of NPV, PPV, LR+, LR-, and DOR can also inform clinicians. This domain was rated as direct.
The CIs around the summary estimates of sensitivity are small (lowest: 0.90 to 0.95; optimal: 0.84 to 0.91). The CIs around specificity are larger (lowest: 0.43 to 0.69; optimal: 0.64 to 0.82). Because we included papers that recruited unrestricted populations (patients presenting with signs and symptoms of HF with or without comorbidities), the statistical heterogeneity is large. As such, this domain was rated as imprecise (see Table 12).
With respect to sensitivity, the direction of estimates is consistent, and the range of estimates across papers is small. We rate this domain as consistent. With respect to specificity, the direction of estimates is consistent, but the range of estimates across papers is large, from 0.64 to 0.77. This domain was rated as inconsistent for specificity (see Table 12).
Key Question 2. In patients presenting to a primary care physician with risk factors, signs, or symptoms suggestive of HF
- What is the test performance of BNP and NT-proBNP for HF?
- What are the optimal decision cutpoints for BNP and NT-proBNP to diagnose and exclude HF?
- What determinants affect the test performance of BNP and NT-proBNP (e.g., age, gender, comorbidity)?
Sample and Design Characteristics of Studies Assessing BNP
There were 12 articles that met the criteria for KQ2 that examined BNP in primary care settings. Eight examined BNP only148-155 and four examined both BNP and NT-proBNP.156-159 See Appendix I. KQ2 Evidence Set.
One study used a prospective cohort design151 and the remaining studies (n=11) used a cross-sectional design. The selected articles were published between 2005 and 2011 and were conducted in a wide range of regions: two in North America,152,159 eight in Europe,148,150,151,153-157 one in Asia,158 and one paper in which country of origin could not be determined.149
Most studies, with the exception of three,150,158,159 provided diagnostic information on an overall study sample with dyspnea in a primary care setting. One study provided diagnostic information on populations grouped according to age and sex.158 Several studies presented diagnostic information according to BMI status,158,159 renal function,158 LVEF levels,158 and left ventricular systolic dysfunction (LVSD) status.150,155
In all studies, study patients presented to a primary care facility with shortness of breath and were over 18 years of age. Most studies (n=8) had a patient population with mean or median ages from 70 to 79 years old. Three studies had patient populations with means or medians between 60 and 69 years old155,158,159 and one160 had a population under 60 years of age.
The percentage of males enrolled in each study ranged from 25 percent153 to 100 percent158 (mean=51.2%; median=50%). Sample sizes ranged from 53152 to 1,032158 (mean=346.8; median=357). The prevalence of HF in the study populations ranged from seven percent158 to 67 percent148 (mean=41.5%; median=38.5%).
The majority of papers (n=9) were independent studies, publishing results on unique data sets. One article used data from the study for the evaluation of the clinical applicability of BNP in the diagnosis and management of patients with suspected HF in primary care (PANAMA),153 one reported results from the Utrecht Heart Failure Organization - Initial Assessment (UHFO-IA) study154 and one study recruited patients from the Screening to Prevent Heart Failure (STOP-HF) study.155
Ten studies used the TRIAGE-B-Type Natriuretic Peptide (BNP) test,148-153,155-157,159 one used the ADVIA-Centaur® B-Type Natriuretic Peptide (BNP) Assay,158 and one used the Abbott AxSYM® B-Type Natriuretic Peptide (BNP) Microparticle Enzyme Immunoassay (MEIA).154
Diagnosis of Heart Failure
Most studies (n=8) based the diagnostic reference standard solely on clinical judgment.148-151,154,157-159 Most of these had a reference standard agreed upon by at least two physicians (mostly cardiologists), with the exception of two papers, which based the final diagnosis on the opinion of a single cardiologist or other type of clinician.157,159 The adjudication physicians each arrived at a diagnosis of HF based on their interpretation of all available clinical data; this often included echocardiography results. Four of the studies148,149,151,153 judging final diagnosis using clinical judgment stated that the Framingham criteria were used to assist in judgment.
Of the remaining studies, two based final diagnosis of HF on echocardiography results alone,152 and one simply reported that the diagnosis was “based on the Framingham criteria.”153 One study did not report the reference standard used.155
BNP: Test Performance and Optimal Cutpoints in Primary Care
Diagnostic Properties in BNP
The 12 studies evaluating BNP in primary care settings used several cutpoints ranging from 30148,157 to 500148 (mean=158; median=100) pg/mL or ng/L, and reported sensitivities from 25 percent153 to 97 percent148 (mean=82.1%; median=83.9%), specificities from 23 percent151 to 92 percent148 (mean=73.8%; median=80.4%), and AUCs of 0.62159 to 0.93158 (mean=0.86; median=0.88). Six studies examined BNP only154-159 and six focused on both BNP and NT-proBNP.148-153 See Appendix I Tables I-1 and I-2.
When the appropriate data were available for extraction or calculation, 2×2 tables were prepared and forest plots of sensitivities, specificities, positive and negative LRs, logDOR, and summary ROC curves are presented (Appendix I Figures I-1 to I-9). Three cutpoints were selected: lowest presented, manufacturers' suggested, and the optimal cutpoint as chosen by the authors.
The pooled sensitivity using the optimum cutpoint was 0.82 (95% CI, 0.69 to 0.90). All but a single study, by Barrios et al.,153 which had a sensitivity of 0.25, had sensitivities greater than 0.80. The low sensitivity of the Barrios study may be due to a predominantly elderly population and high prevalence of diastolic HF. Pooled specificities were, as expected, not as high and gave an overall specificity of 0.64 (95% CI, 0.45 to 0.79). Summary LR+ and LR- and the logDOR were 2.27 (95% CI, 1.43 to 3.62), 0.28 (95% CI, 0.16 to 0.49), and 2.06 (95% CI, 1.27 to 2.84), respectively. Pooling using the lowest cutpoint produced a slightly higher sensitivity of 0.89 (95% CI, 0.77 to 0.95) and a corresponding lower specificity of 0.54 (95% CI, 0.41 to 0.66). The LR+ and LR- and logDOR gave similar results: 1.94 (95% CI, 1.47 to 2.57), 0.20 (95% CI, 0.09 to 0.44), and 2.27 (95% CI, 1.32 to 3.22), respectively.
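The relationship between these summary measures can be sketched as follows. This is an illustration only: it applies the standard definitions LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity, and logDOR = ln(LR+ / LR-) to the pooled point estimates for the optimum cutpoint. The function name is hypothetical, and because the summary values in this review come from the meta-analytic model rather than from this arithmetic, the results are expected to be close to, not identical with, the reported figures.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, log diagnostic odds ratio) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    log_dor = math.log(lr_pos / lr_neg)  # natural log of the diagnostic odds ratio
    return lr_pos, lr_neg, log_dor


# Pooled point estimates for the optimum BNP cutpoint quoted in the text
lr_pos, lr_neg, log_dor = likelihood_ratios(sensitivity=0.82, specificity=0.64)
print(f"LR+={lr_pos:.2f}, LR-={lr_neg:.2f}, logDOR={log_dor:.2f}")
```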
Studies were pooled based on the manufacturers' suggested cutpoint because this is likely the most commonly used cutpoint in clinical use. Studies were included if the cutpoint used was within 5 pg/mL of 100. Eight studies were included in the pooled statistics, as they all used the Triage BNP assay. Other manufacturers were not included. The overall sensitivity of 0.76 (95% CI, 0.59 to 0.87) based on the manufacturers' cutpoint was slightly lower than that for the optimal cutpoint. Corresponding specificity was increased slightly to 0.71 (95% CI, 0.52 to 0.85). The LR+ and LR- and logDOR gave results similar to the optimal cutpoint, 2.63 (95% CI, 1.59 to 4.36), 0.34 (95% CI, 0.20 to 0.57), and 2.08 (95% CI, 1.24 to 2.92), respectively.
Summary ROC curves were also developed. As with the summary plots, the ROC curves were developed based on the optimum, lowest, and manufacturers' cutpoints and are presented in Appendix I Figures I-10 to I-12. The AUCs were 0.81 (95% CI, 0.77 to 0.84) for the optimum cutpoint, 0.76 (95% CI, 0.72 to 0.80) for the lowest cutpoint, and 0.80 (95% CI, 0.76 to 0.83) for the manufacturers' suggested cutpoint.
BNP: Determinants of Test Performance in Primary Care
The effect of various determinants upon the diagnostic performance of BNP for the diagnosis of HF was examined.
A single study examined the association of age with BNP. Park et al.158 compared the performance of BNP for patients above and below 65 years of age for the identification of LVEF or advanced diastolic dysfunction (DD). For patients 65 years of age and greater, using a cutpoint of 250 pg/mL, the AUC was 0.903 (sensitivity=83.9, specificity=83.7). For identification of advanced DD and a cutpoint of 236 pg/mL, the AUC was 0.900 (sensitivity=83.9, specificity=84.1). For patients less than 65 years old with LVEF less than 45, a cutpoint of 82 pg/mL was used, which gave an AUC of 0.916 (sensitivity=84.1, specificity=84.2). A cutpoint of 70 pg/mL was used to identify advanced DD with an AUC of 0.912 (sensitivity=83.3, specificity=83.3).
Two studies investigated the relationship between sex and BNP.156,158 Fuat et al.156 compared the AUC of males and females and did not find a significant difference (males 0.79, females 0.80). Park et al.158 compared the ability of BNP to identify male and female patients with LVEF less than 45 and advanced DD. The results of Park et al. are presented in Table 13.
Body Mass Index
Two studies examined the relationship between BNP and BMI.158,159 Christenson et al.159 grouped patients as normal (BMI <25 kg/m2), overweight (BMI 25 to 30 kg/m2), or obese (BMI >30 kg/m2), and demonstrated an inverse correlation of BNP with BMI. The AUCs for diagnosis of decompensated HF in the three groups (<25 kg/m2, 25 to 30 kg/m2, and >30 kg/m2) were 0.78 (95% CI, 0.71 to 0.84), 0.62 (95% CI, 0.54 to 0.70), and 0.72 (95% CI, 0.66 to 0.79), respectively. Using a cutpoint of 100 pg/mL, sensitivity and specificity of BNP were 89 percent and 38 percent for normal weight patients, 85 percent and 38 percent for overweight patients, and 81 percent and 49 percent for obese patients, respectively.
Park et al.158 also investigated the relation of BNP with BMI for the identification of patients with LVEF less than 45 and advanced DD. A similar inverse correlation trend was seen, more so with the advanced DD patients. Results are presented in Table 14.
Park et al.158 studied the effect of renal function on the ability of BNP to identify patients with LVEF less than 45 and advanced DD. Renal function was estimated by creatinine clearance calculated by the Cockcroft-Gault equation. Patients were grouped as clearance less than 60 mL/min or greater than 60 mL/min. As renal function decreases, the cutpoint must increase to maintain a similar sensitivity and specificity. The effect of decreased LVEF or advanced DD was overwhelmed by the effect of renal function, and had little effect on the optimal cutpoint. Results are presented in Table 15.
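For readers unfamiliar with the renal grouping, the following is a minimal sketch of the commonly published form of the Cockcroft-Gault equation: creatinine clearance (mL/min) = ((140 - age) × weight in kg) / (72 × serum creatinine in mg/dL), multiplied by 0.85 for women. It is an assumption that the cited study applied exactly this form and these units. The function names, the example patient, and the handling of a clearance of exactly 60 mL/min are hypothetical.

```python
def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, female):
    """Estimate creatinine clearance (mL/min) with the Cockcroft-Gault equation."""
    clearance = ((140 - age_years) * weight_kg) / (72 * serum_creatinine_mg_dl)
    # Conventional 0.85 correction factor for women
    return clearance * 0.85 if female else clearance


def renal_group(clearance_ml_min, threshold=60.0):
    """Assign the two renal-function groups described in the text.

    Assumption: a clearance of exactly 60 mL/min falls in the higher group.
    """
    return "<60 mL/min" if clearance_ml_min < threshold else ">=60 mL/min"


# Hypothetical patient, for illustration only
clearance = cockcroft_gault(age_years=70, weight_kg=72, serum_creatinine_mg_dl=1.0, female=True)
print(round(clearance, 1), renal_group(clearance))
```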
Sample and Design Characteristics of Studies Assessing NT-proBNP
There were 20 articles that met the criteria for KQ2 examining NT-proBNP in primary care settings. Sixteen examined NT-proBNP only154,161-175 and four examined both BNP and NT-proBNP156-159 (Appendix I Table I-3).
Two studies used a prospective cohort design.169,171 Study design could not be determined in one of the articles.174 The remaining studies (n=17) used a cross-sectional design. The selected articles were published between 2003 and 2011 and were conducted in a wide range of regions: one in North America,159 18 in Europe,154,156,157,161-175 and one in Asia.158
Most studies, with the exception of five,154,158,162,163,165 provided diagnostic information on the overall study sample presenting with dyspnea in a primary care setting. Some studies provided diagnostic information on populations grouped according to age158,165,170 and sex.156,158,162,166,170 Some studies presented diagnostic information according to BMI status,158,159 diabetes status,161 previous history of HF,161 LVEF,163 renal failure,158 and hemoglobin (Hb) measures.158 One study presented groups according to their suspected HF/valvular disease (LVSD),167 and one study grouped subjects according to diagnosis of major structural heart disease in patients with atrial fibrillation (AF) compared with those with sinus rhythm (SR).165
In all studies, study patients presented to a primary care facility with shortness of breath and were over 18 years of age. Seven studies had a patient population with mean or median ages from 60 to 69 years old.158,159,161-163,168,172 Eleven had populations with mean or median ages between 70 and 79 years.154,156,157,164-166,170,171,173-175 Two examined populations 80 years of age and over.167,169
The percentage of males enrolled in each study ranged from 32.1 percent170 to 100 percent158 (mean=42.8%; median=46%). Sample sizes ranged from 14163 to 1,321165 (mean=239; median=140). The prevalence of HF in the study populations ranged from 4 percent168 to 75 percent173 (mean=31.2%; median=33.1%).
Most of the papers (n=17) were independent studies, publishing results on unique data sets. One study used data from the Echocardiographic Heart of England screening study (ECHOES),161 one reported results from the Diagnostic Trial on Prevalence and Clinical Course of Diastolic Dysfunction and Diastolic Heart Failure (DIAST-CHF),174 and one used results from the UHFO-IA.154
All studies (n=20) used the ELECSYS® proBNP Immunoassay to measure NT-proBNP.
Diagnosis of Heart Failure
The majority of studies (n=11) based the diagnostic reference standard solely on clinical judgment. Less than half of these had a reference standard agreed upon by at least two physicians154,158 (mostly cardiologists), with eight studies basing the final diagnosis on the opinion of a single physician.157,159,167,168,170-173 The adjudication physicians each arrived at a diagnosis of HF based on their interpretation of all available clinical data; this often included echocardiography results. One of the studies used the Framingham criteria to aid in clinical judgment.174
Of the remaining studies, four based the final diagnosis of HF both on clinical judgment and results of echocardiography,156,161,162,164,166 one based it on echocardiography results alone,163,165 and one simply reported that the definitive diagnosis was “based on the Framingham criteria.”169 One study used an outcome panel that evaluated all available information, excluding the NT-proBNP results.175
NT-proBNP: Test Performance and Optimal Cutpoints in Primary Care
The 20 studies evaluating NT-proBNP in primary care settings used several cutpoints ranging from 25171 to 6,180167 (mean=635; median=379) pg/mL or ng/L. Three studies161,162,164 measured NT-proBNP in pmol/L. Reported sensitivities ranged from 44 percent167 to 100 percent164-166,169 (mean=80.6%; median=84.4%), specificities from 3 percent165 to 97 percent163,168 (mean=58.5%; median=60.6%), and AUCs from 0.70161 to 0.98166 (mean=0.86; median=0.88). The majority of the studies focused on NT-proBNP alone (n=14), and the remainder focused on both BNP and NT-proBNP.154,156-159 Appendix I Table I-4 presents data to answer KQ2.
When the appropriate data were available for extraction or calculation, 2×2 tables were prepared and forest plots of sensitivities, specificities, positive and negative LRs, DOR, and summary ROC curves are presented (Appendix I Figures I-13 to I-18). Three cutpoints were selected: lowest presented, the optimal cutpoint as chosen by the authors to examine in greater detail, and the manufacturers' recommended cutpoint of 125 pg/mL for patients younger than 75 years of age and 450 pg/mL for those patients 75 years of age or older. At least four studies were needed in each group to present summary estimates; however, for NT-proBNP according to manufacturers' cutpoint, only two studies satisfied our criteria and, thus, will not be presented.
When the optimal cutpoint chosen by the authors was used, the pooled sensitivity was 0.88 (95% CI, 0.81 to 0.93) and seven of the studies156,164,166-168,170,172 produced sensitivities greater than 0.90. A single study by Stahrenberg et al.174 had a significantly lower sensitivity of 0.55 (95% CI, 0.44 to 0.65) due to a relatively high cutpoint of 22 pg/mL; however, they did produce a relatively good specificity of 0.61 (95% CI, 0.47 to 0.74). The pooled specificity (0.58) was, as expected, not as high as the pooled sensitivity, as the authors tend to optimize sensitivity.
Using the lowest cutpoint chosen by the authors produced a slightly higher pooled sensitivity (0.90) than the optimal cutpoint (0.88), with a lower pooled specificity (0.50 versus 0.58). All but three studies159,171,174 produced sensitivities greater than 0.90.
As with the summary plots, the ROC curves were developed based on the optimum and lowest cutpoints. The AUCs were 0.86 (95% CI, 0.82 to 0.88) for the optimum cutpoint, and 0.82 (95% CI, 0.79 to 0.85) for the lowest cutpoint (Appendix I Figures I-19 to I-20).
NT-proBNP: Determinants of Test Performance in Primary Care
We examined the effect of various determinants on the diagnostic performance of NT-proBNP for the diagnosis of HF.
Two studies investigated the influence of age on the diagnostic ability of NT-proBNP.158,165 In both cases the optimal cutpoint for identification of major structural heart disease (defined as LVEF <40, left ventricular DD, or right ventricular dilation) was higher in older patients. Shelton et al.165 compared patients above and below the age of 75 years. They also compared the difference between patients in SR and those in AF. Park et al.158 compared the performance of NT-proBNP for patients above and below 65 years of age for the identification of LVEF or advanced DD. Table 16 provides a summary of these data.
Five studies investigated the relationship between sex and the ability of NT-proBNP to diagnose HF.156,158,162,166,170 Using a regression model, Mikkelsen et al.166 identified sex as a significant influence on NT-proBNP. The AUC for the diagnosis of HF in females was 0.97 (95% CI, 0.95 to 1.00) and 0.91 (95% CI, 0.83 to 0.98) for males. Due to the sex differences, the optimal cutpoints were different between males and females: 85 pg/mL and 110 pg/mL, respectively.
Nielsen et al.162 examined the ability of NT-proBNP to identify HF in men and women 50 years of age and above, as the prevalence of HF in those less than 50 years of age was very low. ROC curves gave an AUC of 0.93 (95% CI, 0.89 to 0.97) for men and an AUC of 0.90 (95% CI, 0.84 to 0.97) for women. Using an NPV of 97 percent, they suggest a cutpoint of 11 pmol/L for men and 17 pmol/L for women.
Fuat et al.156 compared the ability of NT-proBNP to rule out the presence of HF in men and women. They maximized sensitivity without producing an unacceptable loss of specificity. The AUC for men was 0.79, and using a cutpoint of 100 pg/mL produced an NPV of 0.89 (95% CI, 0.74 to 1.00). Women had a slightly higher AUC of 0.82, and using a cutpoint of 150 pg/mL produced an NPV of 0.94 (95% CI, 0.88 to 1.00).
Linear regression analysis performed by Olofsson and Bowman170 showed no significant difference in diagnosis of HF between males and females, while multiple linear regression showed that age and male sex were significantly associated with higher levels of NT-proBNP.
Park et al.158 compared the ability of NT-proBNP to identify male and female patients with LVEF less than 45 and advanced DD. Data for multiple cutpoints and results of papers that used sensitivity and specificity as an outcome are shown in Table 17. Fuat et al.156 maximized sensitivity, then specificity, and reported an outcome of NPV. This study is therefore not presented in Table 17.
Body Mass Index
Two studies examined the relationship between NT-proBNP and BMI.158,159 In a relatively large study of 685 patients, Christenson et al.159 grouped patients as normal (BMI <25 kg/m2), overweight (BMI 25 to 30 kg/m2), or obese (BMI >30 kg/m2), and demonstrated an inverse correlation of NT-proBNP with BMI. The AUCs for a diagnosis of decompensated HF in the three groups (normal, overweight, and obese) were 0.77 (95% CI, 0.70 to 0.84), 0.64 (95% CI, 0.56 to 0.72), and 0.71 (95% CI, 0.65 to 0.77), respectively. Using the International Collaborative of NT-proBNP study cutpoints2 of 450 pg/mL for under 50 years of age, 900 pg/mL for ages 50 to 75, and 1,800 pg/mL for ages over 75, sensitivity and specificity of NT-proBNP were 88 percent and 50 percent for normal weight patients, 68 percent and 51 percent for overweight patients, and 69 percent and 64 percent for obese patients, respectively.
Park et al.158 also studied the effect of renal function on the ability of NT-proBNP to identify patients with LVEF less than 45 and advanced DD. Renal function was estimated by creatinine clearance calculated by the Cockcroft-Gault equation. Patients were grouped as clearance less than 60 mL/min or clearance of 60 mL/min or over. Using multivariate regression analysis, clearance less than 60 mL/min was shown to be an independent determinant of NT-proBNP. The AUC, sensitivity, and specificity results are presented in Table 19.
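The BMI grouping and the age-stratified cutpoints described by Christenson et al. can be sketched as simple lookups. This is a minimal illustration, not the authors' code. The function names are hypothetical, and treating a value equal to the cutpoint as positive is an assumption that the text does not state.

```python
def bmi_group(bmi_kg_m2):
    """Group a patient by BMI as described in the text."""
    if bmi_kg_m2 < 25:
        return "normal"
    if bmi_kg_m2 <= 30:
        return "overweight"
    return "obese"


def age_stratified_cutpoint(age_years):
    """Return the NT-proBNP cutpoint (pg/mL) for the patient's age band."""
    if age_years < 50:
        return 450
    if age_years <= 75:
        return 900
    return 1800


def is_positive(nt_probnp_pg_ml, age_years):
    """Assumption: a result at or above the cutpoint counts as positive."""
    return nt_probnp_pg_ml >= age_stratified_cutpoint(age_years)


# Hypothetical patient, for illustration only
print(bmi_group(27.5), age_stratified_cutpoint(62), is_positive(1000, 62))
```

The same NT-proBNP value can therefore be read as positive in one age band and negative in another, which is the purpose of age stratification.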
Assessment of Quality for Studies With Primary Care Settings
For studies of diagnostic tests (KQ2), we used the QUADAS-2 to assess quality in four key domains: patient selection, index test(s), reference standard, and flow and timing. The questions in each domain are rated in terms of risk of bias (low, high, unclear) and concerns regarding applicability (low, high, unclear), with associated signaling questions to help with bias and applicability judgments (Figures 7 and 8, and Appendix I Table I-5).
The potential for bias in the domain of patient selection was assessed on the basis of the enrollment of the study sample (consecutive, random, or convenience), the avoidance of a case-control design, and the avoidance of inappropriate patient exclusions. For this domain, 42 percent of studies (n=5) were rated as low risk for bias and 58 percent (n=7) were rated as unclear as to risk of bias. Studies were assessed as to patient population applicability to those targeted by the review question in terms of severity of the target condition, demographic features, presence of differential diagnosis or comorbid conditions, and setting of the study. Overall, 83 percent (n=10) of studies were assessed as high, 8 percent (n=1) as low, and 8 percent (n=1) as unclear for concern regarding applicability on this domain.
The potential for bias in the domain of the index test was assessed according to whether results were interpreted without knowledge of the results of the reference standard and whether a prespecified threshold was used for BNP cutpoints. Twenty-five percent (n=3) of studies were rated as low risk on this domain, 33 percent (n=4) were rated as high, and 42 percent (n=5) were rated as unclear. Studies were assessed on concerns of applicability on the basis of whether the index test methods varied from those specified in the review questions. Concerns about applicability on this domain were assessed as low for 67 percent (n=8) of studies, as high for 25 percent (n=3), and as unclear for 8 percent (n=1).
The potential for bias in the domain of the reference standard (i.e., the criteria used to confirm a diagnosis of HF) was judged on the basis of whether the reference standard was likely to correctly classify the target condition and whether the results were interpreted with knowledge of the BNP results. Sixty-seven percent (n=8) of studies were rated as low risk, 25 percent (n=3) as high, and 8 percent (n=1) as unclear. Concerns about applicability were assessed as to whether the target condition, as defined by the reference standard, differed from the target condition specified in the review question. Sixty-seven percent (n=8) of studies were assessed as low, 25 percent (n=3) were assessed as high, and 8 percent (n=1) were unclear on this domain.
The potential for bias in the domain of flow and timing was assessed on the basis of inappropriate intervals between index test and reference standard, standardized administration of reference standard among patients, and equal inclusion of patients in the analysis. Eighty-three percent (n=11) of studies were assessed as low risk and eight percent (n=1) were unclear as to bias for this domain.
For studies of diagnostic tests (KQ2), the QUADAS-2 was used to assess quality in four key domains: patient selection, index test(s), reference standard, and flow and timing. The questions in each domain are rated in terms of risk of bias (low, high, unclear) and concerns regarding applicability (low, high, unclear), with associated signaling questions to help with bias and applicability judgments (see Figures 9 and 10, and Appendix I Table I-6).
The potential for bias in the domain of patient selection was assessed on the basis of the enrollment of the study sample (consecutive, random, or convenience), the avoidance of a case-control design, and the avoidance of inappropriate patient exclusions. For this domain, 40 percent of studies (n=8) were rated as low risk for bias and 5 percent (n=1) were rated as high risk. The remaining studies (n=11; 55%) were rated as unclear as to risk of bias. Studies were assessed as to patient population applicability to those targeted by the review question in terms of severity of the target condition, demographic features, presence of differential diagnosis or comorbid conditions, and setting of the study. Overall, 65 percent (n=13) of studies were assessed as high for concerns about applicability on this domain, 20 percent (n=4) were rated as low, and the remainder (n=3; 15%) were rated as unclear.
The potential for bias in the domain of the index test was assessed according to whether results were interpreted without knowledge of the results of the reference standard and whether a prespecified threshold was used for NT-proBNP cutpoints. Forty-five percent (n=9) of studies were rated as low risk, 35 percent (n=7) were rated as high risk, and 20 percent (n=4) were deemed unclear on this domain. Studies were assessed on concerns of applicability on the basis of whether the index test methods varied from those specified in the review questions. Concerns about applicability on this domain were assessed as low for 70 percent (n=14) of studies and as high for 30 percent (n=6).
The potential for bias in the domain of the reference standard (i.e., the criteria used to confirm a diagnosis of HF) was judged on the basis of whether the reference standard was likely to correctly classify the target condition and whether the results were interpreted with knowledge of the NT-proBNP results. Seventy percent of studies (n=14) were rated as low risk, 10 percent (n=2) were rated as high, and 20 percent (n=4) were rated as unclear on this domain. Concerns about applicability were assessed as to whether the target condition, as defined by the reference standard, differed from the target condition specified in the review question. Sixty-five percent (n=13) of studies were assessed as low and 35 percent (n=7) were assessed as high on this domain.
The potential for bias in the domain of flow and timing was assessed on the basis of inappropriate intervals between index test and reference standard, standardized administration of reference standard among patients, and equal inclusion of patients in the analysis. Ninety percent (n=18) of studies were assessed as low risk of bias and 10 percent (n=2) were assessed as unclear on the domain of flow and timing.
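The domain-level percentages reported above can be reproduced from the study counts. The following is a minimal sketch, assuming 20 studies per domain as implied by the counts in the text; the variable and function names are illustrative, and the zero "high" count for flow and timing is inferred from the two reported categories summing to 100 percent.

```python
# Counts per QUADAS-2 risk-of-bias rating, transcribed from the text above.
# The "high" count of 0 for flow and timing is inferred, not stated.
RISK_OF_BIAS_COUNTS = {
    "patient selection": {"low": 8, "high": 1, "unclear": 11},
    "index test": {"low": 9, "high": 7, "unclear": 4},
    "reference standard": {"low": 14, "high": 2, "unclear": 4},
    "flow and timing": {"low": 18, "high": 0, "unclear": 2},
}


def rating_percentages(counts):
    """Convert rating counts for one domain into whole-number percentages."""
    total = sum(counts.values())
    return {rating: round(100 * n / total) for rating, n in counts.items()}


summary = {
    domain: rating_percentages(counts)
    for domain, counts in RISK_OF_BIAS_COUNTS.items()
}

for domain, percentages in summary.items():
    print(domain, percentages)
```

Keeping the raw counts alongside the percentages makes it easy to confirm that each domain sums to the same number of studies.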
Strength of Evidence for Studies With Primary Care Settings
Two primary outcomes were chosen to be assessed: sensitivity and specificity. For all studies that presented sensitivity and specificity data (BNP n=11;148-154,156-159 NT-proBNP n=17;156-159,161-170,172-174), the SOE was examined using a variety of cutpoints. For BNP the lowest cutpoint provided, the manufacturers' suggested, and the optimal cutpoint identified by the author were used. For NT-proBNP we used the lowest and optimal cutpoints.
Risk of Bias
Using the QUADAS-2 tool, the risk of bias was rated for both sensitivity and specificity (Figures 7 to 10). The tests for publication bias exposed no significant bias in the following conditions in our meta-analysis of BNP and NT-proBNP diagnostic use in primary care: (1) optimum cutpoint, (2) lowest cutpoint, and (3) manufacturers' cutpoint (see Appendix I Table I-9 and Figure I-21). In the domains of reference standard and flow and timing, the majority of the studies showed a low risk of bias. In terms of patient selection, 58 percent of the studies had an unclear risk of bias. In the domain of index test, 33 percent of the studies had a high risk of bias. Despite the potential high risk of bias in the index test, the overall risk of bias was rated low.
The question of diagnostic accuracy is asked in KQ2 and sensitivity and specificity in a primary care population are being assessed. This domain was rated as direct, as these are concepts that are generally understood by clinicians and can be applied directly to diagnosis of HF in a similar clinical setting.
For both BNP and NT-proBNP, the CIs around the summary estimates for sensitivity and specificity are not precise. This domain was rated as imprecise (Table 20).
In terms of BNP sensitivity, the directions of the estimates are consistent and, with the exception of a single study,153 the estimates are very similar. For NT-proBNP sensitivity, the directions of the estimates are consistent and the CIs are small. Sensitivity was therefore rated as consistent for both BNP and NT-proBNP. However, specificity was rated as inconsistent because the range of estimates across studies is large for both BNP and NT-proBNP (Table 20).
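To illustrate what precision means for these two outcomes, the sketch below computes sensitivity and specificity from a single 2x2 table together with Wilson score intervals. It is an illustration only: the counts are hypothetical and not drawn from any study in this review, the function names are assumptions, and the Wilson interval is one common choice for a single study rather than the method used for the pooled summary estimates reported here.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


def diagnostic_accuracy(tp, fn, tn, fp):
    """Return point estimates and ~95% intervals for sensitivity and specificity."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts, NOT taken from any study in this review.
result = diagnostic_accuracy(tp=90, fn=10, tn=60, fp=40)
for name, (estimate, (lower, upper)) in result.items():
    print(f"{name}: {estimate:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

With the same number of patients in each group, the interval around the estimate closer to 0.5 is wider, which is one reason specificity estimates can look less precise than sensitivity estimates.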
Key Question 3. In HF populations, is BNP or NT-proBNP measured at admission, discharge, or change between admission and discharge, an independent predictor of morbidity and mortality outcomes?
Interpretation of the results from prognostic studies may require some caution with respect to comparison across studies. Establishing the prognostic value of a marker within a single study requires consideration of the type of statistical computational methods (e.g., Cox regression), the manner in which the BNP/NT-proBNP is operationalized within these computations (e.g., continuous, dichotomous, log-transformed), the number and types of covariates included as explanatory variables, and the threshold/cutpoint used to consider high and low risk groups within categorical analyses. Thus, the magnitude of a hazard ratio (HR) in one study is not comparable to that in another study when any of the features detailed above differ. Where provided within the text of eligible studies, aspects of the statistical model/computations are reported (e.g., the type and number of covariates, how BNP/NT-proBNP was operationalized within the statistical model, any applicable cutpoints). See Appendix J KQ3 Evidence Set.
BNP Levels in Decompensated Heart Failure Patients and Prognosis
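The dependence of an HR on how the marker is operationalized can be made concrete for the log-transformed case. The sketch below assumes a model in which the log hazard is linear in the natural log of the marker, so that an HR reported per 1-unit increase in ln(marker) can be re-expressed per doubling or per tenfold increase. The HR value and function name are hypothetical, and the base of the logarithm is an assumption that individual studies may not share.

```python
import math


def hr_per_fold_change(hr_per_ln_unit, fold):
    """Re-express an HR per 1-unit increase in ln(marker) as the HR
    for a `fold`-fold increase in the marker.

    Under the assumed model, HR = exp(beta * delta_ln), and a fold change f
    corresponds to delta_ln = ln(f), so HR_f = exp(beta) ** ln(f).
    """
    return hr_per_ln_unit ** math.log(fold)


# Hypothetical HR per 1-unit increase in ln(marker); not from any study.
hypothetical_hr = 1.50

hr_doubling = hr_per_fold_change(hypothetical_hr, 2)
hr_tenfold = hr_per_fold_change(hypothetical_hr, 10)

print(f"per e-fold increase: {hypothetical_hr:.2f}")
print(f"per doubling:        {hr_doubling:.2f}")
print(f"per tenfold:         {hr_tenfold:.2f}")
```

The same underlying association yields three different HR magnitudes, which is why HRs should be compared across studies only after the scale of the predictor has been checked.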
Characteristics of Studies in Decompensated Heart Failure Patients Using BNP Levels
The prognostic ability of BNP among patients with decompensated HF was assessed in 38 publications that dealt specifically with BNP.106,176-212 A further six publications evaluated both BNP and NT-proBNP in this population.3,213-217 One study218 reported only a multivariable correlation coefficient between BNP levels and the outcome of length of stay and as such was not suitable for prediction of outcomes. In total, 44 publications are presented for evaluating the predictive contribution of BNP levels in decompensated HF patients.
One article was an RCT examining outcomes in participants randomized to regular BNP measurements versus no regular BNP measurement.194 Two articles were secondary analyses of data initially collected in RCTs; however, the secondary analyses did not account for the groups to which participants were randomized.210,211 One191 used a non-randomized controlled design, and six were retrospective193,196,197,201,203,216 cohort studies. It was unclear in one article as to what study design was used. The remaining (n=33) used a prospective cohort design. The selected articles were published between 2004 and 2012 and were conducted world-wide including: nine in North America,176,179,185,189,193,197,207,212,213 28 in Europe,106,177,178,180,182-184,186-188,190-192,195,198-206,208,209,211,215,219 and one in Asia.181 Five studies were conducted in multinational sites.3,196,210,216,217
Several publications reported on the same cohorts, including subjects from the Rapid Emergency Department Heart Failure Trial (REDHOT) study,176 REDHOT II,194 and from an Austrian HF specialty clinic.182,204 Another study216 included the subjects from the Austrian HF clinic with subjects from the PRIDE study.213,216 Several other included papers were based on large study cohorts including: one from the Survival of Patients With Acute Heart Failure in Need of Intravenous Inotropic Support (SURVIVE) trial,196 one from the Efficacy of Vasopressin Antagonism in Heart Failure Outcome Study with Tolvaptan (EVEREST) study,210 two from the Coordinating study evaluating Outcomes of Advising and Counseling in Heart failure (COACH) trial,211,220 two from the BASEL188,220 study, and two from BACH.3,221 Additionally, there were several publications that derived data from cumulative patient registries that overlapped in time (subsets of same patient pool) from the cardiology departments of Valencia, Spain198,205 and Cuneo, Italy178,184,199,201,203 acute care hospitals.
Risk of Bias
The risk of bias was assessed based on the Hayden criteria58 as described in the methods section (Appendix E) and results across studies are seen in Figure 11 (see also Appendix J Table J-1 for individual study ratings).
For the studies including patients with decompensated HF and evaluating the predictive strength of BNP levels, there is low risk of bias for population description and selection, attrition, description of statistical analysis, and for how prognostic factors were addressed, with the exception that most studies did not provide reasons for indeterminate test results or missing data (item 3e).
Although the outcome measurement was adequately defined in most studies, the majority of publications did not adequately measure the outcome (item 4b), and many studies reported data for composite outcomes only (item 4c). The risk of bias is high for the BNP studies in decompensated patients with respect to adequate measurement of outcomes and avoiding composite outcomes.
Confounding was particularly poorly addressed in this group of studies. Based on the a priori criteria, studies were assessed for whether important confounders such as age, sex, BMI, and renal function were included as covariates within the prognostic model. Within these 44 publications, only 43 percent of studies met criteria for measuring confounders or accounting for them in the design or analysis (items 5a, 5b). The risk of bias is high for confounding, and most studies omitted at least one of the key confounders (BMI in particular).
Most of the studies were prospective observational cohorts, and the majority established research questions specifically to assess BNP levels. However, some studies focused primarily on other cardiac markers; in these, BNP was included in the development of the prognostic models but was not the primary focus of the research.
In summary, the overall risk of bias in studies evaluating BNP levels as a predictor of outcome in patients with decompensated HF was rated as moderate because of concerns with adequacy of outcome measurement, use of composite outcomes only, and problems with identification and adjustment for key confounders.
All tables showing the prognostic studies can be found in Appendix J.
BNP Levels Predicting Risk for All-Cause Mortality
Admission, Discharge, and Change in BNP Levels and Prognosis Up to 31 Days
Five studies183,194,196,215,217 assessed admission BNP levels and attempted to evaluate all-cause mortality up to 30 or 31 days (Appendix J Table J-2). Two studies recruited subjects from emergency settings. One study217 reported that admission BNP levels were independent predictors of 14 day mortality. The REDHOT II study194 recruited subjects with BNP levels greater than 100 pg/mL; patients were randomized to having serial BNP measurements (admission, 3, 6, 9, and 12 hours post admission) that were communicated to the physician; the control group did not have serial measurement and assessment of BNP was at the discretion of the physician. The findings from the REDHOT II study suggest that knowledge of serial BNP measurements has a protective effect with respect to predicting 30 day mortality but this was not statistically significant. This study could also be classified as one assessing the impact of the use of BNP to guide treatment.
Three studies enrolling subjects admitted to hospital183,196,215 attempted to evaluate the association between baseline BNP and subsequent 30 day mortality. Two studies evaluated serial measurements of BNP, including admission, 24 hours,196,215 48 hours,215 and at days three and five.196 Neither study reported the predictive strength of admission BNP levels and subsequent mortality. Both these studies would suggest that serial measurements at 24 and 48 hours are significant predictors of 30 day mortality. One study196 showed that change from baseline (reduction in BNP levels) was protective with respect to 30 day mortality. A single study183 that was at high risk of bias evaluated patients admitted to an acute care center with BNP levels >100 pg/mL but reported no results from the logistic regression specific to BNP.
Admission, Discharge, and Change in BNP Levels and Prognosis From 2 to 3 Months
Four studies3,176,214,217 attempted to evaluate the predictive strength of BNP levels and all-cause mortality at 3 months (Appendix J Table J-3). All but one study recruited subjects from the emergency setting.214 Two publications3,217 evaluated the subjects from the BACH study but differed in the number of subjects with final adjudication of acute HF; both BACH publications showed admission BNP levels to be independent predictors of 90 day mortality. One of these publications3 showed admission BNP to be an independent predictor when considered as a categorical, continuous, or log transformed variable in a simple statistical model (age, sex, BMI, creatinine) but not in a more complex model. The REDHOT trial176 showed that knowledge of serial BNP levels (admission, 3, 6, 9, and 12 hours) was an independent predictor of 90 day all-cause mortality. A single study214 recruited subjects admitted to hospital, evaluated a 10 percent change (decrease) relative to admission BNP levels, and showed that this change in BNP levels was not a statistically significant predictor of 90 day mortality.
Admission, Discharge, and Change in BNP Levels and Prognosis at 6 to 11 Months
Five publications196,198,200,205,210 evaluated BNP levels and prediction of all-cause mortality from 6 to 11 months (Appendix J Table J-4). Two publications198,205 had overlapping samples recruited from the same hospital center. One of these publications205 used log transformed BNP and showed it to be an independent predictor. The companion article198 used admission BNP levels and showed a dose response effect; with increasing thresholds (quintiles) of BNP levels, the HR increased (from HR=2.75 (95% CI, 1.17 to 6.46) to HR=5.82 (95% CI, 2.62 to 12.97)). There was some concern with outcome measurement and the adjustment of confounders in these companion papers, suggesting the potential for increased risk of bias in these two publications. Another study210 recruiting subjects from emergency settings and evaluating admission BNP levels showed that higher levels of BNP increased the HR for 6 month mortality (HR=1.84 (95% CI, 1.25 to 2.71) to HR=3.22 (95% CI, 2.27 to 4.55)).
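When HRs such as these are carried into further analysis, the standard error of the log HR is often back-calculated from the reported interval. The sketch below does this for the two quintile HRs quoted above, assuming the intervals are Wald-type 95 percent CIs that are symmetric on the log scale (the source does not state how they were computed). Function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_standard_error(lower, upper, z=Z_95):
    """Standard error of ln(HR) implied by a CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def geometric_midpoint(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)


# (HR, lower, upper) as quoted in the text for the lowest and highest thresholds.
reported = [(2.75, 1.17, 6.46), (5.82, 2.62, 12.97)]

checks = []
for hr, lower, upper in reported:
    se = log_hr_standard_error(lower, upper)
    midpoint = geometric_midpoint(lower, upper)
    checks.append((hr, se, midpoint))
    print(f"HR={hr}: SE(ln HR)={se:.3f}, implied midpoint={midpoint:.2f}")
```

If the implied midpoint is close to the reported HR, the symmetry assumption is plausible for that estimate; a large discrepancy would suggest a different interval method or a transcription error.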
The two remaining studies evaluated change in BNP levels196 and discharge BNP levels200 as predictors of all-cause mortality. In one study,196 a decrease of BNP levels greater than 30 percent relative to admission (or <800 pg/mL) showed a protective effect from mortality. In the second study,222 combining subjects who had discharge BNP levels greater than or equal to 360 pg/mL and a decrease of less than 50 percent, or increase (Group 3 vs. 1) showed the highest HR (Appendix J Table J-4).
Admission, Discharge, and Change in BNP Levels and Prognosis at 12 to 23 Months
There were seven publications that evaluated admission BNP levels from the BASEL cohort,106,188 a German study (overlapping samples),182,204 the PRIDE study,213,216 and an independent study193 for predicting 12 month all-cause mortality. Two studies211,215 evaluated change or discharge levels of BNP.
All but two studies193,211 recruited patients from emergency settings. Of the studies recruiting from emergency settings, all but one215 evaluated admission BNP levels as predictors. One additional study193 evaluated admission BNP levels but recruited subjects admitted to hospital, with a mixed population in which 29.7 percent of subjects were recruited from the community. The seven publications106,182,188,204,213,215,216 that recruited patients from emergency settings were generally at low risk of bias, with the exception of some concerns regarding verification or validity of the outcomes and potential confounding. One study with two publications182,204 undertook different model computations on the same dataset. Appendix J Table J-5 shows the differences in the estimate of the HR, varying from HR=2.45 (95% CI, 1.29 to 4.65) to HR=3.34 (95% CI, 1.61 to 6.97). Similarly, two studies from the PRIDE cohort216 and the Boston site of the PRIDE cohort213 showed that admission BNP levels were independent predictors of all-cause mortality (HR=2.12 [95% CI, 1.37 to 3.27] and HR=2.53 [95% CI, 1.53 to 6.21]) at 12 months.
Two publications106,223 based on subjects from the BASEL study, modeled admission BNP levels as a dichotomous and continuous variable, and both were independent predictors of 12 month mortality. The final study evaluating admission BNP levels also showed that BNP was an independent predictor of mortality at 12 months.193
Two studies did not assess the prognostic value of admission BNP levels but instead assessed serial measurements215 and discharge BNP levels.211,215 The first study215 showed that 24 and 48 hour and discharge BNP levels were all significant independent predictors of 12 month mortality. The second study211 had a primary aim to evaluate the prognostic merit of Type D (distressed) personality as a predictor of mortality but did not find this factor (or symptoms of depression) to be significant; rather, discharge BNP was shown to be an independent predictor at 18 months.
Admission, Discharge, and Change in BNP Levels and Prognosis at 24 Months and Greater
Three studies179,192,208 evaluated prognosis at 24 months (Appendix J Table J-6). The single study208 evaluating admission BNP levels as a predictor of 24 month all-cause mortality had a primary objective to compare the value of human growth factor as a predictor; BNP was the reference biomarker and was shown to be a significant predictor. A second study192 compared admission and discharge BNP levels, and both were shown to be independent predictors at 24 months. The final study179 evaluated discharge BNP levels for prediction of 24 month all-cause mortality, and this was not statistically significant.
BNP Levels Predicting Cardiovascular Mortality
Five studies evaluated the prognostic value of admission BNP levels and cardiovascular mortality from 31 days,187,209 6 months,205 12 months,181 and 24 months.208 Two studies3,214 measured cardiovascular mortality at 90 days but did not report data evaluating the predictive value of admission BNP (Appendix J Table J-7).
Two studies187,209 at low risk of bias (except for potential measurement of confounding) evaluated admission BNP levels and prognostic value at 31 days for cardiovascular mortality. These studies included similar patient populations (older patients with severe HF) and cutpoints; their findings suggest that admission BNP is an independent predictor adding incremental prognostic value187 and showing increasing odds (for log transformed BNP) of cardiovascular mortality.
One study205 that evaluated cardiovascular mortality at 6 months showed that the log transformed admission BNP was an independent predictor (HR=1.48 (95% CI, 1.24 to 1.77)). This same study reported similar values for HF mortality (HR=1.47 (95% CI, 1.19 to 1.81)).
Two studies181,208 evaluated cardiovascular mortality for longer term followup (12 and 24 months); one181 reported the prognostic strength of admission BNP (odds ratio (OR)=1.21 (95% CI, 1.06 to 2.32)) and the other indicated that admission BNP levels were a significant independent predictor.208
BNP Levels Predicting Morbidity Outcomes
Four studies179,192,194,210 reported on morbidity outcomes using admission and discharge BNP for followup periods of 1,194 6,210 and 24 months.179,192 A single study194 evaluated serial BNP levels for predicting 6 month cardiovascular morbidity (readmission) and their findings suggest that knowledge of BNP values had a protective effect (Appendix J Table J-8).
Two other studies179,192 evaluated cardiovascular readmission outcomes but evaluated discharge BNP levels as the prognostic indicator at 24 months; one study showed that discharge BNP levels were an independent predictor179 and the other192 showed that they were not significant, but this paper was suspect with respect to the selection and adjustment of confounders. One other paper210 used discharge BNP levels to predict unfavorable quality of life (QOL) or hospitalization at 6 months and showed that BNP was a significant predictor only for the hospitalization outcome at both thresholds for BNP levels.
BNP Levels Predicting Composite Outcomes
All-Cause Mortality and All-Cause Morbidity
Two studies evaluated the composite outcome of all-cause mortality and all-cause morbidity at 3 months186 and 6 months210 (Appendix J Table J-9). One study186 evaluated discharge BNP levels and change from admission, in isolation or in combination, to predict a composite outcome; when combining change less than 46 percent and BNP greater than 300 pg/mL at discharge, the greatest risk (OR=9.61 (95% CI, 4.51 to 20.47), p<0.001) was observed. The second study210 also used discharge BNP levels and found them to be an independent predictor.
All-Cause Mortality and Cardiovascular Morbidity
Fourteen publications176-179,190,191,195,197,199,201,203,206,207,212 evaluated the composite outcome of all-cause mortality and cardiovascular morbidity. Two studies evaluated this outcome at 1 month; one study212 showed that admission BNP levels, and the other206 that discharge BNP levels, were independent predictors. Similarly, two studies evaluated prediction at 3 months. One176 showed that admission BNP levels were significant; the second study207 showed that BNP was not a significant predictor when entered as a dichotomous predictor (threshold 360 pg/mL) but was statistically significant when placed in the prognostic model as a continuous variable (Appendix J Table J-10).
Five publications178,190,199,201,203 evaluated overlapping patient populations from related clinics in Italy, and all used discharge BNP levels as the prognostic indicator which was consistently shown to be an independent predictor at 6 months. Two other studies evaluated composite outcome at 6 months. One study191 showed only change from baseline (less than 58 percent) to be a significant predictor and admission BNP levels were not. The second study177 evaluated discharge BNP levels as predictors in the study sample but also in a validation cohort; discharge BNP levels were predictive of this composite outcome, but the risk was significantly increased in the validation sample.
Three remaining studies evaluated BNP levels as predictors of longer term composite outcome at 12 months,195 392 days,197 and 24 months.179 One study195 evaluated admission BNP levels as a predictor in patients with depression and showed that it was a significant predictor (HR=1.002, p=0.001). The remaining two studies evaluated post admission change from baseline or discharge BNP levels as predictors. One study197 evaluated patients post admission (interval not specified) and combined data of BNP levels with some data from patients up to 30 days post discharge; their findings suggest that BNP levels measured post admission were significant predictors of 12 month composite outcome. In this group, discharge and percent change from discharge were evaluated; the latter showed a protective effect (HR=0.7 (95% CI, 0.6 to 0.9), p=0.006). The third study179 reported that adding BNP improved model performance and was a significant predictor.
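An HR as close to 1 as the HR=1.002 quoted above can still represent a sizeable effect once the unit of the predictor is taken into account. The sketch below rescales a per-unit HR to a larger increment. It assumes, without confirmation from the source, that the HR is expressed per 1 pg/mL increase in BNP and that the log hazard is linear in BNP; the 100 pg/mL increment and the function name are illustrative choices.

```python
import math


def rescale_hazard_ratio(hr_per_unit, increment):
    """HR for an `increment`-unit rise, given the HR for a 1-unit rise.

    Assumes the log hazard is linear in the predictor, so the log HRs add.
    """
    return math.exp(math.log(hr_per_unit) * increment)


# HR as quoted in the text; the per-1-pg/mL unit is an ASSUMPTION.
hr_per_pg_ml = 1.002
hr_per_100 = rescale_hazard_ratio(hr_per_pg_ml, 100)

print(f"HR per 1 pg/mL:   {hr_per_pg_ml}")
print(f"HR per 100 pg/mL: {hr_per_100:.2f}")
```

Under these assumptions the per-unit HR compounds multiplicatively, so the choice of increment should always be reported alongside the estimate.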
Cardiovascular Mortality and Cardiovascular Morbidity
Six publications180,184,185,189,200,202,222 evaluated the composite outcome of cardiovascular mortality and cardiovascular morbidity; two publications180,200 may have overlapping samples (Appendix J Table J-11). Two studies184,202 evaluated admission BNP levels and prediction of this composite outcome at 6 months and both showed it to be an independent predictor, with increasing risk when levels were higher.184 Two related studies180,222 showed that change in BNP levels (as a decrease alone or in combination with a discharge BNP threshold) was a significant predictor at 7 months. From the two remaining studies, one publication189 showed that admission BNP was not a significant predictor, and the other185 showed that discharge BNP levels contributed to the prognostic model and was significant.
NT-proBNP Levels in Decompensated Heart Failure Patients and Prognosis
Characteristics of Studies in Decompensated Heart Failure Patients Using NT-proBNP Levels
The prognostic ability of NT-proBNP among patients admitted to hospital was assessed in 35 publications that deal specifically with NT-proBNP.1,2,224-256 A further six publications looked at both BNP and NT-proBNP.3,213-217 In total, 41 publications are discussed in this section. Study design was unclear in one paper,245 five used a retrospective cohort study design,216,232,235,240,256 and the remaining (n=35) were prospective cohort studies. The selected articles were published between 2004 and 2012 and were conducted world-wide including: four in North America,1,213,214,253 19 in Europe,215,224,226,230-233,236,237,240-242,244-247,250,251,254 three in Asia,234,235,252 one in South America,227 and one in Australia.243 Eight studies were conducted in multinational sites,2,3,216,217,225,228,238,239 and one did not report region of conduct.249
Several included papers were based on large study cohorts including: two225,228 from the ICON study, one256 from the Echo Cardiography and Heart Outcome Study (ECHOS), two3,217 from the BACH study, and two213,228 from the PRIDE study. Four studies used a combination of data sets, including ICON, PRIDE, and others,2,238,239 and PRIDE and other.216 Additionally, two articles published results on companion data sets.230,248 The remaining papers were independent studies using unique data sets.
Risk of Bias
The risk of bias was assessed based on the Hayden Criteria58 as described in the methods section of this report. Figure 12 shows the proportion of studies meeting the criteria assessed for risk of bias (see Appendix J Table J-12 for individual study ratings).
For the studies including patients with decompensated HF and evaluating the predictive strength of NT-proBNP levels, there is low risk of bias for population description and selection, attrition, description of statistical analysis, and for how prognostic factors were addressed, with the exception that most studies did not provide reasons for indeterminate test results or missing data (item 3e).
Although the outcome measurement was adequately defined in most studies, the majority of studies (66%) did not adequately measure the outcome (item 4b), and at least one third of the studies reported data for composite outcomes only (item 4c). The risk of bias is high for this group of studies with respect to adequate measurement of outcomes and avoiding composite outcomes.
Confounding was particularly poorly addressed in the studies evaluating NT-proBNP in decompensated HF patients. The a priori criteria for confounding assessed studies with respect to a minimum set of confounders that included age, sex, BMI, and renal function as important covariates. Only 41 percent of studies in this group met the criteria for measuring confounders (item 5a) and 32 percent accounted for them in the design or analysis (item 5b). The risk of bias is high for confounding (BMI in particular) in these studies.
Most of the studies were prospective observational cohorts, and the majority established research questions specifically to assess NT-proBNP levels. However, some studies evaluated other cardiac markers, and the focus of the research (and of the covariates in the prognostic models) was not primarily NT-proBNP.
In summary, the overall risk of bias in studies evaluating NT-proBNP levels as a predictor of outcome in decompensated patients was rated as moderate.
Study Outcomes and Followup Periods
Table 22 shows study outcomes and followup period for patients admitted to hospital for decompensated HF. Twenty-three1-3,213-217,224,225,228,234,238-240,242,243,245,247,248,250,251,256 of the 41 publications assessed all-cause mortality as a primary outcome using followup periods ranging from 2 months1,225,228 to 81 months.256 The majority of these studies, with the exception of three,214,217,247 used NT-proBNP collected at admission as a prognostic indicator for all-cause mortality. Four papers224,234,242,243 used discharge NT-proBNP and change in NT-proBNP from admission to discharge, along with admission NT-proBNP, as covariates in their models. One article247 used NT-proBNP measurements taken serially in combination with discharge, while another215 added admission NT-proBNP to serial and discharge measures. One article217 used only serial measurements of NT-proBNP. Five articles229,230,236,246,249 assessed cardiovascular mortality as an outcome, with followup periods ranging from one month249 to 15 months.246 All but one249 used admission NT-proBNP to predict cardiovascular mortality. Two articles230,249 used serial measurements, along with change in NT-proBNP, in their models.
All-cause morbidity was assessed in three articles,234,243,253 and cardiovascular morbidity outcomes were assessed in one.229 The remaining outcome measures consisted of composite outcomes combining various combinations including: cardiovascular mortality and cardiovascular morbidity,231,237,252,255,257 all-cause mortality and cardiovascular morbidity,1,226,227,241,254,258 all-cause mortality and all-cause morbidity,232-234,250,253,259 and cardiovascular mortality and all-cause morbidity.244 Of the articles assessing morbidity or composite outcomes, 10 used admission NT-proBNP alone as a prognostic indicator.1,234,235,237,241,250,252,254,255,260 The remaining publications used various combinations of admission, discharge, and change scores of NT-proBNP to predict morbidity and composite outcomes.
NT-proBNP Levels Predicting Risk for All-Cause Mortality
Admission and Predischarge NT-proBNP Levels and Prognosis Up to 31 Days
Two studies evaluated NT-proBNP levels and predicted all-cause mortality within 31 days post admission. One study217 evaluated admission NT-proBNP in patients admitted to the emergency department and with a final diagnosis of acute HF; findings suggested that NT-proBNP was not a significant predictor for 14 day mortality and that MR-proADM and copeptin may provide superior prediction relative to NT-proBNP. The second study evaluated 24 and 48 hour post admission and predischarge levels and assessed prediction of 30 day all-cause mortality.215 This study showed that only predischarge NT-proBNP was a significant predictor (Appendix J Table J-13).
Admission and Discharge NT-proBNP Levels and Prognosis From 2 to 3 Months
Four publications were related with respect to overlapping subjects and evaluated predictive ability for 90 day all-cause mortality; two were companion articles reporting on data from the ICON study,225,228 one was from the PRIDE study,1 and one included data from ICON and the PRIDE studies combined.2 Three of these related publications showed that admission NT-proBNP was an independent and statistically significant predictor of 60 day all-cause mortality; the study evaluating the PRIDE cohort1 showed an odds ratio (OR) of similar magnitude to the other related studies but unlike the other studies, did not show statistical significance.
Two publications evaluated subjects from the BACH study. One publication evaluated the entire BACH sample3 and showed that admission NT-proBNP was a significant independent predictor only when MR-proADM and troponin were not added to the predictive model. The second study evaluated a subset of subjects from the BACH study who subsequently had a confirmed diagnosis of acute HF217 and showed that admission NT-proBNP added predictive value to the prognostic model.
A single paper214 measured admission and discharge NT-proBNP levels but reported predictive ability for a change in admission levels (decrease by 3 percent); this study showed the OR to be less than 1 (OR=0.19) suggesting a statistically significant protective effect for 90 day mortality.
Admission and Discharge NT-proBNP Levels and Prognosis From 6 to 11 Months
All-cause mortality was assessed at 6 months by five studies224,234,240,243,247 using NT-proBNP as a prognostic indicator (Appendix J Table J-15). Two related papers evaluated a subset of participants240 from a larger population224 with New York Heart Association (NYHA) class III and IV only. One240 of these companion studies evaluated the ability to predict mortality based on an analysis with extreme tertiles of admission NT-proBNP levels and showed the highest NT-proBNP levels to be the strongest predictor of death. The study with the larger sample224 evaluated change or increase of 30 percent relative to baseline and showed NT-proBNP to be a significant predictor.
One study243 compared admission and discharge NT-proBNP levels; both were independent predictors, but the HR for discharge levels was of greater magnitude (HR=3.25 for admission vs. HR=7.05 for discharge). Another study234 compared admission NT-proBNP levels at two admission thresholds (>17.86 pg/mL and <8.49 pg/mL) relative to a decrease of 35 percent from admission; both threshold NT-proBNP levels were independent predictors but the decrease in NT-proBNP showed a protective effect (OR=0.19, p=0.071). The final study247 evaluated only the predictive ability of greater than 3,000 pg/mL discharge NT-proBNP levels and showed the largest HR (HR=13.63) for predicting 6 month mortality.
Admission and Discharge NT-proBNP Levels and Prognosis From 12 to 23 Months
Eight publications reported on the prognostic ability of NT-proBNP to predict all-cause mortality at 12 months (Appendix J Table J-16). Four related publications evaluated subjects in the PRIDE cohort only,213 PRIDE combined with another sample,216 and the ICON cohorts238,239 (which included PRIDE subjects); these studies all showed admission NT-proBNP to be an independent predictor of 12 month mortality. Two of these studies213,216 were rated as problematic with respect to outcome measurement, relying on hospital records only to assess outcome. Three additional studies evaluated admission NT-proBNP and risk of subsequent mortality at 12 months, and only one of these250 did not show that it was a significant predictor. Another study215 compared 24 and 48 hour admission levels and subsequent mortality prediction; only 48 hour NT-proBNP levels were a significant predictor.
Two studies215,242 evaluated discharge or after clinical stabilization NT-proBNP levels and showed HR of similar magnitude but different increments for added risk (500 vs. 1,000 pg/mL) (Appendix J Table J-16).
Admission and Discharge NT-proBNP Levels and Prognosis at 24 Months or Greater
Three studies assessed admission NT-proBNP levels and all-cause mortality at 24/25 months,245,251 and 6.8 years.256 All studies showed that admission NT-proBNP was an independent predictor despite differing prognostic models. One study245 showed an increasing HR with an increasing threshold for NT-proBNP levels (Appendix J Table J-17) but only those greater than 5,000 pg/mL were statistically significant.
NT-proBNP Levels Predicting Cardiovascular Mortality
A single study249 evaluated NT-proBNP levels at admission, at 12 hours post admission and the change from admission to 12 hour post admission to predict 30 day cardiovascular mortality (Appendix J Table J-18). These results were also stratified by subgroups of HF patients (chronic ischemic cardiomyopathy (ICM), decompensated non-ischemic cardiomyopathy (NONICM) and acute ischemia (AMI)). The findings in this study suggest that NT-proBNP levels after admission (at 12 hours or increase from baseline at 12 hours) are predictive of mortality but admission levels are not. There was some variation in statistical significance within the HF subgroups; the sample sizes were small relative to the covariates included in the model for the AMI and NONICM groups.
Two papers evaluated admission NT-proBNP levels at 6 months236 and 8.5 months229 as predictors of cardiovascular mortality (Appendix J Table J-19). Both studies showed that NT-proBNP was not a significant predictor, but neither study included important covariates in its prognostic model.
Two studies evaluated NT-proBNP levels and cardiovascular mortality at 12230 and 15.5 months (Appendix J Table J-19). One study used a reduction in NT-proBNP levels of greater than 30 percent relative to admission levels as the predictor of cardiovascular mortality; this study was rated as having some deficiencies with respect to identification and control of confounders. A second study compared admission NT-proBNP and log transformed NT-proBNP as predictors of cardiovascular mortality; although both HR estimates were significant, the log transformed value doubled the magnitude of the risk.
Admission and Discharge NT-proBNP Levels and Morbidity Outcomes
Four studies assessed NT-proBNP levels and all-cause hospitalization at 30 days,253 at 6 months,234,243 and HF hospitalization at 8.5 months229 (Appendix J Table J-20). One study253 that was rated as problematic with respect to outcome measurement and confounding, showed that change in NT-proBNP relative to admission levels (less than 50 percent reduction) was a predictor of 30 day mortality but it was not statistically significant. In contrast, another study234 evaluated change in NT-proBNP levels (reduction of greater than 35 percent relative to baseline) and showed that it had a protective effect for hospital readmission.
Another study243 compared admission and discharge NT-proBNP levels; although both were significant predictors of 6 month hospital readmission, the HR for discharge was of greater magnitude.
Admission and Discharge NT-proBNP Levels Predicting Composite Outcomes
All-Cause Mortality and All-Cause Morbidity
Seven publications evaluated the composite outcome of all-cause mortality and all-cause morbidity (primarily rehospitalization) (Appendix J Table J-21) at 6 months.226,232 From these, four publications224,226,232,233 evaluated subjects from the same registry that were partial224 or completely overlapping samples226,232,233 and followed subjects up to 6 months. Three224,226,232 of these related publications evaluated change in NT-proBNP levels as: (1) change-decrease of greater than or equal to 30 percent (group 1); (2) changed greater than 30 percent (group 2); or (3) change-increase greater than 30 percent. The fourth publication233 evaluated decrease less than and greater than 30 percent and discharge levels. Although all three of these thresholds were independent predictors, the increase by greater than 30 percent had the HR of greatest magnitude across all three studies for predicting 6 month composite outcome. Additionally, all four publications show that a decrease less than 30 percent relative to admission in NT-proBNP levels incurs an increased risk for death or rehospitalization (Appendix J Table J-21). This was observed for patients with and without renal failure.233 In contrast, one study234 evaluating decrease in NT-proBNP levels greater than 35 percent from baseline discharge NT-proBNP, showed a protective effect (HR=0.42 (95% CI, 0.12 to 0.76), p=0.010) from mortality and rehospitalization at 6 months.
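The change categories used in these publications rest on a simple calculation of percent change relative to the admission level. The following is a minimal sketch of that arithmetic, not the method of any included study. It assumes change is computed as (discharge - admission) / admission x 100, uses the 30 percent threshold discussed above, and the function names and category labels are hypothetical.

```python
def percent_change(admission: float, discharge: float) -> float:
    """Percent change in NT-proBNP relative to admission (negative means a decrease)."""
    if admission <= 0:
        raise ValueError("admission level must be positive")
    return (discharge - admission) / admission * 100.0


def change_category(admission: float, discharge: float, threshold: float = 30.0) -> str:
    """Assign a hypothetical category label using a symmetric percent threshold."""
    change = percent_change(admission, discharge)
    if change <= -threshold:
        return "decrease"        # fell by at least the threshold
    if change > threshold:
        return "increase"        # rose by more than the threshold
    return "little change"       # stayed within the threshold


# Illustrative values in pg/mL (made up for the example, not study data)
for adm, dis in [(4000, 2000), (4000, 3500), (4000, 6000)]:
    print(adm, dis, round(percent_change(adm, dis), 1), change_category(adm, dis))
```

How the boundary case (a change of exactly 30 percent) is assigned differs between publications, so the inequality directions above are an assumption.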
Two studies evaluated prediction of the composite outcome of all-cause mortality and all-cause morbidity at 12 months. One study250 reported that admission NT-proBNP was not a significant predictor. The second study253 showed that a 50 percent change (relative to admission levels) was an independent predictor of outcome.
All-Cause Mortality and Cardiovascular Morbidity
Five studies evaluated all-cause mortality and cardiovascular endpoints at 2 months,1 184 days,247 252 days,227 261 days,241 and 601 days.254 All but two studies evaluated all-cause mortality and HF or cardiovascular readmission; one study1 evaluated all-cause mortality and recurrent HF and the other study254 measured all-cause mortality and heart transplant list (Appendix J Table J-22). Despite the different prognostic models and time intervals, all were shown to be independent predictors of the composite outcomes; only one of these was not statistically significant for predicting all-cause mortality and recurrence of HF at 2 months.1
Cardiovascular Mortality and All-Cause Morbidity
A single study244 evaluated predictive ability of change in NT-proBNP levels (reduction less than 30 percent) for the composite endpoint of cardiovascular mortality and hospital readmission at 6 months (Appendix J Table J-23). This study showed that a reduction less than 30 percent increased the risk of this endpoint (HR=2.04 (95% CI, 1.02 to 4.08), p=0.04).
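As a consistency check, a reported p-value can be approximately recovered from an HR and its 95% CI. The sketch below assumes the interval is a Wald interval that is symmetric on the log scale (an assumption about how the study computed it, not something the report states) and uses the values quoted above. The function name is hypothetical.

```python
import math


def p_value_from_ci(hr: float, lower: float, upper: float, z_crit: float = 1.96):
    """Approximate z statistic and two-sided p-value from an HR and its 95% CI.

    Assumes a Wald interval symmetric on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


z, p = p_value_from_ci(2.04, 1.02, 4.08)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the result is close to the reported p=0.04, which suggests the HR, CI, and p-value quoted for this study are mutually consistent.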
Cardiovascular Mortality and Cardiovascular Morbidity
Five studies evaluated the composite outcome of cardiovascular mortality and cardiovascular morbidity at 3 months,231,235 6 months,252 24 months,237 and 6.8 years.255 Two of these studies did not show a statistical significance for predicting composite endpoint at 3 months235 and 24 months.237 Two studies252,255 showed that admission NT-proBNP was a significant predictor for this composite outcome. The final study231 showed that a decrease at 2 weeks post admission had a protective effect (HR=0.79 (95% CI, 0.70 to 0.88), p<0.001) for this composite endpoint (Appendix J Table J-23).
Comparing Prognostic Value of BNP and NT-proBNP in Decompensated Heart Failure Patients
Six studies3,213-217 evaluated BNP and NT-proBNP concurrently in acutely ill HF patients (Appendix J Table J-24). All studies recruited patients from emergency settings with the exception of one.214 Four of the five publications that recruited subjects from emergency settings evaluated admission BNP and NT-proBNP levels,3,213,216,217 and one study215 evaluated levels post-admission and before discharge from hospital. The single study214 recruiting subjects admitted for decompensated HF also evaluated admission levels. The studies evaluated both short term prediction (14 to 90 days) and longer term prediction (1 year). All studies evaluated all-cause mortality only. Two publications based their analyses on the same study cohort (BACH trial).
In general, these six publications were at low risk of bias, but the majority of studies3,213-216 measured the outcome based on hospital records or did not specify the exact outcome and, as such, are prone to misclassification bias. Appendix J Table J-24 shows the findings from these six publications, from which the predictive ability of BNP versus NT-proBNP can be compared. Two studies evaluated prognostic strength in the short term.215,217 One study217 showed that neither assay was a statistically significant predictor of 14 day all-cause mortality. The second study215 showed differences in prediction between assays collected at 24 and 48 hours, with only BNP being a significant predictor; predischarge values for predicting 30 day all-cause mortality were significant for both assays.
The single study214 evaluating patients admitted to hospital showed a decrease in BNP (<10% relative to baseline) that was not statistically significant (p=0.817) but a decrease in NT-proBNP (<3% relative to baseline) that was significant (p=0.005). Two publications evaluating subjects from the BACH trial (differing sample sizes) showed that both markers added incremental value to the model,3,217 but showed mixed results as a predictor, as only one model with NT-proBNP was significant. Three studies213,215,216 compared BNP and NT-proBNP for predicting 1 year all-cause mortality.
The single study215 that compared BNP and NT-proBNP levels at 24 and 48 hours post admission and also at predischarge, showed in the multivariable analysis that all three levels for both assays were significant predictors of subsequent 1 year mortality; only NT-proBNP at 24 hours was not statistically significant. The two other studies213,216 evaluated admission BNP/NT-proBNP levels and showed that both assays were statistically significant predictors of 1 year mortality despite having different covariates within the multivariable models.
Overall, these studies present mixed findings to suggest that BNP and NT-proBNP have differences with respect to predicting shorter term mortality (14 to 90 days). The three studies evaluating longer term mortality (1 year) would suggest that both assays are predictors of mortality and may not differ in their predictive strength.
Chronic Stable Heart Failure and BNP Assay
Design Characteristics of Studies
The prognostic value of BNP among patients with chronic stable HF was assessed in 15 publications.222,261-274 All of the included studies measured BNP at admission to the study. As this group of studies examined stable HF, the measurement of BNP at discharge or change in BNP between admission and discharge is not relevant to the question. One article measured both BNP and NT-proBNP and is included in this section for a total of 16 papers.275 One article222 used an RCT design and the remaining studies (n=15) used prospective cohort designs. The selected articles were published between 2003 and 2011 and were conducted worldwide, including four in North America261-263,267 and seven in Europe.264,265,268,270,273-275 Two publications were from studies conducted in multinational sites,269,271 one was from Turkey,266 and two were unclear as to region of conduct.222,261
Four articles reported patient populations with mean or median ages ranging from 60 to 69 years.222,261,266,270 Three had somewhat older patient populations with mean or median ages between 70 and 79 years.272-274 Nine articles had populations with mean ages less than 60.262-265,267-269,271,275 Two papers reported age ranges of 15 to 84.263,275 The percentage of males enrolled in each study ranged from 59 percent261 to 89 percent264 (mean=68.2%, median=72.5%). Sample sizes ranged from 46272 to 1,294274 (mean=398, median=254).
Table 23 shows study outcomes and durations for each publication grouped by the outcomes. Some papers reported study duration as fixed endpoints in years or months, with durations ranging from 6 months to 24 months. Most reported mean or median study durations, ranging up to a median of 68 months or a mean of 55.8 months of followup.
Heart Failure Diagnosis and Severity at Admission
The diagnosis of HF was established in a number of ways, but was usually confirmed using echocardiography, carried out as part of the study or obtained from previous medical records at study enrollment or by clinical assessment. The subjects included were defined as having stable HF according to the inclusion criteria with the exception of one study which recruited subjects with chronic HF that was worsening.268 The majority of studies included subjects across all levels of the NYHA classification levels I to IV at enrollment. The exceptions were two articles222,261 enrolling patients at NYHA classification levels III and IV only. Many studies assessed LVEF of enrolled patients at various thresholds including: less than 30 percent,264 less than 35 percent,265 less than 40 percent,262,263,270 and less than 45 percent.266,268
BNP Tests and Threshold Values
The majority of publications (n=13) used the TRIAGE B-Type Natriuretic Peptide (BNP) Test to measure BNP. Two articles268,272 used the ADVIA Centaur® B-Type Natriuretic Peptide (BNP) assay and one article used the Abbott Architect BNP Reagent Kit.267
Six papers categorized high and low BNP cutpoints based on ROC results.263,268,271,275 Papers reported other rationales for BNP threshold selection including previously reported prognostic cutpoints261 and mean or median BNP levels.262,265-267,270,273,274 The remaining articles222,264,269,272 used BNP as a continuous variable.
Most articles (n=14) were independent studies, publishing results on unique data sets, with the exception of one269 that published results on a companion data set, and one275 where study affiliation could not be identified.
Definition of Outcomes
Most articles assessed the prognostic value of BNP on mortality. The majority (n=10) examined all-cause mortality,261-264,268-271,274,275 one assessed sudden cardiac death,222 and one examined cardiovascular mortality and pump failure mortality.261 Heart failure hospital admission was assessed by one article.274
Several studies evaluated composite outcomes that combined all-cause mortality with nonfatal events. The composite of all-cause mortality and cardiovascular morbidity was reported by seven studies.262,266-268,272-274 Other outcomes assessed included all-cause hospital readmission274 and heart transplantation.262,268 One assessed a composite of cardiovascular mortality and morbidity265 (Table 23).
Risk of Bias
The populations for this group of studies were mostly suitably defined and described, and represented the population of interest. Only one paper did not define the population adequately,271 and one paper's268 population was considered not representative of the study's source population or population of interest. There is low risk of bias for population description and selection.
The prognostic factors were fairly well addressed. BNP was appropriately defined and measured in all but two papers.268,274 The other prognostic factors were well defined and measured in all but one paper.275 Indeterminate results or missing data were less well addressed by a few papers.222,270,272-274 There is low risk of bias for BNP and low risk of bias for the other prognostic factors.
Outcome measurement was defined by most studies, with the exception of one.271 We set fairly stringent criteria for obtaining accurate data and only two studies met these criteria.267,275 Composite outcomes are not recommended by Hayden; because we included composite outcomes, a number of studies did not meet this criterion.264-268,272,273 The risk of bias for the outcomes is moderate.
Confounding was particularly poorly addressed. According to the criteria, we expected studies to consider age, sex, BMI, and renal function as important covariates. Some studies met these criteria.262,263,267-269,272,275 The risk of bias from confounders (BMI in particular) is high (Figure 13).
Analysis was appropriately conducted in all the studies. Most of the study designs were observational cohorts and the question posed for the reports most often looked at the predictive value of BNP in the population described. There is low risk of bias for analysis.
In summary, the risk of bias in this group of papers for KQ3 is rated as moderate.
BNP Independent Prediction of Single Outcomes
All-cause mortality was the outcome in 10 articles (Appendix J Table J-26).261-264,268-271,274,275 One article had a followup period of 6 months or less261 and showed a significant adjusted HR for cutpoint BNP>1,000 pg/mL (HR=1.99 (95% CI, 1.18 to 3.36)) in a population with NYHA class III or IV HF. One article263 had a followup period of 12 months and showed significant adjusted relative risk (RR) for patients with advanced HF (RR=17.34 (95% CI, 2.23 to 134.9)) in a population with LVEF <40 percent. This study also investigated anemia, and BNP>485 pg/mL remained a significant predictor in both the anemic and non-anemic subjects. There were five papers reporting on followup periods between 12 and 24 months. A significant adjusted HR with BNP and logBNP measured at various levels of BMI was demonstrated but an HR for the entire population was not reported.262 In patients with United Network of Organ Sharing (UNOS) status 2, logBNP remained an independent predictor of all-cause mortality.264 In more general populations of chronic HF outpatients,268,269,271 non-significant statistics were reported. One of these studies reported a model with BNP (non-significant) and a model with logBNP (HR=1.32 (95% CI, 1.16 to 1.50)).269 Three articles had followup periods greater than 24 months. Two274,275 assessed the prognostic ability of logBNP in predicting all-cause mortality among HF patients attending a disease management program (HR=1.53 (95% CI, 1.33 to 1.75))274 and in a general chronic HF population (HR=1.34 (95% CI, 1.34 to 1.49)).275 The final paper assessing a followup period of greater than 24 months270 showed significant results for outpatients with stable mild to moderate HF and LVEF <40 percent (BNP>250 vs. ≤250), adjusting for left bundle branch block (LBBB) and beta blockers (HR=1.59 (95% CI, 1.07 to 2.36)).
Sudden cardiac death was not associated with a significant adjusted HR using a BNP cutpoint of 700 pg/mL (HR=1.03 (95% CI, 0.65 to 1.32)),222 while pump failure mortality showed a significant HR for 1,000 pg/mL (HR=3.78 (95% CI, 1.63 to 8.78)).261 Cardiac mortality demonstrated a significant adjusted HR for BNP >1,000 pg/mL (HR=1.76 (95% CI, 1.01 to 3.07))261 (Appendix J Table J-27).
BNP Independent Prediction of Composite Outcomes
The composite outcome of all-cause mortality and cardiovascular morbidity (Appendix J Table J-30) was reported by six studies.262,266-268,272,273 One of these studies reported a non-significant HR using heart transplant as the cardiovascular morbidity.268 The other studies reported significant HRs ranging from HR=1.1 (95% CI, 1.1 to 1.2)267 to HR=3.194 (95% CI, 1.625 to 6.277).266 The factors used to adjust the multivariable model varied in these studies but included: age, sex, race, tobacco use, creatinine, BMI, LVEF and other echocardiographic measures, etiology of HF (ischemic and non-ischemic), NYHA class, Hb, IL-6, hypertension, albumin, FT3, and medications.
Chronic Stable Heart Failure and NT-proBNP Assay
Design Characteristics of Studies
The prognostic value of NT-proBNP among patients with chronic stable HF was assessed in 88 publications.4,53,275-362 One additional article275 measured both BNP and NT-proBNP and is also included in this section, for a total of 89 papers.
Two articles were RCTs of NT-proBNP-guided therapies versus non NT-proBNP-guided therapies.4,53 Four articles were secondary analyses of data initially collected in RCTs; however, the secondary analyses did not account for the groups to which participants were randomized.279,286,301,309 One was a nonrandomized controlled clinical trial,298 and one317 was a post hoc analysis of an RCT. One study342 used a cross-sectional design and two did not report the study design used.322,346 Three papers275,352,360 used a retrospective cohort design and the remaining 76 publications used prospective cohort designs. All articles were published between 2001 and 2012 and were conducted in the following parts of the world: five in North America,293,305,314,338,345 11 in Asia,289,290,297,298,302,311,313,327,332,336,346 and one in Austria.4 Sixteen publications were from studies conducted in multinational sites,53,284,286,301,307,309,317,318,322,328,331,339,340,344,351,356 and four276,334,335,341 were unclear as to region of conduct. The remainder (n=52) were published in Europe.
Several authors published results from large studies, including one279 from the Carvedilol Prospective Randomized Cumulative Survival (COPERNICUS) trial, one333 from the MUerte Súbita en Insuficiencia (MUSIC) study, three301,309,351 from the Controlled Rosuvastatin Multinational Trial in Heart Failure (CORONA), two53,345 from the Cardiovascular Health Study (CHS), and one305 from the Assessment of Doppler Echocardiography Study in Prognosis and Therapy. One article275 was unclear as to study affiliation and nine published results on companion data sets.280,281,286,288,297,298,313,326,327 The remaining articles (n=71) were independent studies, publishing results on unique data sets.
Risk of Bias
The risk of bias was assessed based on the Hayden criteria58 as described in the Methods section and Appendix E. Figure 14 and Appendix J Table J-32 show the percentage ratings for risk of bias for studies evaluating NT-proBNP in stable HF populations.
As seen in Figure 14 the populations for this group of studies were, for the most part, suitably defined (98 percent) and described (99 percent) with the exception of two papers.310,353 It was clear in 96 percent of papers that the study population represented the source population or the population of interest, with one paper's304 population not representing the source population or the population of interest and three295,337,353 being unclear as to whether this was the case. Therefore, all of the domains within this area of bias are rated as low risk of bias; the overall rating for this area of bias is also low.
Eighty-one percent of articles described their study's completeness of followup and 82 percent were assessed as having adequate completeness of followup. Attrition was not adequately described in two articles,305,323 and we could not ascertain whether attrition was adequately described and complete in nine articles.275,298,300,327,328,342,345,351,360 In four other articles,288,289,344,349 completeness of followup was adequate, yet the description of followup was either unclear289,349 or inadequate.288,344 In two articles,276,348 attrition was not adequately described and we could not ascertain whether followup was completed. A rating of unclear was assigned to each domain and an overall rating of unclear to the risk of bias for study attrition.
NT-proBNP and other prognostic factors were appropriately defined and measured in all except two included articles.282,290 The issue of indeterminate results or missing data for both NT-proBNP and other prognostic factors was less well addressed by some papers,278,280,289,298-300,302,320,324,328,342,348,353,357,360,361 although the published reports do not suggest results were biased. The domain-specific and overall risk of bias rating for prognostic factor measurement is low.
Outcomes were defined in 98 percent of publications (low risk of bias), with the exception of two articles.298,327 Fairly stringent criteria for obtaining accurate data on outcomes were set and only 30 of the 89 included articles (34 percent)53,275,280,281,283,286,288,296,303,312,320,321,323,329,338,339,341,347,350-360,362 measured the outcomes appropriately (high risk of bias). Twenty-one percent of studies (n=19) used composite outcomes only in their analysis and did not analyze any single outcome in multivariable analyses.53,285,287,294,303,305,306,311,317,321,322,324,333,336-338,340,348,353 The overall risk of bias for outcome measurement is high.
Confounding was particularly poorly addressed. According to the a priori criteria, studies were expected to measure age, sex, BMI, and renal function as important covariates. Fifty-six (63 percent) of the 89 articles met these criteria (low risk of bias). In publications that measured confounders, the means of adjustment was typically a multivariable regression analysis (low risk of bias). The overall risk of bias for measuring and accounting for confounding is high.
Analyses were appropriately conducted in all of the included articles. Most of the study designs were observational cohorts and the question posed for the reports most often looked at the predictive value of NT-proBNP in the population described. Consequently, a low risk of bias was assigned to this area.
For the seventh potential area of bias, it was considered whether the included articles were designed to test the prognostic value of NT-proBNP, rather than being secondary analyses of data collected for other purposes. All except five papers298,317,332,339,341 were adequately designed for prognostic study, earning a low risk of bias rating for this area.
Chronic Stable Heart Failure and NT-proBNP Predicting All-Cause Mortality
Table 24 describes study outcomes and followup periods for studies assessing mortality outcomes (n=69). Sudden death was considered to be part of all-cause death. Pump failure death was not a primary study outcome. Since we included articles that performed multivariable analyses, measures of association reported in the text are adjusted in the analyses for the influence of covariates. Two articles within which the authors failed to report the length of followup were not considered.301,309
Fifty-two articles included all-cause mortality as an outcome in the assessment of the predictive value of NT-proBNP in persons with chronic and stable HF (including the two publications that did not report length of followup) (Appendix J Table J-33).4,275-282,284,286,288,290-292,295,296,299-302,307-309,311,313-315,317,320,321,326-332,336,342,344,345,348,350,353,355-360,362
NT-proBNP Levels and Prognosis 6 Months or Less
Two papers279,321 reported a followup of 6 months or less. In the first paper,279 an adjusted NT-proBNP value >1,767 pg/mL was a highly significant risk indicator in the model with RR=2.17 (95% CI, 1.33 to 3.54). In the second paper,321 NT-proBNP level was a strong independent predictor of 6 month mortality, with a seven-fold risk of early death (OR=7.6, 95% CI, 1.4 to 40.8).
NT-proBNP Levels and Prognosis From Greater Than 6 Months to 12 Months
Five papers4,277,292,344,359 followed up participants for periods between 6 and 12 months, with two articles including persons with mean ages of 63344 and 65 years.359 Of the remaining three papers, two277,292 included persons with a mean age of approximately 50 years and one4 contained subjects with a mean age of over 71 years. Two papers reported NT-proBNP cutpoints of >1,490277 and >1,548 pg/mL.292 One paper344 reported an adjusted HR=1.43 per standard deviation (SD) unit increase, but did not reach statistical significance (95% CI, 0.89 to 2.3). Three articles reported chi-squares of 20.2 (p<0.001),359 13.8 (p=0.0002),292 and 6.03 (p=0.01),277 all of which suggest predictive values for NT-proBNP. One article4 did not report results of the multivariate analysis.
NT-proBNP Levels and Prognosis From Greater Than 12 Months to 24 Months
Eight articles276,278,286,290,291,320,326,332 reported followups of greater than 12 months and up to 24 months. One of these articles276 did not report any outcome data and will not be discussed further. Of the remaining papers, three290,320,332 included persons with mean ages of 71 years, one278 used populations with mean ages of 82 and 50, and two291,326 included persons with mean ages of 51 years. One paper286 did not report on the age of study participants. Reported measures of association in four articles286,320,326,332 were above 1.0 (indicating NT-proBNP is predictive of all-cause mortality), yet CIs included the null value in two cases;326,332 the exceptions were HRs of 1.16 (95% CI, 1.042 to 1.291),332 2.58 (95% CI, 1.24 to 5.37),320 4.02 (95% CI, 2.63 to 6.11),286 and 2.07 (95% CI, 1.76 to 2.46).286 The remaining three articles reported chi-squares of 13.6 (p<0.001),290 14.2 (p<0.001),291 and 26.95 (p=0.0001),278 all of which suggest predictive value for NT-proBNP.
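The statement that a CI "included the null value" can be checked mechanically: for a ratio measure such as an HR, the null value is 1.0, and an estimate is conventionally treated as statistically significant at the five percent level when its 95% CI excludes 1.0. The sketch below applies that rule to intervals quoted in this section. The helper name is hypothetical, and the first entry is taken from the 6 to 12 month paragraph above as a contrasting case.

```python
def ci_excludes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """Return True when the interval lies entirely above or below the null value."""
    return lower > null_value or upper < null_value


# (label, lower bound, upper bound) as quoted in the text
reported_intervals = [
    ("HR=1.43 per SD unit", 0.89, 2.3),   # 6 to 12 month paragraph; not significant
    ("HR=1.16", 1.042, 1.291),
    ("HR=2.58", 1.24, 5.37),
    ("HR=4.02", 2.63, 6.11),
    ("HR=2.07", 1.76, 2.46),
]

for label, lower, upper in reported_intervals:
    status = "excludes 1.0" if ci_excludes_null(lower, upper) else "includes 1.0"
    print(f"{label}: 95% CI {lower} to {upper} {status}")
```

This check says nothing about the size or clinical importance of an effect; an interval that barely excludes 1.0 and one far from it are both flagged the same way.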
NT-proBNP Levels and Prognosis From Greater Than 24 Months to 36 Months
Nineteen articles280-282,284,288,295,296,300,301,307,309,317,327-329,336,353,355,360 reported followups of greater than 24 months and up to 36 months. Sample sizes ranged from 50295 to 1,503.317 Mean or median age ranges encompassed 60 to 69 years in 12 articles,282,284,288,295,296,300,317,327,328,336,355,360 and 70 to 79 years in five publications.280,281,307,329,353 One article did not report on population age. Authors reported cutpoints in 10 articles,278,281,295,296,317,327,329,336,353,355 ranging from >641 pg/mL327 to 10,000 pg/mL.295 Three papers adjusted HR based on increments of NT-proBNP, including a one SD unit increase282,360 and a 500 pg/mL increase.284 Reported point-estimate HRs ranged from 1.03 per pg/mL increase284 to 4.2.296 All point estimates, except the ones calculated in two articles,284,331 were statistically significant at the five percent level. In one paper355 NT-proBNP level was a strong independent predictor of all-cause mortality, with almost a three-fold risk of early death (OR=2.7; 95% CI, 1.3 to 5.7). Three papers278,327,336 found NT-proBNP to have an independent predictive value, but the authors only reported chi-square test statistics rather than measures of association.
NT-proBNP Levels and Prognosis From Greater Than 36 Months to 48 Months
Nine articles302,308,313-315,331,342,357,362 reported followups of greater than 36 months and up to 48 months. Sample sizes ranged from 148342 to 992.362 Mean or median age ranges encompassed 50 to 59 years in one paper,314 60 to 69 years in six articles,313,315,331,342,357,362 and 70 to 79 years in two publications.302,308 Three articles reported cutpoints of >796 pg/mL,313 1,000 pg/L,362 and 1,720 pg/mL.357 Three of the nine papers adjusted HR based on increments of NT-proBNP. Increments included a one log unit (1 log pg/mL) increase,308,331 a change of 2,000 pg/mL,314 or a 100 pg/mL increase.315 All adjusted HRs indicated positive associations between higher values of NT-proBNP and all-cause mortality. Reported point estimates ranged from HR=1.01 per 100 pg/mL increase315 to HR=4.3.280 One article313 reported a chi-square of 2.195 (p=0.0282). All point estimates, with the exception of one,331 were statistically significant at the five percent level.
NT-proBNP Levels and Prognosis From Greater Than 48 Months to 60 Months
Five articles299,311,350,356,358 reported followups of greater than 48 months and up to 60 months. Sample sizes included 285,311 1,087,299 and 1,844.350 Two of the three articles included mean or median age groups ranging from 70 to 75.299,311,356 One article350 did not report the age of its study population. Two articles reported statistically significant HRs, indicating positive associations between higher values of NT-proBNP and all-cause mortality. Reported point estimates included: HR=1.006 (95% CI, 1.004 to 1.009),311 HR=2.06 (95% CI, 1.68 to 2.52),299 and HR=3.2 (95% CI, 2.69 to 3.79).299 In one article,356 baseline natural logarithm NT-proBNP as a continuous variable was independently associated with an increased risk of all end points, even after adjustment for several other baseline characteristics; however, use of the angiotensin receptor blocker irbesartan was associated with improved outcomes in patients with NT-proBNP below, but not above, the median levels. Adjusted HRs showed a positive association between higher values of NT-proBNP and all-cause mortality.358 The final article350 did not report outcome data.
NT-proBNP Levels and Prognosis Greater Than 5 Years
Four studies (six reports) examined all-cause mortality for followup periods that were longer than 5 years.275,330,345,348,356,358 Mean or median age ranges encompassed 50 to 59 years in three papers,275,330,348 and 70 to 79 years in the remaining three publications.345,348,356 Authors reported cutpoints in three articles,345,348,356 ranging from 190 pg/mL345 to 808 pg/mL,348 with one348 reporting various cutpoints based on sex and beta-blocker use. One paper275 reported results that were not statistically significant, although a statistically significant result was found after adding midregional pro-atrial natriuretic peptide (MR-proANP) to a model with BNP and NT-proBNP already included. Prior to the addition of MR-proANP, NT-proBNP was an independent predictor (p<0.05) of all-cause mortality. Another paper330 found NT-proBNP to have an independent predictive value, but the authors only reported chi-square test statistics rather than measures of association. Of the remaining three papers, two345,348,358 had adjusted HRs indicating positive associations between higher values of NT-proBNP and all-cause mortality. Reported point estimates ranged from HR=1.89 per 100 pg/mL increase to HR=3.37.348 All point estimates were statistically significant at the five percent level (Table 24).
NT-proBNP Levels Predicting Cardiovascular Mortality
Seventeen articles293,297,298,301,304,309,314,324,333,335,340,342,343,345,346,352,362 examined the prognostic value of NT-proBNP for cardiovascular mortality in persons with stable HF (Appendix J Table J-34). Two articles that did not report the length of followup were not included.301,309
NT-proBNP Levels and Prognosis Less Than 12 Months
No articles reported cardiovascular mortality for periods of less than 12 months.
NT-proBNP Levels and Prognosis From 12 to 24 Months
Four articles335,340,343,352 contained followup periods of over 12 months and up to 24 months (Appendix J Table J-34). Sample sizes ranged from 82335 to 491.340 Mean or median age ranges encompassed 60 to 69 years in three papers,335,340,352 and 70 to 79 years in one publication.343 Three of the four papers reported cutpoints of 3,337 pg/mL,352 2,465 pg/mL,340 and >844 pg/mL.335 Three publications reported added predictive value for admission NT-proBNP in terms of cardiovascular mortality. The first article340 reported an adjusted HR=3.36 (95% CI, 2.4 to 4.7). The second article335 found an HR=1.02 (95% CI, 1.01 to 1.03), with the same level of significance (p <0.001) obtained using log-transformed NT-proBNP levels (HR=9.79; 95% CI, 3.02 to 31.8). The third paper found discharge NT-proBNP to be inversely related to survival, reporting an HR=0.43 (95% CI, 0.23 to 0.79).352 Another study343 also found NT-proBNP to be a significant predictor of cardiovascular mortality (HR=1.039 per 100 pg/mL; 95% CI, 1.014 to 1.065).
NT-proBNP Levels and Prognosis Greater Than 24 Months
Followup was greater than 24 months in 11 papers (Appendix J Table J-34).293,297,298,304,314,324,333,342,345,346,362 Two articles314,342 did not report quantitative results and will not be mentioned further in this subsection. Sample sizes spanned from 75324 to 992.362 Two papers included persons with a mean age of 53324 or 57293 years. Five articles297,298,304,333,346,362 included subjects with a mean age between 62 and 68 years. The remaining article included persons with a mean age of 75.2 years.345 Cutpoints varied from a low of ≥190 pg/mL345 to a high of >908 pg/mL.333 One article324 did not report cutpoints, although it calculated adjusted ORs for participants at rest for each 50 pg/mL decrement of NT-proBNP (OR=0.91; 95% CI, 0.656 to 1.269) and for each 20 pg/mL change in NT-proBNP (OR=1.106; 95% CI, 1.022 to 1.197). Eight articles293,297,298,304,333,345,346,362 reported adjusted HRs that indicated that NT-proBNP had predictive ability for cardiovascular mortality. These values were statistically significant at the five percent level and ranged from 1.42 (n=204)293 to 6.8 (n=95);346 the adjusted HR in the largest sample (n=992)362 was HR=2.87 (95% CI, 1.80 to 4.57) for NT-proBNP levels >1,000 pg/L. One article346 also reported chi-squares of 19.2 (p<0.001) for baseline NT-proBNP and 16.3 (p<0.0001) for discharge NT-proBNP, both of which suggest predictive value for NT-proBNP.
NT-proBNP Levels Predicting All-Cause and Cardiovascular Morbidity
Table 25 describes study outcomes and followup period for articles assessing all-cause and cardiovascular morbidity outcomes (n=12).
Twelve studies4,276,281,283,286,290,302,308,309,319,332,347 examined the prognostic value of NT-proBNP for all-cause and cardiovascular morbidity in persons with stable HF (Appendix J Table J-35 and Table J-36). Eight studies281,286,290,302,308,309,319,332 investigated morbidity as some form of hospitalization, including first cardiovascular hospitalization308,309 or time to first hospitalization,302 hospital admission for HF,286,290,332 all-cause hospitalization,281 or rehospitalization with worsening HF.319 Three of these eight studies290,302,309 also included a composite outcome of hospitalization and all-cause mortality.
Three studies defined morbidity as a decision to initiate cardiac transplant,276 change in NYHA class and quality-of-life,283 or worsening renal function.347 One study4 reported that NT-proBNP was the strongest prognostic indicator of first HF rehospitalization and a composite outcome of first HF rehospitalization and death; however, the authors did not show any regression results and this study will consequently not be considered further in this section.
Eleven studies included samples drawn from HF clinics. Mean ages of participants ranged from 56276 to 73;309 five studies290,302,308,309,332 included persons with mean ages between 71 and 73. One study283 stratified mean age data by participant subgroup, with the highest mean age being 70 years. Another study286 reported that 71 percent of the sample was aged less than 70 years, while 29 percent were aged 70 years or above. One study281 stratified participants by NT-proBNP cutpoint and reported a mean age of 69 years (<1,381 pg/mL) or 75 years (>1381 pg/mL). A majority of participants were male in all studies, with the proportion of males ranging from 0.55283 to 0.84.332
Nine studies reported mean lengths of followup in the range of 12283 to 48 months.308 One study276 indicated followup lasted anywhere from 3 to 6 months, depending on the participant; one study reported a median length of followup of 28 months.281 Sample sizes ranged from 78283 to 3,916.286 Mean sample size was 875, including the two largest studies (n=3,342,309 n=3,916286). Excluding the two largest studies, mean sample size was 264.
For most outcomes, higher levels of NT-proBNP were predictive of increased morbidity in persons with stable HF. Results in all except one study283 showed this positive association. In only one study302 did the results fail to achieve statistical significance.
Findings for morbidity measured as some form of hospitalization did not vary in terms of mean age, proportion of males, or length of followup. The largest effect was observed in a 48 month study of 354 persons,308 where baseline log NT-proBNP and log NT-proBNP measured after 6 months of followup were both associated with increased unplanned cardiovascular hospitalizations. Adjusted HRs and 95% CIs (shown in parentheses) were 3.16 (2.24 to 4.46) for baseline log NT-proBNP and 2.45 (1.50 to 4.01) for 6 month log NT-proBNP. The next largest effect was observed in a 23 month study (n=3,916)286 where the adjusted HR was 2.66 (2.19 to 3.22) for persons above a cutpoint of 895 pg/mL. The authors found a cutpoint of 1,007 pg/mL to be optimal for prognostic purposes, with an AUC of 0.69, sensitivity of 70 percent, and specificity of 59 percent. In the other large study, consisting of 3,342 participants and an average followup of 32 months,309 the adjusted HR for a first cardiovascular hospitalization was HR=1.36 (1.29 to 1.44) for each 1-unit increase in log NT-proBNP.
In a study lasting 14 months,332 the positive association between NT-proBNP and hospitalization was more muted, with an adjusted HR=1.07 (1.00 to 1.14; p=0.03).332 Note, though, that a 44 month study of time to first hospitalization found an adjusted HR=1.01 (0.96 to 1.05).302
One 21 month study319 of rehospitalization due to worsening HF dichotomized NT-proBNP at a cutpoint of 1,474 pg/mL. Persons with NT-proBNP values above 1,474 pg/mL had faster times to rehospitalization (HR=1.26; 95% CI, 1.03 to 1.55). Similar results were reported in a study with a median followup of 28 months, where NT-proBNP values above 1,381 pg/mL were associated with faster times to hospitalization (HR=1.71; 95% CI, 1.24 to 2.36).281 This study also reported that a doubling of NT-proBNP levels would lead to faster hospitalization (HR for log2 NT-proBNP: HR=1.19; 95% CI, 1.09 to 1.31). Another study290 involving 24 months of followup claimed higher NT-proBNP levels were positively associated with hospitalization for HF, but the authors only reported a chi-square test statistic (11.2) and p-value (p <0.01). This study290 also showed Kaplan-Meier curves depicting greater hospitalization for persons with NT-proBNP levels >1,556 pg/mL.
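The hazard ratio per doubling reported above can be translated to other fold changes, which may help when comparing it with results expressed on different scales. The following is a minimal sketch, not a method taken from the cited study: it assumes that log2 NT-proBNP entered the regression model as a linear term, so that the hazard ratio for a k-fold change equals the per-doubling hazard ratio raised to the power log2(k). The function name `hr_for_fold_change` is hypothetical.

```python
import math


def hr_for_fold_change(hr_per_doubling: float, fold: float) -> float:
    """Hazard ratio implied for a `fold`-times change in the biomarker.

    Assumes the model used log2(biomarker) as a linear covariate, so one
    unit on the log2 scale corresponds to a doubling of the biomarker.
    """
    if hr_per_doubling <= 0 or fold <= 0:
        raise ValueError("hazard ratio and fold change must be positive")
    return hr_per_doubling ** math.log2(fold)


if __name__ == "__main__":
    # HR=1.19 per doubling is the value reported in the text above.
    for fold in (2, 4, 8):
        print(f"{fold}-fold change: HR = {hr_for_fold_change(1.19, fold):.2f}")
```

Under this assumption, a fourfold difference in NT-proBNP would correspond to a hazard ratio of about 1.42. Because the published estimate is rounded, derived values carry the same rounding error.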
Three studies featured a composite outcome of hospitalization and mortality. One 24 month study290 only provided a Kaplan-Meier curve, which showed shorter times to either outcome in persons with NT-proBNP levels >1,556 pg/mL. A 32 month study309 found an adjusted HR=1.64 (95% CI, 1.54 to 1.74) and a 44 month study302 found a non-significant adjusted HR=1.03 (95% CI, 1.00 to 1.06).
Besides the studies discussed above,290,319 the only other hospitalization study that provided cutpoints was the 48 month investigation of first unplanned cardiovascular hospitalization.308 This study reported elevated risks of hospitalization at each of five levels of NT-proBNP, with the levels based on quintiles of baseline NT-proBNP (i.e., ≤474, 475 to 1,090, 1,091 to 2,529, 2,530 to 5,532, and ≥5,533 pg/mL).
Other Morbidity Outcomes
Three studies276,283,347 examined other morbidity outcomes besides hospitalization; all found strong predictive effects for NT-proBNP. The odds of being recommended for cardiac transplant were 10.6 times greater (95% CI, 3.7 to 14.5) in persons with an NT-proBNP value greater than 1,000 pg/mL in a study of 550 HF patients.276 In a study of 125 persons with HF, the risk of worsening renal function was 3.6 times greater (95% CI, 1.9 to 7.0) per standard deviation unit increase in log NT-proBNP.347 At a cutpoint of 696 pg/mL, NT-proBNP showed 92.9 percent sensitivity, 54.6 percent specificity, and an AUC of 0.80 (95% CI, 0.72 to 0.89) to predict worsening renal function.
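Sensitivity and specificity at a single cutpoint, such as those reported above for worsening renal function, can be summarized as likelihood ratios. The sketch below uses the standard definitions (positive likelihood ratio = sensitivity / (1 - specificity); negative likelihood ratio = (1 - sensitivity) / specificity). The cited study did not report these quantities, and the function name `likelihood_ratios` is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (positive LR, negative LR) for a test at one cutpoint.

    Both inputs are proportions strictly between 0 and 1.
    """
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


if __name__ == "__main__":
    # 92.9 percent sensitivity and 54.6 percent specificity at 696 pg/mL,
    # as reported in the text above.
    lr_pos, lr_neg = likelihood_ratios(0.929, 0.546)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

With the reported values, the positive likelihood ratio is roughly 2.0 and the negative likelihood ratio roughly 0.13, which suggests the cutpoint is more informative for ruling out than for ruling in worsening renal function.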
A 12 month study examined two outcomes, namely improvements in NYHA class (n=78) or quality-of-life (n=71).283 The authors measured quality of life using the Minnesota Living with Heart Failure Questionnaire.363 Resistance to improvement in NYHA class was associated with low baseline NT-proBNP (OR=0.49; 95% CI, 0.31 to 0.78 on log NT-proBNP). Thus, high pre-treatment NT-proBNP levels suggested potential improvement in functional status. The authors did not report multivariable results for quality-of-life because model fit was poor.
NT-proBNP Levels Predicting All-Cause Mortality and All-Cause Morbidity
Table 26 describes study outcomes and followup period for articles assessing all-cause mortality and all-cause morbidity outcomes (n=3).
Three studies279,286,293 examined all-cause mortality and all-cause morbidity, which was defined as hospitalization279,293 in two studies. The third study286 reported a composite outcome of “mortality and morbidity”, yet the authors did not clearly define morbidity. The studies included outpatients with HF. Proportions of males and mean ages were 0.81 and 63 years,279 0.68 and 57 years,293 and 0.80 with mean age unreported.286 Sample sizes and lengths of followup were 1,011 participants and a mean of 5.3 months,279 204 participants and a median of 36 months,293 and 3,916 participants and a mean of 23 months.286
In all cases, higher levels of NT-proBNP were associated with the composite outcomes. The adjusted relative risk was 2.11 (95% CI, 1.54 to 2.90) in the 5.3 month study279 for persons with an NT-proBNP level >1,767 pg/mL; adjusted HRs (CIs) were 1.23 (1.12 to 1.35) for persons with a level >1,000 pg/mL in the 36 month study293 and 2.20 (1.92 to 2.51) for participants with a level >895 pg/mL in the 23 month study286 (Appendix J Table J-37).
NT-proBNP Levels Predicting Cardiovascular Mortality and Cardiovascular Morbidity
Table 27 describes study outcomes and followup period for articles assessing cardiovascular mortality and cardiovascular morbidity outcomes (n=8).
Eight studies in 12 publications285,287,294,301,309,310,312,318,319,334,349,351 examined cardiovascular mortality and cardiovascular morbidity (Appendix J Table J-38). Three publications301,309,351 used data from the CORONA study and another three publications285,287,319 used data from a HF clinic in Germany. The main study publications for these two sets of papers were the ones with the most participants.301,319 All eight studies included outpatients with HF. Proportions of males ranged from 0.65318 to 1.00.294 Mean ages ranged from 54310 to 73301 years. The smallest sample size was 100310 and the largest was 3,664.301 The mean sample size was 601 including CORONA (n=3,664)301 and 164 excluding CORONA. Mean lengths of followup were 6 months,334 17 months,318 20 months,319 22 months,312 and greater than 24 months.294,301,310,349
A 6 month study334 found NT-proBNP levels above 2,061 pg/mL to be positively associated with a composite outcome of cardiac death, heart transplantation, or HF hospitalization (HR=2.56; 95% CI, 1.36 to 4.82). A 17 month study318 examined three different cutpoints and found similar positive associations with a composite outcome of cardiovascular mortality, HF hospitalization, myocardial infarction, or stroke. Adjusted HRs (CIs) for each cutpoint were 3.1 (1.20 to 8.20) for >100 pg/mL, 5.8 (1.3 to 26.4) for >300 pg/mL, and 8.0 (2.6 to 24.8) for >600 pg/mL.
The longest of the three German HF clinic papers319 reported a mean followup of 20 months. This article contained information on 341 persons recruited between March 2003 and November 2005. The composite outcome was cardiac death, need for a cardiac assist device, or urgent cardiac transplantation. Time to event was faster in persons with NT-proBNP levels greater than or equal to 1,474 pg/mL (HR=1.56; 95% CI, 1.23 to 1.98). An earlier publication285 from the same clinic reported on 162 persons recruited between March 2003 and November 2004. These persons were followed for a mean of 13 months. Time to a composite outcome of cardiac death or urgent cardiac transplantation was faster in persons with NT-proBNP levels above 1,129 pg/mL (HR=3.79; 95% CI, 1.62 to 8.89). The first publication287 from this research group reported on 73 participants followed for a mean of 5.6 months. The composite outcome was rehospitalization due to worsening HF, cardiac death, or urgent cardiac transplantation. The adjusted HR for a cutpoint of 2,283 pg/mL was HR=8.33 (95% CI, 2.65 to 26.20).
A study of 103 persons with mean followup of 22 months found NT-proBNP was not associated (p=0.2) with cardiovascular mortality or HF rehospitalization.312 The authors did not report HRs for NT-proBNP or any other variables that were non-significant in their multivariable regression model.
Besides the CORONA publications,301,309,351 three other studies294,310,349 followed participants for over 24 months. A 100-person study310 with 25 months of mean followup reported an odds ratio of 1.27 (95% CI, 1.07 to 1.51) for a cutpoint of 1,000 pg/mL. The composite outcome was cardiovascular mortality and HF hospitalization. A 28 month study294 examined the occurrence of cardiovascular mortality or cardiovascular hospitalization in 163 men. When the multivariable regression model included dichotomized covariates for dehydroepiandrosterone sulphate levels and Beck Depression Inventory scores, men with NT-proBNP levels >500 pg/mL had a small increase in risk for the outcome (HR=1.02; 95% CI, 1.01 to 1.03). When these covariates were treated as continuous in the model, the increase in risk was statistically nonsignificant (HR=1.01; 95% CI, 1.00 to 1.03; p=0.09). A 37 month study349 of 107 persons showed an increased odds of cardiovascular mortality or HF hospitalization in participants with a log-transformed NT-proBNP level at or above a log-transformed cutpoint of 2.47 pg/mL (OR=4.16; 95% CI, 1.29 to 13.44).
Turning to the three CORONA articles,301,309,351 participants were followed for a mean of 32 months. The primary composite outcome was cardiovascular mortality, nonfatal MI, or nonfatal stroke. A secondary composite outcome was any coronary event, which included sudden death, fatal or nonfatal MI, coronary revascularization, ventricular defibrillation by an implantable defibrillator, resuscitation from cardiac arrest, or hospitalization for unstable angina. The authors also had a post hoc outcome called atherothrombotic endpoint (i.e., fatal or nonfatal MI or fatal or nonfatal non-hemorrhagic stroke). The paper301 with the largest sample size (n=3,664) reported the impact of log-transformed NT-proBNP on the aforementioned three composite outcomes. These same results were also reported in a slightly earlier paper309 where the CORONA team analyzed 3,342 persons who had complete data for all of the variables that were included in the regression analyses. Adjusted HRs (CIs) for each log unit change in NT-proBNP were 1.59 (1.48 to 1.71) for the primary outcome, 1.47 (1.36 to 1.59) for any coronary event, and 1.24 (1.10 to 1.40) for atherothrombotic outcomes.301,309 The third CORONA paper in this series analyzed a subset of 1,449 persons for whom researchers had measured soluble ST2.351 In this subgroup, each log unit increase in NT-proBNP was positively associated with the primary outcome (HR=1.59; 95% CI, 1.42 to 1.79).
NT-proBNP Levels Predicting All-Cause Mortality and Cardiovascular Morbidity
Table 28 describes study outcomes and followup period for articles assessing all-cause mortality and cardiovascular morbidity outcomes (n=26).
Twenty-six publications4,277,278,289,291,292,301-303,305,306,309,316,320,322,323,325,326,330,335,337-339,341,354,356 measured composite outcomes relating to all-cause mortality and cardiovascular morbidity (Appendix J Table J-39). Two publications4,330 did not report HRs or test statistics, so neither will be discussed further in this section. Five publications277,278,291,292,326 pertained to a single study in Scotland, two306,320 involved a single study in Italy, and two301,309 came from the CORONA study. The remaining papers reported on individual studies. For summarizing study characteristics and risk of bias, the publications301,306,326 with the largest sample sizes were chosen to represent all of the Scottish, Italian, and CORONA papers. Thus, this section reports on 18 unique studies.
The included studies took place in medical settings (e.g., HF clinics). Proportions of males and mean ages ranged from 0.65323 to 0.88303,339 and 49303,337 to 72289 years. One paper356 did not report either characteristic. Another study305 reported proportions of males across three different strata based on tertiles of sACE2 plasma activity: 0.68, 0.73, and 0.89.305 Sample sizes ranged from 71322 to 3,664;301 mean sample size was 608. Lengths of followup were between six and 12 months for four publications,303,322,337,354 13 to 24 months for 12 publications,277,278,289,291,292,320,323,325,326,335,338,339 and greater than 24 months for eight publications.301,302,305,306,309,316,341,356
Four studies303,322,337,354 followed participants for between six and 12 months. A 658-person study303 with a mean followup of six months reported an adjusted HR=1.06 (95% CI, 1.03 to 1.08) per unit change in NT-proBNP. The outcome was all-cause mortality or urgent cardiac transplant. The other three studies reported a mean followup of 12 months. The largest (n=504) 12 month study354 employed an outcome of death, heart transplant, or HF hospitalization and found adjusted HRs (CIs) of 0.45 (0.45 to 1.46) and 2.43 (1.39 to 4.28) when NT-proBNP was measured at baseline and six months, respectively. A 91-person study337 measuring all-cause mortality or worsening HF reported an adjusted HR=1.001 (p=0.036) for each one unit change in NT-proBNP. A study322 examining all-cause mortality and HF hospitalization in 71 persons found no predictive value for NT-proBNP (HR=1.00; p=0.53).
Twelve publications277,278,289,291,292,320,323,325,326,335,338,339 reported 13- to 24- month followup periods. Five of these publications277,278,291,292,326 pertained to a single study in Scotland and two publications to a single study in Italy,306,320 while the remaining five reports each covered individual studies.323,325,335,338,339
The shortest followup in the 13 to 24 month category was a 13 month study338 of 210 persons; NT-proBNP values >581 pg/mL were associated with higher all-cause mortality, HF hospitalization, number of emergency department visits (HR=2.02; 95% CI, 1.08 to 3.78). A 17 month study325 of 290 participants evaluated log NT-proBNP in two separate multivariable regression models. This study found positive associations between each one-unit standard deviation increase in the peptide and a composite outcome of all-cause mortality, HF hospitalization, or urgent cardiac transplant (HR=1.9; 95% CI, 1.50 to 2.40 and adjusted HR=1.7; 95% CI, 1.30 to 2.30). Two 18 month studies also found positive associations between NT-proBNP and a composite outcome. The first study335 involved 82 persons who had a higher risk of death or HF hospitalization at an NT-proBNP cutpoint above 844 pg/mL (HR=4.50; 95% CI, 2.22 to 9.15). The second 18 month study323 recruited 166 persons and examined the same composite outcome; however, the authors only reported chi-square test statistics and p-values, so the magnitude of the positive association could not be assessed.
The five publications from the Scottish study277,278,291,292,326 reported on a rolling cohort of patients recruited between April 2001 and March 2004. Followups ranged from 13 to 22 months. The composite outcome was all-cause mortality or urgent cardiac transplant and multivariable regression analyses showed positive associations between higher NT-proBNP levels and incidences of the outcome. Since the analyses were repeated on an ever-increasing number of patients over time, median cutpoints varied in the publications. The last publication326 in this group reported a sample size of 182; NT-proBNP was positively associated with the outcome above 1,506 pg/mL (HR=2.7; 95% CI, 1.10 to 6.40).
The two publications from Italy appeared to include overlapping patients. The first study320 involved 142 patients followed for a mean of 20 months and the second306 contained 232 patients followed for a mean of 29 months. The combined outcome in both studies was all-cause mortality or HF hospitalization. Positive associations between peptide level and outcome were found in both studies. At a cutpoint ≥544 pg/mL, the adjusted HR=2.66 (1.24 to 5.71);306 at a cutpoint ≥3,283 pg/mL, the adjusted HR=2.16 (1.27 to 3.67).320
Two 24 month studies289,339 also found positive associations between NT-proBNP levels and composite outcomes. An investigation of 546 persons339 found a one log unit increase in NT-proBNP to be associated with higher event rates for all-cause death or heart transplantation (HR=1.42; 95% CI, 1.19 to 1.71). An 88-person study289 only reported a chi-square test statistic and p-value for the positive association between NT-proBNP and all-cause death or HF rehospitalization.
Seven papers301,302,305,309,316,341,356 besides the second Italian publication306 reported followups between 25 and 60 months. Two papers301,309 came from the CORONA study and the remaining five papers each pertained to an individual study. The CORONA papers reported on all-cause mortality or hospitalization for worsening HF at a mean of 32 months of followup. In both papers, each one-unit increase in log NT-proBNP was associated with increased mortality or hospitalization (HR=1.64 in both publications; 95% CI, 1.54 to 1.74 reported in one paper).309
The remaining five papers all contained results that were consistent with the above findings. A 30 month examination316 of 149 participants found various permutations of NT-proBNP to be statistically significantly associated with all-cause mortality or heart transplant. Permutations included the risk per 100 pg/mL increase in NT-proBNP, as well as assessments at cutpoints of ≥760 pg/mL, ≥1,164 pg/mL, and ≥1,460 pg/mL. Adjusted HRs ranged from 1.07 to 15.85. A 34 month study305 of 113 participants investigated a three-pronged outcome of all-cause mortality, cardiac transplant, or HF hospitalization and found an adjusted HR=1.55 (95% CI, 1.01 to 2.33) in participants above a cutpoint of 1,240 pg/mL. The same three-pronged outcome was used in a 37 month study of 136 persons,341 with an adjusted HR=2.12 (95% CI, 1.08 to 4.42) in persons at or above a cutpoint of 1,158 pg/mL. A 44 month investigation of 284 persons302 found a non-significant higher risk of all-cause mortality or first hospitalization with each one-unit increase in NT-proBNP (HR=1.03; 95% CI, 1.00 to 1.06; p=0.099). In a large (n=3,480) 49 month study involving all-cause mortality or cardiovascular hospitalizations, the adjusted HR=1.46 (95% CI, 1.37 to 1.57) per log unit increase in NT-proBNP.
NT-proBNP Levels Predicting Cardiovascular Mortality and All-Cause Morbidity
Table 29 describes study outcomes and followup period for articles assessing cardiovascular mortality and all-cause morbidity outcomes (n=3).
Three studies293,343,356 investigated the composite outcome of cardiovascular mortality and all-cause morbidity (Appendix J Table J-40). Participants were persons with HF who were two-thirds male;293,343 mean ages were 72343 or 57 years.293 In one study,356 the proportion of males and the mean age of participants were reported in two strata defined by a median NT-proBNP value of 339 pg/mL (below median: 37 percent, 70 years; above median: 41 percent, 74 years). Sample sizes were 106,343 204,293 and 3,474.356 Mean followups were 16343 or 50356 months, or a median of 36 months.293 Mortality and morbidity were defined as cardiovascular/HF death and hospitalization in all three studies.
In all three studies, higher levels of NT-proBNP were positively associated with the composite outcome of mortality and hospitalization. Adjusted HRs (CIs) were 1.02 (1.01 to 1.03) per 100 pg/mL in the 16 month study,343 1.28 (1.16 to 1.42) for NT-proBNP levels above 1,000 pg/mL in the median 36 month study,293 and 1.77 (1.43 to 2.20) for levels above 339 pg/mL in the large 50 month study.356 The 50 month study also reported other adjusted HRs: 1.44 (1.31 to 1.58) per log unit change in NT-proBNP; 1.13 (0.94 to 1.37) in the subgroup (n=1,737) with NT-proBNP >339 pg/mL; 0.57 (0.41 to 0.80) in the subgroup (n=1,737) with NT-proBNP <339 pg/mL. This study also found increasing point-estimate adjusted HRs for each quartile of NT-proBNP compared to the first quartile (Appendix J Table J-40).356
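Hazard ratios reported per 100 pg/mL look small next to hazard ratios reported at a cutpoint or per log unit, but the two are not on the same scale. As a rough illustration only, the sketch below rescales a per-unit hazard ratio and its confidence limits to a larger increment. It assumes NT-proBNP entered the model as a linear (untransformed) term under a proportional hazards model, so that the hazard ratio for a change of `factor` units is the per-unit hazard ratio raised to the power `factor`. The function name `rescale_hr` is hypothetical, and none of the cited studies reported the rescaled values.

```python
def rescale_hr(hr: float, ci_low: float, ci_high: float, factor: float) -> tuple[float, float, float]:
    """Rescale a per-unit hazard ratio and its CI to `factor` units.

    Valid only when the covariate is modeled linearly on the log-hazard
    scale; each value is raised to the power `factor`.
    """
    if min(hr, ci_low, ci_high) <= 0:
        raise ValueError("hazard ratio and confidence limits must be positive")
    return hr ** factor, ci_low ** factor, ci_high ** factor


if __name__ == "__main__":
    # HR=1.02 (1.01 to 1.03) per 100 pg/mL, as reported for the 16 month study,
    # re-expressed per 1,000 pg/mL (ten increments of 100 pg/mL).
    hr, low, high = rescale_hr(1.02, 1.01, 1.03, factor=10)
    print(f"HR per 1,000 pg/mL = {hr:.2f} ({low:.2f} to {high:.2f})")
```

On these assumptions, HR=1.02 per 100 pg/mL corresponds to roughly HR=1.22 (1.10 to 1.34) per 1,000 pg/mL. Because the inputs are rounded to two decimal places, the rescaled values are approximate and should not be read as study results.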
Design Characteristics of Studies
Six studies364-369 investigated the prognostic value of baseline BNP in persons with HF who received some type of surgery or dialysis (Table 30, and Appendix J Table J-41). Five studies364-368 were undertaken in stable HF populations and one study369 involved persons with acute decompensated HF. Surgeries included cardiac resynchronization therapy (CRT),366-368 cardiac resynchronization defibrillator therapy (CRT-D),364 or noncardiac surgery (e.g., abdominal, orthopedic).365 One study369 involved peritoneal dialysis.
Mean ages ranged from 61368 to 77 years.365 Percentages of males ranged from 41365 to 98 percent364 and mean lengths of followup ranged from 1365 to 18 months367 (Table 30 and Appendix J Table J-41). The smallest sample size was 32367 and the largest was 164.366 The mean sample size across all six studies was 87. Three studies used the Triage B-Type Natriuretic Peptide Test364,366,369 and three used the ADVIA-Centaur immunoassay.365,367,368
Risk of Bias
Overall risk of bias was low when the Hayden criteria were taken together for all of the studies (Figure 15, Appendix J Table J-41). Specific areas where risk of bias could be problematic included uncertainty over appropriate measuring of outcomes in four studies,365-368 as well as inadequate measuring and accounting for confounders in five studies.364-368
In stable HF populations, three studies366-368 examined the prognostic value of BNP, measured at baseline, following CRT. In two studies,366,367 effect sizes per unit change in BNP were close to unity. In one of these two studies, higher levels of BNP were associated with positive responses to CRT, (i.e., no HF hospitalization or improvement of at least 1 NYHA grade [95% CI, 1.001 to 1.003; p <0.01]).366 Conversely, in the second of these two studies, higher BNP levels were shown to be associated with HF hospitalization following CRT (adjusted HR=1.001; 95% CI, 1.000 to 1.002; p=0.024).367 In this second study,367 the authors found no association between higher BNP and all-cause mortality, although they did not provide numerical results to illustrate their finding. The last study368 involving CRT evaluated a composite outcome called HF progression, which included death, urgent transplant, HF hospitalization, or symptoms of HF progression. The adjusted HR per unit change in log BNP was 2.07 (95% CI, 1.19 to 3.62). See Appendix J Table J-42.
In the CRT-D study,364 persons with BNP levels at or above a cutpoint of 492 pg/mL had higher risks of all-cause mortality (adjusted HR=2.89; 95% CI, 1.06 to 7.88) or HF hospitalization (adjusted HR=4.23; 95% CI, 1.68 to 10.60).
The study evaluating the prognostic utility of BNP following noncardiac surgery reported a positive association between BNP levels and a composite outcome of all-cause mortality, acute coronary syndrome, or development/worsening HF.365 However, the authors reported a p-value (p=0.023), which does not show the magnitude of the association.
The lone study of 118 acute decompensated HF patients369 found a nonsignificant positive association between each one-unit change in BNP level and all-cause mortality following peritoneal dialysis (adjusted HR=1.38; 95% CI, 0.93 to 2.06).
Design Characteristics of Studies
Three papers370-372 (Table 31 and Appendix J Table J-43) pertaining to two trials, TOPCARE-CHD370 and CARE-HF,371,372 reported on the prognostic value of NT-proBNP following surgery in persons with stable HF. For TOPCARE-CHD,370 mean age was 62 years, 87 percent of participants were male, mean length of followup was 19 months, and sample size was 121 persons. The intervention under study was intracoronary infusion of bone marrow-derived mononuclear progenitor cells. NT-proBNP was measured using the Elecsys 2010.
In the CARE-HF papers,371,372 the age range was 55 to 75 years, 67 percent of participants were male, the median length of followup was 37.6 months, and 813 persons were studied. The intervention was cardiac resynchronization therapy and medical therapy compared to medical therapy alone. NT-proBNP was also measured using the Elecsys 2010.
Risk of Bias
Overall, risk of bias for the three publications was low (Figure 16, Appendix J Table J-43). However, a few specific questions on the Hayden instrument suggested potential issues with bias. Risk of bias was “uncertain” for appropriate measuring of outcomes in the case of all three articles. High risk of bias in the manner of measuring and accounting for confounders was possible in one paper.371 One publication372 was not designed to test the prognostic value of NT-proBNP.
In the TOPCARE-CHD paper,370 baseline NT-proBNP was shown to be positively associated with all-cause mortality. The adjusted hazard ratio was 7.2 (95% CI, 2.4 to 22.2) per one-unit increase in log NT-proBNP. All-cause mortality was also assessed in the CARE-HF papers: the adjusted HR for a one-unit increase in baseline log NT-proBNP was 1.56 (95% CI, 1.34 to 1.82);371 the adjusted HR in a time-dependent model examining log NT-proBNP measured three months after randomization was 1.62 (95% CI, 1.41 to 1.85) per unit increase372 (Table 31). See Appendix J Table J-44.
One of the CARE-HF papers371 also examined the prognostic value of one-unit changes in baseline log NT-proBNP on sudden death (adjusted HR=1.33; 95% CI, 1.11 to 1.60) and death from pump failure (adjusted HR=1.92; 95% CI, 1.58 to 2.34).
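The hazard ratios above are expressed per one-unit increase in log NT-proBNP. As a minimal sketch of how such estimates can be rescaled, the following assumes a log-linear model (so the HR for a k-unit change is the per-unit HR raised to the power k) and a confidence interval that is symmetric on the log scale. The function names are illustrative and are not taken from the cited papers; the source does not state the base of the logarithm, so "unit" here means whatever unit the original model used.

```python
import math

def hr_for_change(hr_per_unit: float, k: float) -> float:
    """HR for a k-unit change in the predictor, assuming a log-linear effect."""
    return hr_per_unit ** k

def log_hr_se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate standard error of log(HR) recovered from a 95% CI.

    Assumes the interval was built as exp(log HR +/- z * SE).
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)

# CARE-HF baseline estimate quoted in the text: HR=1.56 (95% CI, 1.34 to 1.82)
hr, ci_low, ci_high = 1.56, 1.34, 1.82

two_unit_hr = hr_for_change(hr, 2)           # HR for a two-unit increase
se_log_hr = log_hr_se_from_ci(ci_low, ci_high)

print(f"HR per two-unit increase: {two_unit_hr:.2f}")
print(f"Approximate SE of log(HR): {se_log_hr:.3f}")
```

Rescaling in this way is only as valid as the log-linearity assumption; it does not replace the estimates reported by the study authors.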
Comparing Prognostic Value of BNP and NT-proBNP in Decompensated and Stable Heart Failure Patients
Design Characteristics of Studies
Two publications373,374 included both decompensated and stable HF patients in their study populations. Both drew on the same population, prospectively recruited in a hospital in Pisa, Italy, with one article373 assessing a subpopulation of the other374 (Table 32 and Appendix J Table J-45).
Risk of Bias
The risk of bias was assessed based on the Hayden criteria58 as described in the methods section and Appendix E. Both articles373,374 (Figure 17, Appendix J Table J-46) scored well on assessment of study participation, study attrition and prognostic factors. Both articles adequately measured and defined the study outcomes. However, since one publication373 used a composite outcome comprised of mortality and morbidity, it was rated low on the question asking whether “composite outcomes were avoided.” Both publications373,374 failed to adequately measure and account for the important covariates, specified according to the a priori criteria set out (age, sex, body mass index and renal function). Analyses were appropriately conducted in both articles and both were adequately designed for prognostic study.373,374
Decompensated and Stable NT-proBNP
One of the articles374 looked at all-cause mortality over 32 months (Table 32), in a population of 400 people with a mean age of 69 years. For the overall group of patients, the authors reported a statistically significant HR (HR=2.04; 95% CI, 1.25 to 3.36), indicating a positive association between higher values of log NT-proBNP and all-cause mortality. In patients with decompensated HF, the HR for log NT-proBNP was slightly above 1.0 (HR=1.01; 95% CI, 1.00 to 1.01; p=0.060), but the confidence interval included the null value. Multivariable results for stable HF patients were not reported in the article.374 See Appendix J Table J-46.
The other article373 examined a composite outcome of all-cause mortality and cardiovascular morbidity over 22 months, in a population of 313 individuals with a mean age of 69 years. The authors performed multivariable analyses using different cutpoints. In patients with stable HF, NT-proBNP >1,129 pg/mL (HR=2.84; 95% CI, 1.44 to 5.62) was a significant predictor of the end point in multivariable analysis. Likewise, in patients with decompensated HF, NT-proBNP >3,430 pg/mL was significant at HR=2.06 (95% CI, 1.16 to 3.67). For both stable and decompensated groups combined, NT-proBNP >1,492 pg/mL was a significant predictor of all-cause mortality and cardiovascular morbidity (HR=2.94; 95% CI, 1.83 to 4.72).373
Key Question 4. In HF populations, does BNP measured at admission, discharge, or change between admission and discharge, add incremental predictive information to established risk factors for morbidity and mortality outcomes?
All studies eligible for KQ3 were further screened for appropriate statistical methods used to demonstrate the incremental predictive value of adding BNP/NT-proBNP to prognostic models predicting future outcomes of mortality, morbidity, and composite outcomes. Incremental predictive value could be evaluated in a number of ways, including the use of discrimination, calibration, or reclassification statistics. An abbreviated summary of these complex statistics follows to help the reader interpret the study findings described below.
The c-statistic, or c-index, which is one of the more frequently reported incremental value statistics, is a measure of discrimination; it indicates how added variables improve the ability of prognostic models to discriminate between individuals classified as high risk and those classified as low risk. The accuracy, or calibration, of risk prediction is also an important measure of a risk marker. The calibration of a risk predictor can be measured by comparing the predicted frequency of events with the observed frequency; this is determined by assessing the goodness of model fit (Hosmer-Lemeshow goodness-of-fit test). The likelihood-based measures (such as global chi-square or LR chi-square and log LR) show whether the addition of BNP/NT-proBNP, or other markers, to base models provides a better model fit and an increase in predictive value for mortality or morbidity. Measures of risk classification (including the net reclassification index (NRI) and the integrated discrimination improvement (IDI) index) assess the degree to which the addition of BNP/NT-proBNP improves discrimination between groups of individuals classified with and without the test. NRI and IDI are considered to be improvements over measures of discrimination (AUC and c-statistic), calibration (goodness-of-fit, Hosmer-Lemeshow statistic), and global model-fit statistics (likelihood-based measures).44
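To make the discrimination measure concrete, the following is a minimal sketch of a c-statistic for a binary outcome: the proportion of (event, non-event) pairs in which the person with the event was assigned the higher predicted risk, with ties counted as one half. The data and the names `base_risk` and `marker_risk` are hypothetical and serve only to show how adding a marker can raise the statistic; none of the values come from the included studies.

```python
from itertools import product

def c_statistic(risks, events):
    """Concordance for a binary outcome.

    Share of (event, non-event) pairs where the event case has the
    higher predicted risk; tied predictions count as 0.5.
    """
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for risk_case, risk_control in product(cases, controls):
        if risk_case > risk_control:
            score += 1.0
        elif risk_case == risk_control:
            score += 0.5
    return score / (len(cases) * len(controls))

# Hypothetical predicted risks for six people (1 = died, 0 = survived)
events = [1, 1, 1, 0, 0, 0]
base_risk = [0.6, 0.3, 0.5, 0.4, 0.2, 0.5]     # base model only
marker_risk = [0.7, 0.5, 0.6, 0.4, 0.2, 0.3]   # base model plus a marker

c_base = c_statistic(base_risk, events)
c_marker = c_statistic(marker_risk, events)
print(f"c-statistic, base model: {c_base:.3f}")
print(f"c-statistic, base + marker: {c_marker:.3f}")
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation; the increment between the two models is what the studies below report as incremental value.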
Of the 183 eligible studies in KQ3, 39 publications used methods that would allow assessment of the incremental value of adding BNP or NT-proBNP when predicting subsequent outcomes. Of these 39 publications, two studies2,247 reported that they undertook statistical computations but did not present any data for incremental value. Additionally, 15 studies already included the assay in the base prognostic model: BNP,106,196,210,212,273 NT-proBNP,282,303,316,339,343,348,352,362,375 or both assays.217 Including these assays in the base model does not allow assessment of predictive incremental value for BNP/NT-proBNP. The study findings from the remaining 22 publications (12 unique studies [cohort of patients])3,187,193,198,205,251,256,283,286,301,306,309,320,329,340,344,349,353,357,360,373,376 are presented in grouped sections accounting for incremental value estimates in studies with decompensated or stable populations with HF. See Appendix K. KQ4 Evidence Set.
Evidence for Incremental Value of BNP and NT-proBNP in Decompensated Heart Failure Patients
There were seven publications (6 studies) that included patients with decompensated HF and evaluated the incremental value of admission BNP3,187,193,198,205 and admission NT-proBNP.251,256 One study3 evaluated both BNP and NT-proBNP but reported results only for BNP. One study198 had overlapping samples of consecutive patients recruited from the same center; findings from both publications are reported even though the cohorts overlap and are considered a single study.
Design Characteristics of Studies
Of the five publications3,187,193,198,205 evaluating BNP in acute decompensated populations, only one recruited participants from emergency settings,3 while the other four recruited participants from among persons admitted to hospital.187,193,198,205 All BNP studies were cohort designs that included relatively equal proportions of men and women. One BNP study included only patients with NYHA class III and IV severity.187 Sample sizes of BNP studies varied from 568 to 1,111 subjects. All studies evaluated BNP/NT-proBNP levels at admission and did not assess any serial or hospital discharge levels.
Table 33 shows the outcomes and time intervals of studies that evaluated and presented data on the incremental value of BNP/NT-proBNP. The studies evaluating the incremental value of BNP as a predictor evaluated only mortality-related outcomes. Time intervals for outcome prediction varied from 3 months to 12 months in these studies. The studies were undertaken in Greece,187 Spain,198,205 the United States,193 and multinational settings.3 The assays used in these BNP studies included the Abbott AxSym,187 the ELECSYS-proBNP,3,205 the TRIAGE-BNP,193 and the ADVIA-Centaur.198 Other study characteristics are described in Appendix K Tables K-1 and K-2.
Two studies evaluated NT-proBNP in patients with decompensated HF presenting to the emergency department in Spain251 or admitted to hospital in Denmark.256 The Elecsys 2010 analyser assay was used in both studies to assess NT-proBNP levels. The mean age of the samples and proportion of males are described in Appendix K Table K-3.
Risk of Bias
Figure 18 (also Appendix K Table K-4) shows the distribution of risk of bias across the five BNP studies and the single NT-proBNP study. Generally, these six publications were at low risk of bias. Where problems occurred, they concerned describing and accounting for confounders,187,198,205 inappropriate measurement of the outcome,187,193,205 or unclear outcome measurement.3,198
The single study that evaluated NT-proBNP in decompensated patients251 was the only publication within that group that rated adequate for all criteria; however, this study also had the smallest sample size (n=107) of the studies with decompensated patients.
BNP Levels Adding Incremental Value in Predicting Risk for Mortality
None of the BNP publications included in this group undertook internal or external model validation computations. Only mortality outcomes were evaluated in these studies. Note that these studies evaluated admission BNP levels and none evaluated the incremental value of discharge or change in BNP levels. None of the studies overlapped with respect to the lengths of followup, which varied from 31 days to 12 months (see Table 33).
Four publications assessed all-cause mortality3,193,198,205 and two assessed cardiovascular mortality187,205 in studies using BNP levels as the predictor. Appendix K Table K-1 shows the primary findings of these studies evaluating the incremental value of using BNP levels to predict all-cause mortality.
Two studies used measures of reclassification and both evaluated all-cause mortality in the short term, at 3 months3 and 6 months.205 Both studies estimated the IDI index, which shows how BNP (or other markers) improves the level of discrimination between groups of individuals classified as high or low risk for the outcome (in this case, mortality). Comparison across these two studies is limited as one publication3 used a cutpoint of 350 pg/mL as the threshold in the model and the second study205 used BNP per increase of 1 interquartile range (IQR). Nunez et al.205 showed that the base model with BNP had a lower IDI than the base model with tumor marker carbohydrate antigen 125 (CA125). When both BNP and CA125 were added to the base model, the greatest percentage increase in IDI was achieved. This study also evaluated two other mortality outcomes, cardiovascular and HF, and when comparing all three, all-cause mortality showed the largest percentage improvement in IDI for the base model with BNP added (1.51% for all-cause vs. 1.23% for cardiovascular or 0.95% for HF mortality). These data suggest that there may be differences in risk prediction by type of mortality outcome, but also that BNP combined with CA125 had the best level of discrimination. Maisel et al.3 used two different base models but reported incremental value for log transformed BNP combined with log transformed midregional pro-adrenomedullin (MR-proADM). In this study, the combined model, compared with BNP alone, showed an NRI of 39 percent, reflecting the percentage of individuals in the population who are correctly reclassified into clinically meaningful prespecified risk categories (three probability groups for risk: less than 6%, between 6% and 20%, and greater than 20%). An IDI of 5.24 percent was achieved, reflecting this degree of improvement in discrimination.
In summary, for short-term prediction of all-cause mortality, these two studies would suggest that BNP has incremental predictive value, but to a lesser degree than when combined with CA125205 or MR-proADM.3 One of these studies205 was at high risk of bias with concerns about followup, description of included covariates, and confounders.
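The reclassification measures discussed above can be sketched as follows. This is an illustrative calculation only: it uses the three risk categories quoted in the text (less than 6%, 6% to 20%, greater than 20%), the usual category-based definition of the NRI (net upward movement among people with the event plus net downward movement among people without it), and the IDI as the change in the difference of mean predicted risk between the two groups. All predicted risks are hypothetical, and the handling of values falling exactly on a boundary is an assumption.

```python
def risk_category(p: float) -> int:
    """Map a predicted risk to one of three categories: <6%, 6-20%, >20%."""
    if p < 0.06:
        return 0
    if p <= 0.20:
        return 1
    return 2

def net_reclassification_index(old, new, events):
    """Category-based NRI comparing an old and a new risk model."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    for p_old, p_new, event in zip(old, new, events):
        move = risk_category(p_new) - risk_category(p_old)
        if event:
            up_event += move > 0
            down_event += move < 0
        else:
            up_nonevent += move > 0
            down_nonevent += move < 0
    n_event = sum(events)
    n_nonevent = len(events) - n_event
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)

def integrated_discrimination_improvement(old, new, events):
    """IDI: gain in mean risk among events minus gain among non-events."""
    def mean(values):
        return sum(values) / len(values)
    old_e = [p for p, e in zip(old, events) if e]
    new_e = [p for p, e in zip(new, events) if e]
    old_n = [p for p, e in zip(old, events) if not e]
    new_n = [p for p, e in zip(new, events) if not e]
    return (mean(new_e) - mean(old_e)) - (mean(new_n) - mean(old_n))

# Hypothetical predicted risks for eight people (1 = died, 0 = survived)
events = [1, 1, 1, 1, 0, 0, 0, 0]
old_risk = [0.10, 0.25, 0.05, 0.15, 0.10, 0.04, 0.25, 0.15]
new_risk = [0.25, 0.30, 0.10, 0.15, 0.05, 0.03, 0.15, 0.25]

nri = net_reclassification_index(old_risk, new_risk, events)
idi = integrated_discrimination_improvement(old_risk, new_risk, events)
print(f"NRI = {nri:.2f}, IDI = {idi:.4f}")
```

Because the NRI depends on the chosen category boundaries, values from studies that used different thresholds are not directly comparable.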
Two studies evaluated the incremental value of BNP for predicting all-cause mortality in the longer term, at 9 months198 or 12 months.193 One study198 recruited subjects from emergency departments and followed them for a median of 9 months; Harrell's c-statistic was greater in the prognostic model that included admission BNP (continuous and for quintiles) compared to the same model without BNP (c-statistic=0.801 vs. 0.781) for predicting all-cause mortality. The second study,193 which included patients admitted to hospital, compared the incremental prognostic value of BNP and a number of different markers, showing increases in the c-statistic when admission BNP was added to the base model, as well as for the addition of C-reactive protein (CRP) and troponin T (TnT) (Appendix K Table K-1). Similarly, the IDI was 4.3 percent (p=0.001) and NRI was 16.2 percent (p=0.003) when BNP alone was added. However, in this study the c-statistic, IDI, and NRI estimates all showed slightly greater values for CRP and TnT relative to the incremental value of BNP; the greatest increment was obtained when all three markers were added to the base model. In summary, for longer term prediction of all-cause mortality at 9 and 12 months, these two studies would suggest that BNP adds incremental value. One study193 suggests that BNP is not superior to CRP and TnT with respect to incremental predictive value for all-cause mortality.
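For time-to-event data with censoring, Harrell's c-statistic restricts the comparison to pairs in which the person with the shorter followup time actually had the event. The sketch below is a simplified illustration on hypothetical data: pairs with tied followup times are ignored, tied risk scores count as one half, and no variance or significance test is computed. It is not the computation used in the cited study.

```python
def harrell_c(times, events, risks):
    """Simplified Harrell's c for right-censored data.

    A pair (i, j) is usable when subject i had an observed event
    strictly before subject j's followup time. The pair is concordant
    when subject i also had the higher predicted risk.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored subjects cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

# Hypothetical cohort: followup in months, event flag (1 = died), predicted risk
times = [5, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.5, 0.1]

c_index = harrell_c(times, events, risks)
print(f"Harrell's c = {c_index:.2f}")
```

Comparing this statistic for a model with and without the marker, as in the 0.801 vs. 0.781 comparison above, gives the increment in discrimination attributable to the marker.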
Two studies187,205 that included patients admitted to hospital evaluated the incremental value of BNP and other markers for predicting cardiovascular related mortality. One study187 evaluated cardiovascular mortality at 31 days and showed incremental value in the c-statistic when admission BNP was added to the base model. The incremental value of BNP was compared to CRP and to cardiac troponin I, and the c-statistic values suggest that BNP showed the largest increase relative to these other markers; however, it is not clear if these are significantly different. A second study205 evaluated both cardiovascular and HF mortality at 9 months; using IDI estimates this study205 showed that BNP provided incremental predictive value for cardiovascular and for HF mortality but to a lesser magnitude for the latter mortality (Appendix K Table K-2). This study also compared the incremental value for three types of mortality and BNP relative to CA125. A similar trend was seen across the three mortality outcomes; the base model with BNP had a lower IDI than the base model with CA125. However, when both BNP and CA125 were added to the base model, the greatest percentage of IDI was achieved. Cardiovascular mortality showed the largest IDI when the base model was combined with both BNP and CA125 (IDI=3.65 vs. 3.45 or 2.47%). In summary, these two studies would suggest that BNP adds incremental value in predicting cardiovascular mortality in the short term (31 days) and longer term (9 months). However, both these studies were at high risk of bias with respect to adequacy of measurement of the outcome, and dealing with important confounders.
BNP Levels Adding Incremental Value in Predicting Risk for Morbidity
None of the studies using BNP levels as predictors of outcome assessed the incremental value for outcomes of morbidity.
BNP Levels Adding Incremental Value in Predicting Risk for Composite Outcomes
None of the studies using BNP levels as predictors of outcome assessed the incremental value for composite outcomes.
NT-proBNP Levels Adding Incremental Value To Predicting Risk for All-Cause Mortality
Two studies251,256 evaluated the incremental prognostic value of NT-proBNP in decompensated patients. One study251 undertook discrimination, calibration, reclassification, and internal validation computations to assess the incremental prognostic value of NT-proBNP in subjects admitted to hospital with decompensated HF. All-cause mortality was the predicted outcome at a median followup of 22 months. The discrimination statistic increased when NT-proBNP was added to the model, but the increase was not statistically significant (Appendix K Table K-3). For calibration, the Hosmer-Lemeshow statistic decreased (base model 0.56 to 0.29), suggesting that the goodness-of-fit deteriorated when NT-proBNP was added. For reclassification, this study considered the integrated discrimination improvement (IDI) based on the inclusion of several markers in the base model. The inclusion of NT-proBNP alone in the base model failed to show a statistically significant improvement in the IDI (2%, p=0.532 vs. base model). The highest improvement in the IDI was achieved when NT-proBNP was combined with other markers in the form of a multimarker risk score, based on optimal cutpoints from an ROC analysis; this showed an IDI equal to 25 percent (p=0.004) relative to the base model and an IDI equal to 22 percent (p=0.003) compared to the base model with NT-proBNP alone (Appendix K Table K-4).
The second study256 evaluated only the goodness of fit of the model when NT-proBNP was added and showed that NT-proBNP added statistically significant incremental value for predicting all-cause mortality at 6.8 years.
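As background to the calibration result above, the Hosmer-Lemeshow approach groups subjects by predicted risk and compares observed with expected event counts in each group. The following is a minimal sketch of the test statistic only, on hypothetical data; it does not compute the associated p-value, and the grouping into equal-sized bins of sorted predictions is one common convention rather than the method confirmed for the cited study.

```python
def hosmer_lemeshow_statistic(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic.

    Subjects are sorted by predicted probability and split into
    `groups` bins of near-equal size. Each bin contributes
    (observed - expected)^2 / (n * mean_p * (1 - mean_p)).
    Bins whose mean prediction is exactly 0 or 1 would divide by zero
    and are not handled in this sketch.
    """
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return statistic

# Hypothetical predictions for ten people, split into two risk groups
probs = [0.2] * 5 + [0.8] * 5
well_calibrated = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]   # 1 of 5 and 4 of 5 events
poorly_calibrated = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 0 of 5 and 5 of 5 events

hl_good = hosmer_lemeshow_statistic(probs, well_calibrated, groups=2)
hl_poor = hosmer_lemeshow_statistic(probs, poorly_calibrated, groups=2)
print(f"HL statistic (well calibrated): {hl_good:.2f}")
print(f"HL statistic (poorly calibrated): {hl_poor:.2f}")
```

A larger statistic (and a correspondingly smaller p-value) indicates a wider gap between predicted and observed event frequencies, that is, poorer calibration.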
NT-proBNP Levels Adding Incremental Value in Predicting Risk for Morbidity
None of the studies using NT-proBNP levels as predictors of outcome assessed the incremental value for outcomes of morbidity.
NT-proBNP Levels Adding Incremental Value in Predicting Risk for Composite Outcomes
None of the studies using NT-proBNP levels as predictors of outcome assessed the incremental value for composite outcomes.
Evidence for Incremental Value of BNP and NT-proBNP in Stable Heart Failure Patients
Added Value of BNP to Prognostic Risk Prediction
There were no studies that evaluated the incremental value of adding BNP in chronic HF patients.
Added Value of NT-proBNP to Prognostic Risk Prediction
Design Characteristics of Studies
The majority of the studies in this group were publications based on related patient cohorts from Italy,306,320,349,373,376 Spain,353,357 Europe,340,344 and the Controlled Rosuvastatin Multinational Trial in Heart Failure (CORONA), with subjects recruited across Europe.301,309 The remaining studies were conducted in Denmark283,329,360 and at multinational sites (16 countries).286
Three publications were based on randomized trials: the CORONA trial301,309 and the Valsartan Heart Failure Trial (Val-HeFT);286 both trials had large sample sizes, ranging from 3,342 to 3,916. The remaining studies were prospective cohort designs and sample sizes varied from 107 to 891 subjects. All 15 studies used the ELECSYS-proBNP immunoassay to measure NT-proBNP.
Table 34 shows the length of followup and outcomes evaluated in the studies. The majority of studies evaluated mortality outcomes with fewer studies evaluating morbidity and composite outcomes. Appendix K Tables K-5 to K-8 detail the mean age and percentage of males for each estimate of incremental value of NT-proBNP.
Risk of Bias
Figure 19 (also Appendix K Table K-9) shows the proportion of studies meeting various criteria assessed for risk of bias. Appendix E shows the individual study ratings for risk of bias. Almost all studies clearly defined their source population, and this was representative of our target population. Similarly, all studies provided an adequate description of their statistical analyses and used adequate designs to address this question of prognosis. Four of five related studies306,320,373,376 had problems with reporting which confounders were measured and how these were dealt with in the analysis; these accounted for the majority of studies with problems on this criterion.
NT-proBNP Levels Adding Incremental Value To Predicting Risk for All-Cause Mortality
Nine publications286,301,309,320,329,344,353,357,360 reported on the incremental value of adding NT-proBNP to the model for predicting all-cause mortality at time intervals that varied from 12 months to 37 months. All but one study360 assessed the incremental value of NT-proBNP in terms of goodness of fit; fewer studies used the c-statistic,309,353,357,360 the Hosmer-Lemeshow statistic,353,357 IDI,353,357,360 and validation methods.353,357
A single study344 at low risk of bias evaluated the incremental value of log10 transformed NT-proBNP for predicting all-cause mortality at 12 months and showed no statistical difference (p=0.32) in the AUC by adding either NT-proBNP or midregional proadrenomedullin (MR-proADM). However, when either of these two biomarkers was added to the base model, the prognostic value of the base model significantly increased (p=0.038, p=0.0001). When MR-proADM was already included in the base model, the addition of NT-proBNP was significant; in contrast, when NT-proBNP was in the base model and MR-proADM was added, there was no incremental value.
Four publications286,301,309,320 evaluated incremental value for predicting all-cause mortality at approximately 24 months; subjects in all studies were predominantly male (>70%). One study320 with a smaller sample size (n=142) showed that adding NT-proBNP increased the chi-square value relative to the base model + tricuspid annular plane systolic excursion + ejection fraction. A study286 from the Val-HeFT cohort (n=3,916) was at low risk of bias and showed that NT-proBNP added to the base model improved predictive ability at 23 months for all-cause mortality. Two related publications301,309 evaluating the CORONA cohort do not state the followup time interval but based on other CORONA publications this is reported as 24 months (mean or a median of 33.4 months). Both publications report the same number of events but differing sample sizes at risk. The base models differ between the publications but both studies report increases in the chi-square value when adding the log transformed NT-proBNP to the base model. One of these publications309 shows that the value of the c-statistic increases to 0.719 when NT-proBNP is added to the base model, relative to an increase to 0.684 when lipids alone are added to the base model. The findings from these four publications, most with relatively large sample sizes, suggest that there is added value in using NT-proBNP to predict all-cause mortality at approximately two years. However, the model covariates differed between studies, as did the NT-proBNP cutpoints.
Four studies evaluated the predictive ability of NT-proBNP at 30 months329,360 and 33.4 months.353,357 Two publications evaluated the same cohort of patients (n=891, n=876) and the same base model, but one study353 compared NT-proBNP relative to the ST2 receptor cardiac biomarker and the other publication357 compared log NT-proBNP relative to high sensitivity cardiac troponin T (hs-cTnT). Both publications show that the c-statistic increases when NT-proBNP/log NT-proBNP is added to the base model and that the increase is statistically significant (p=0.040, p=0.017). Both publications also show that when the comparator cardiac marker (ST2 or hs-cTnT) is added to the base model, the c-statistic increases and the increase is statistically significant. When NT-proBNP is added to the model combined with either of these two cardiac markers, the c-statistic increases significantly; however, the c-statistic value does not appear to differ by a large amount from the value where NT-proBNP alone or the other markers alone were added (Appendix K Table K-5). The other two studies showed that NT-proBNP added to the base model significantly improved model fit329 and significantly improved the c-statistic relative to the base model360 for predicting all-cause mortality at 30 months. In summary, the studies evaluating longer term all-cause mortality would suggest NT-proBNP adds incremental value to predicting 30 and 34 month all-cause mortality. When the incremental predictive value of NT-proBNP is compared to hs-cTnT and ST2, the relative contribution appears similar, but the greatest increment was shown when NT-proBNP was combined with either of these two markers and the base model.
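Several of the studies above report an increase in the model chi-square when NT-proBNP is added to a base model. As a minimal sketch of how such an increase is commonly tested, the following treats the two models as nested and differing by a single added parameter, so the difference in global chi-square is referred to a chi-square distribution with one degree of freedom. That one-parameter assumption and the input values are hypothetical; the cited publications do not all state the degrees of freedom used.

```python
import math

def lr_test_one_df(chi2_base: float, chi2_extended: float):
    """Likelihood ratio test for one added parameter.

    Returns the chi-square difference and its p-value. For one degree
    of freedom the upper tail probability is erfc(sqrt(x / 2)).
    """
    delta = chi2_extended - chi2_base
    if delta < 0:
        raise ValueError("extended model should not fit worse than base model")
    p_value = math.erfc(math.sqrt(delta / 2.0))
    return delta, p_value

# Hypothetical global chi-square values for a base and an extended model
delta, p_value = lr_test_one_df(chi2_base=80.0, chi2_extended=83.84)
print(f"Chi-square difference = {delta:.2f}, p = {p_value:.3f}")
```

A difference of about 3.84 corresponds to p of roughly 0.05 under these assumptions; larger differences indicate stronger evidence that the added marker improves model fit.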
NT-proBNP Levels Adding Incremental Value To Predicting Risk for Cardiovascular Mortality
One study340 used both the c-statistic and the LR chi-square for the outcome cardiovascular mortality at 12 months; both computations showed that NT-proBNP added incremental value (Appendix K Table K-6). However, in this study the highest incremental values occurred when NT-proBNP and C-Terminal Pro-Endothelin-1 (CT-proET) were combined (global chi-square: 94.3 vs. 77.0, p<0.0001). When using the c-statistic, NT-proBNP added to the base model showed a greater AUC relative to that of the addition of CT-proET (c-statistic=0.780 vs. 0.774). A second study376 computed a LR chi-square and showed that the addition of NT-proBNP to the base model yielded a significant increase in predictive value for cardiovascular mortality (global chi-square: 119.30 vs. 105.54, p<0.0001). The third study309 compared two types of mortality (all-cause and HF), but showed a similar trend across both outcomes; the base model without NT-proBNP had a lower discriminatory ability for risk prediction than the base model with NT-proBNP. However, HF mortality showed the highest improvement in c-statistic for the base model with NT-proBNP, and this improvement was significant (p=0.0002).
NT-proBNP Levels Adding Incremental Value To Predicting Risk for Morbidity Outcomes
Two studies283,286 evaluated morbidity outcomes from 12 to 24 months. A study283 of small sample size (n=150) at low risk of bias evaluated the morbidity outcome of NYHA class change (same or worsening) at 12 months; the log LR increased and was statistically significant (p=0.001) when NT-proBNP was added to the base model. Another study286 evaluated HF hospitalization at 23 months and also showed incremental value of NT-proBNP as the log LR increased and was statistically significant (p=0.001) (Appendix K Table K-7).
NT-proBNP Levels Adding Incremental Value To Predicting Risk for Composite Outcomes
Six publications evaluated the incremental value of adding NT-proBNP in predicting five different composite outcomes for time intervals varying from 22 to 37 months. The composite outcomes evaluated included: (1) cardiovascular mortality or nonfatal myocardial infarction (MI) or nonfatal stroke,301,309 (2) atherothrombotic endpoint (fatal or nonfatal MI, or fatal or nonfatal non-hemorrhagic stroke),301,309 (3) coronary events (sudden death, fatal or nonfatal MI, coronary revascularization, ventricular defibrillation by an implantable device, resuscitation from cardiac arrest, or hospitalization for unstable angina),301 (4) death/all-cause death or worsening HF;301,306,309,373 and (5) mortality and morbidity unspecified;286 cardiac mortality and HF hospitalization349 (Appendix K Table K-8). Two publications301,309 evaluated prediction of four composite outcomes (some events overlapping) at mean followup of 24 months in the CORONA cohort of patients (n=3,664); for all four composite outcomes, the addition of NT-proBNP produced a statistically significant improvement in the global fit of the base model. Two related publications306,373 with overlapping samples of subjects from the same patient registry showed that the addition of NT-proBNP added incremental value in predicting all-cause mortality and HF hospitalization at 22 and 29 months. Another study349 also showed that NT-proBNP added incremental value in predicting cardiac mortality and HF hospitalization at 37 months. In summary, the six publications that evaluated five different composite outcomes combining mortality and morbidity events all suggest that NT-proBNP adds incremental value in predicting these outcomes from 22 to 37 months.
Key Question 5. Is BNP or NT-proBNP measured in the community setting an independent predictor of morbidity and mortality outcomes in general populations?
Seven studies377-383 from 215 citations screened at full text were eligible for inclusion in this section of the systematic review. Defining a “general” population was not straightforward and, after consultation with the Technical Expert Panel (TEP), a general population was defined as one randomly selected from a community setting where no specific inclusion or exclusion criteria were specified. Thus, if a study excluded patients with any particular disease (e.g., those at risk of HF) or a particular biomarker result (e.g., those with high urinary excretion of albumin), its population was not considered a general population.
These general population criteria were implemented to best represent the population as a whole that has no predefined natriuretic hormone level. See Appendix L. KQ5 Evidence Set.
Design Characteristics of Studies
Populations were included in the systematic review only if they were unselected for any disease or risk factor for disease. The populations included as general populations were a very elderly population selected at 85 years of age378 or population-based cohorts,377,379-383 and many of these samples would be considered to be weighted in favor of the elderly population (Appendix L Table L-1). One study used only male subjects380 and the others recruited from both sexes with varying representation (28-50% male subjects). A total of 16,507 individuals were included in the seven studies. The smallest study included 274 individuals378 and the largest 5,447382 (Appendix L Table L-1). The length of followup ranged from 3.5378 to 13.8377 years.
All seven studies measured NT-proBNP. No studies used BNP.
In three studies, no direct comparison measurement was used.378,381,382 Three studies compared multiple cardiovascular risk markers,377,380,383 but these studies did not select identical comparison markers. The following markers were used for comparison: high-sensitivity C-reactive protein,377,380 troponin T,379 troponin I,380 copeptin,377 midregional pro-adrenomedullin,377 midregional pro-atrial natriuretic peptide,377 cystatin C,377,380 serum creatinine,383 and IGF-1.383 All of these markers have some association with cardiovascular disease (CVD) reported in the literature.14
Several primary outcomes were reported for these studies. All-cause mortality was used in three studies.378-380 Sudden cardiac death was used by one study.382 A combined cardiovascular endpoint was used by one study.381 One study considered the onset of AF or HF as the primary outcome.377 One study383 used cardiovascular mortality as a primary outcome and two studies379,381 used death from CVD as a secondary outcome.
By definition, all of these studies were set in the community with no selection criteria. These papers represent a true general, unselected population.
Risk of Bias
The populations for this group of studies were all suitably defined and described, and represent the population of interest. There is low risk of bias for population description and selection (Figure 20).
Most of the papers have complete data or describe attrition in a suitable manner. Two papers were not clear about the adequacy of the completeness of followup381,382 and one of these did not describe the completeness of followup.381 Overall, the risk of bias is low for study attrition.
The prognostic factors were fairly well addressed. NT-proBNP was appropriately defined and measured in all seven papers. The other prognostic factors were well defined and measured in all but two papers.378,383 The indeterminate results or missing data were less well addressed by a few papers.377,378,381,383 There is low risk of bias for the NT-proBNP factor and moderate risk of bias for the other prognostic factors.
Outcome measurement was also done correctly by most studies. Fairly stringent criteria for obtaining accurate data were set, and only one study did not meet these criteria.382 However, the authors did address this in their methods and the risk of bias is low for the outcome measurements in this section.
Confounding was considered by all of the papers according to our criteria, and the risk of bias is low for confounding. Appropriate covariates were used in all seven papers. Studies were expected to consider age, sex, BMI, and renal function as important covariates. One study did not use BMI but did use waist-to-hip ratio as a covariate.381
Analysis was appropriately conducted in all the studies. All the study designs were observational cohorts, and the question posed for the reports most often looked at the predictive value of NT-proBNP in the population described. All reports used stored samples from the population studies to measure NT-proBNP and the other biomarkers of interest.
In summary, the risk of bias in this group of papers is low.
All-cause mortality was the outcome in three studies378-380 and in all three there is an increasing adjusted HR with increasing NT-proBNP measured by tertiles,378 by increases of 1 standard deviation380 and by log(NT-proBNP).379 The adjusted HR shown in Appendix L Table L-1 demonstrates the clear relationship between baseline NT-proBNP and all-cause mortality. The relationship appears to be log-linear in nature.
Sudden cardiac death showed increasing HR across the quintiles of NT-proBNP and an adjusted HR=1.9 (95% CI, 1.7 to 2.1) for the natural logarithm of NT-proBNP (ln-NT-proBNP).382
Cardiovascular death has a significant adjusted HR for log(NT-proBNP)/SD381 and log(NT-proBNP).379 When a cutpoint of 100 pg/mL was applied to a population older than 65 years of age an adjusted HR=1 (95% CI, 1 to 1.001) was reported with a p value of 0.001.383 However, in a model that was adjusted for known baseline CVD, the adjusted HR became nonsignificant (HR=1.61 (95% CI, 0.79 to 3.28)).379
Onset of AF was associated with ln-NT-proBNP in a model including conventional risk factors (adjusted HR=1.45 (95% CI, 1.28 to 1.65)) but not in a model that included midregional pro-atrial natriuretic peptide and CRP.377
Onset of incident HF was associated with ln-NT-proBNP in the models investigated that included other markers of cardiac risk.377
Key Question 6. In patients with HF, does BNP-assisted therapy or intensified therapy improve outcomes compared with usual care?
Design Characteristics of Studies
All studies were RCTs with the objective of determining whether HF treatment guided by BNP or NT-proBNP improves outcomes compared with usual care. There were nine studies that fulfilled this objective.4,5,53,384-389 The term usual care includes the terms standard of care, clinically-guided, symptom-guided, or control group. One study used a congestion score strategy compared to BNP-guided therapy.387 Another study4 was a three-arm trial with an additional multidisciplinary group, but only the usual care and NT-proBNP arms are compared for this systematic review.
Inclusion criteria included age and characteristics of HF patients with regard to severity, therapy, and concentration of BNP or NT-proBNP (Table 35). Age was specified in five studies and included >18 years,384,388,389 >20 years,386 and >60 years.53 All except one study385 specified the severity of patients with HF by NYHA classification levels II-III,388 II-IV,5,53,384,386,389 or III-IV.4,387 The most frequent LVEF cutpoint used was ≤40 percent,4,384,386,389 but other studies had values of ≤35 percent,387 ≤45 percent,388 <50 percent,5 and two studies did not require this measure.53,385 The HF patients were required to be stable in two studies5,388 and decompensated (or worsening) in five studies.4,385-387,389 Other criteria included HF diagnosis ≤3 months384 and previous admission for HF.53,384 HF therapy was a criterion in four studies and included angiotensin converting enzyme inhibitor (ACE-I) or angiotensin receptor blocker (ARB),384,388,389 aldosterone antagonists (AA),389 digoxin,384,388,389 diuretic,384,388,389 beta-blocker,388 or being on stable medications388 or standard therapy53,388,389 without the therapy being specifically defined. Elevation of BNP or NT-proBNP was required in four studies.5,53,385,389
All studies except one4 specified exclusion criteria (Table 35). Medical history exclusion criteria included cardiac, hepatic, pulmonary, and renal problems. Cardiac problems included acute coronary syndrome,53,384,386-389 unstable angina,53,384 aortic or mitral stenosis,5,384,386 cardiac transplantation,5,390 life-threatening arrhythmias,5,385 open heart surgery,5,389 revascularization,53,386 revascularization indicated or expected,53,386,389 surgical or invasive intervention,385 or valvular disease requiring surgery.53 Hepatic disease was an exclusion criterion in two studies,384,389 and hepatic cirrhosis in another study.388 Pulmonary disorders included asthma,388 COPD,385,386,388 pulmonary hypertension,385 and severely decreased pulmonary function.389 One study required dyspnea not mainly due to HF as an inclusion criterion.53 Seven studies excluded patients if the creatinine concentration was above 200 to 309 μmol/L,5,53,384,386-389 but one study required participants to have renal disease,385 and another study4 did not have renal disease as a criterion for inclusion or exclusion. Hemodialysis or peritoneal dialysis were exclusion criteria for two studies.385,387 Two studies had medications as exclusion criteria. One study excluded patients who were on beta-blockers or had a contraindication to this medication.384 Another study5 excluded patients who were on standard HF therapy.
Other exclusion parameters included BMI >35 kg/m2,53 life expectancy for noncardiovascular diseases <1 year385,386 or 3 years,53 or limited life expectancy (time not specified).389 Patients were also excluded if participating in another study or unable to give signed consent,53,386,389 as well as being unable to follow the study schedule.389
Table 36 describes baseline characteristics for the BNP/NT-proBNP group. The studies were carried out between 2002 and 2010 for a minimum of 3 months up to a maximum of 18 months. There were seven multicenter studies including three to 45 sites with a minimum of 41 patients up to a maximum of 499 patients. The total number of patients included for all nine studies was 2,104.
Four studies measured BNP384,387-389 and five studies measured NT-proBNP.4,5,53,385,386 The BNP test was performed on a point-of-care device, whereas all NT-proBNP measurements were performed on an automated clinical analyzer. One study did not blind patients to their NT-proBNP values.389 All other studies except for one388 did not explicitly say whether patients were blinded to their BNP or NT-proBNP test result.
The study with the youngest patients had a mean age of 59 years (IQR 50 - 70),387 whereas the study with the oldest patients had a mean age of 71.6 (±12.0) years.385 Three studies had a low percentage of male participants: 24 percent,4 33.3 percent,384 and 38 percent.389 The percentage of males in the other studies was 55.0 percent385 to 88.2 percent386 with an average of 62.7 percent. Race was reported in only one study (87% Caucasian).386 Six studies4,5,53,385,388,389 recruited patients from European countries suggesting race to be mostly Caucasian.
Heart Failure Characteristics
The severity of HF by NYHA class was reported in five of nine studies as the number of patients in each class, and in one study384 only the mean NYHA class was provided (2.6±0.7). The highest proportion of patients in three studies5,385,386 was in the NYHA II class, whereas two studies had more NYHA III class patients.53,389 The mean LVEF ranged from 20 percent387 to 34.9 percent391 and was reported only as preserved or reduced in one study.4 The most common cause of HF was ischemic in four studies,4,53,386,388 accounting for about half of the patients. The duration of HF388 and a congestive score387 were other criteria recorded.
B-Type Natriuretic Peptide Concentration
The baseline concentration of BNP was not reported in one of the four studies that measured this natriuretic peptide.388 The mean concentration was higher in one study (808±676 pg/mL)389 by about 40 percent compared to the other two studies.384,387 For NT-proBNP, the baseline concentrations were similar, from 2,216 pg/mL to 2,998 pg/mL.
Various physiological measures were reported in all but one study384 and included BMI,53,386,389 blood pressure,4,5,385,386 heart rate (all except one389), jugular vein distension,5 lower extremity edema,5 mitral valve regurgitation,385 murmur,5 pulmonary edema,5 QRS duration,5,385,388 Third Heart Sound (S3) and Fourth Heart Sound (S4) gallop,5 and weight.388
All except one study387 reported at least one item for medical history. These included AF,5,53,385,386 arthritis,385 coronary artery bypass graft,53 cancer,4,5,53,385 COPD,385,386 coronary artery disease,388 diabetes mellitus (all studies reported this disease), dyslipidemia,388 hypertension,4,5,53,385,386,388,389 kidney disease,53 myocardial infarction,4,5,385,386 percutaneous coronary intervention,385 smoking (current, former or never),385,386,388 stroke or transient ischemic attack,4,385 valve replacement,385 or ventricular tachycardia.386
Heart Failure Therapy
Medication use was reported in all studies. Comparison of the main HF medications among studies is illustrated in Figure 21. This figure shows that at least 70 percent of the patients in all studies were taking an ACE-I or ARB, a beta-blocker (except in one study where no patients were taking this medication),384 and a diuretic. ACE-I use was reported in four studies,5,386,387,389 in which close to 75 percent of participants were taking this medication. Almost all patients in studies reporting ACE-I or ARB were taking one or the other medication.4,53,384,385,392 No patients in any study were taking both ACE-I and ARB. Two studies reported patients taking ACE-I or ARB with a beta-blocker.385,387 One study reported patients taking ACE-I or ARB with spironolactone.4 Aldosterone antagonists were reported in seven studies, and in most of them about half of the patients were taking this medication.4,5,53,385,386,388,389 ARB alone was reported in four studies with 10.7 percent to 35 percent of patients taking this medication.5,386,387,389 Beta-blockers were taken by almost all patients in all except one study,384 where the objective was to titrate beta-blockers using BNP-guided therapy compared to usual care. Beta-blockers were taken by at least 76.1 percent and up to 99 percent of all patients.388 Digoxin was reported in four studies, of which one study384 had all patients on this medication. In the other studies5,386,389 the percent of patients taking this medication was 14 percent to 29.3 percent. Loop diuretics were taken by 83 percent to 100 percent of all study patients. Only one study reported patients taking a thiazide diuretic.386 Hydralazine386 and nitrates53,386 were taken by some patients.
Quality of Life
Three studies had baseline quality of life (QOL) data based on four types of questionnaires. The questionnaires included the Duke Activity Status Index,5 Kansas City Cardiomyopathy Questionnaire (KCCQ),5 Minnesota Living with Heart Failure Questionnaire,53,384 and the Short Form 12.53
Other Biochemical Tests
Creatinine concentration was reported in all but one study.384 The concentrations were between 92±34 μmol/L388 and 121 μmol/L (IQR 98 to 157)385 with one study reporting the number of patients with a value >177 μmol/L.4 The eGFR was reported in one study (61.4±20.9 mL/min/1.73 m2).389 Hemoglobin,385 potassium,385,386 sodium,385-387,392 and urea385-387 were the other biochemical tests reported.
Differences Between the Two Treatment Arms
There were few significant differences in the reported characteristics between the usual care group and BNP/NT-proBNP treated group (BNP/NT-proBNP group). They included percent male (76 in BNP/NT-proBNP group and 66 in usual care group),5 LVEF percent (29.9 in BNP/NT-proBNP group and 31.8 in usual care group),388 mean (SD) blood pressure (diastolic (mmHg) 64(±9) in BNP/NT-proBNP group and 67(±9) in usual care group; systolic (mmHg) 108(±15) in BNP/NT-proBNP group and 112(±16) in usual care group),386 percent coronary artery disease (55 in BNP/NT-proBNP group and 67 in usual care group),386 percent current smoker (39 in BNP/NT-proBNP group and 53 in usual care group),388 and percent transient ischemic attack (five in BNP/NT-proBNP group and 15 in usual care group).385
Table 37 outlines the treatment protocols for each study for both the BNP/NT-proBNP group as well as the usual care group. Three studies chose a specific target concentration for the BNP/NT-proBNP group. For the study388 using BNP, it was 100 pg/mL, which is the cutpoint used for ruling out a diagnosis of HF. For NT-proBNP the target concentrations were 1,000 pg/mL386 and <2,200 pg/mL.4 A concentration of 900 pg/mL has been recommended as the cutpoint to rule out HF in patients 50 to 75 years old, but higher in patients >75 years old in an acute setting (1,800 pg/mL). Two studies defined target concentrations according to age. For the study using BNP these values were <150 pg/mL for patients <75 years old and <300 pg/mL for patients ≥75 years old.389 Similarly, a higher target concentration was required for patients ≥75 years old for NT-proBNP (<800 pg/mL) compared to <75 years old (<400 pg/mL).53 The remaining four studies expressed target values according to individual patient baseline concentrations. These target values included the NT-proBNP concentration at discharge or 2-week followup after admission (whichever was lower and at minimum 850 pg/mL),385 and ≤2-fold discharge for BNP387 or NT-proBNP.5 In the last study,384 uptitration was defined specifically if: (1) BNP <baseline and clinical status was unchanged or better; (2) BNP <10 percent of previous value with mild signs of congestion; or, (3) BNP ±10 percent of previous value treatment based on clinical signs alone.
The treatment protocols were the same between study arms in six studies apart from the additional requirement of aiming to achieve the BNP/NT-proBNP target concentration in the BNP/NT-proBNP group. The treatment protocols were those recommended by the European Society of Cardiology (ESC)391 and American College of Cardiology (ACC),53 or Swedish HF guidelines.5 In another study, treatment was based on clinical assessment alone388 or in combination with a congestion score.387 The congestion score included one point for each of the following criteria: (1) orthopnea; (2) jugular venous pressure >10 cm H2O; (3) weight gain ≥2 pounds from dry weight; (4) the need to increase diuretics during a clinic visit or in the past 48 hours during the index hospitalization; and (5) ≥one peripheral edema. Treatment in one study was specific to the uptitration of a beta-blocker dose to 10 mg/d.384 The three studies with different treatment protocols dependent on study arms included one study that followed a predefined treatment schedule for the BNP/NT-proBNP group compared to ESC guidelines at the discretion of the investigator.389 In another study, no specific guide to treatment was required for the NT-proBNP group other than drug therapy intensification and/or careful reassessment of medical programs, whereas in the usual care group, ACC/AHA guidelines were followed.386 In one of the studies, an HF specialist was involved in the care of patients in the NT-proBNP group compared to primary care physicians in the usual care group.4 In the NT-proBNP group, patients were seen by the HF specialist every two weeks in addition to multidisciplinary care to optimize therapy following a predefined plan. In the usual care group, the primary care physicians followed a management plan but patients had no contact with HF specialists nor did they have a structured followup.
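The five-item congestion score described above can be sketched in a few lines of Python. This is only an illustration of the "one point per criterion" rule as worded in the text; the class name, field names, and function name are hypothetical, and the fifth item is kept as a simple yes/no flag because the text does not define its grading further.

```python
from dataclasses import dataclass, astuple


@dataclass
class CongestionFindings:
    """Hypothetical container for the five criteria listed in the text."""
    orthopnea: bool
    jvp_above_10_cm_h2o: bool        # jugular venous pressure >10 cm H2O
    weight_gain_2_lb_or_more: bool   # >=2 pounds above dry weight
    diuretic_increase_needed: bool   # at clinic visit or in past 48 hours
    peripheral_edema: bool           # criterion 5 as worded in the text


def congestion_score(findings: CongestionFindings) -> int:
    """Return the score: one point for each criterion that is present (0 to 5)."""
    return sum(1 for present in astuple(findings) if present)


# Usage with made-up findings for a single visit
example = CongestionFindings(
    orthopnea=True,
    jvp_above_10_cm_h2o=False,
    weight_gain_2_lb_or_more=True,
    diuretic_increase_needed=False,
    peripheral_edema=False,
)
score = congestion_score(example)
```

How a given score translated into a treatment change is not described in this section, so the sketch stops at the score itself.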
The followup frequency varied among studies. Two studies had monthly followups,384,387 and two studies had 3-month followups after the first visit5,386,388 or second visit.385 Two other studies had the first two followups at 3 months and then 6 months after that.4,53 Another study had followups at 2, 6, and 10 weeks, then at 4, 6, and 9 months, and every 6 months thereafter.389
All data collected on the study patients are summarized in Table 38 and include sections on BNP/NT-proBNP, endpoints, and medications. The reported parameters were described as no difference, decrease, or increase for the BNP/NT-proBNP group compared to the usual care group. Table 39 shows the primary endpoints in these studies.
The outcomes included clinical visits, hospital events, mortality, days alive, and QOL scores. They were recorded in various ways, and this heterogeneity made meta-analysis unsuitable. For example, admissions to the hospital included all-cause, HF only, and cardiovascular events. The events were captured as number of days admitted, time to first admission, and number of patients admitted.
The final concentration of BNP/NT-proBNP for all patients was reported in all studies except one.389 Of these studies, two found decreased values of BNP388 or NT-proBNP.386 The percent of patients who achieved the target concentration was reported in five studies.5,386-388,391 One study had 80 percent of patients below the target at the 3-month followup.391 However, the target was only 10 percent below the patients' baseline value. In the other studies, the percent of patients achieving the target value was between 20 percent and 40 percent.
A composite of endpoints was used in six studies,4,5,53,386,388,389 two studies used only one endpoint,385,387 and one study did not define a primary endpoint.384 Patients in the BNP/NT-proBNP group had fewer events compared to the usual care group in three studies.4,386,388 The other studies showed no difference in the primary endpoint between treatment groups (Table 39).
Admissions were considered all-cause unless otherwise specified. All studies except one53 reported on some parameter related to admissions, most reported on cardiovascular admissions, and three of the four studies4,386,388 reported fewer admissions in the BNP/NT-proBNP group compared to the usual care group.
Deaths were reported as all-cause, cardiovascular, or HF. Two studies did not report deaths.53,387 Of the seven studies that did report on deaths, six reported all-cause,4,5,384,385,388,389 four reported a cardiovascular cause,5,386,388,389 and only two studies reported on death related to HF.388,389
In contrast to the death data, days alive data were captured in five studies.5,53,385,387,388 Two studies53,388 showed that patients in the BNP/NT-proBNP group had more days of survival outside the hospital compared to the usual care group.
Quality of Life
Three studies included a QOL questionnaire.5,53,384 One study384 using the Kansas City Cardiomyopathy Questionnaire (KCCQ) showed improvement in score in the BNP/NT-proBNP group compared to the usual care group.
Studies also reported on acute coronary syndrome,386 cerebral ischemia,386 significant ventricular arrhythmia,386 a combined endpoint of time to cardiovascular death or cardiovascular hospitalization,5 congestion score,5 and worsening of HF.386,393 Only one parameter, worsening HF (i.e., new, worsening symptoms and signs of HF requiring unplanned intensification of decongestive therapy) was different in the BNP/NT-proBNP group compared to the usual care group. The study showed fewer events in the BNP/NT-proBNP group.386
Medication (type, dosage, and titrations) was recorded in all but one study.388 The information was usually percent of patients taking the medication, but some studies also reported on the dose or percent of patients achieving the target dose or a percentage of the target dose.
Three studies reported no changes in medications5,384,389 and one study did not report final medication use.388 Five studies reported significant change in some medication use between the BNP/NT-proBNP group and the usual care group.4,53,385-387 The direction of change was consistent in all studies reporting on that medication. Of the eight medications (or groups of medications), six (AA, ACE-I, ACE-I or ARB, ACE-I or ARB and beta-blocker, beta-blocker, spironolactone) were increased and two (ARB, diuretic) were decreased.
Risk of Bias
Methodological quality was assessed using the modified Jadad scale with four additional questions (Table 40). The risk of bias for the nine studies4,5,53,384-389 was low. The SOE was assessed using the single outcome of mortality (Table 41). It was an outcome that all nine studies reported, although one study reported this as days only,387 and it was not clear if the study reporting only cardiovascular death included all deaths.386 Therefore, the RR and CI were calculated for seven studies.4,5,53,384,385,388,389 The effect sizes were variable and dispersion of the effect size was low in three studies4,53,385 but high in four,5,384,388,389 resulting in the precision domain being scored as imprecise. The studies were rated as inconsistent; two studies5,388 reported fewer deaths in the BNP/NT-proBNP group compared to the usual care group, whereas five4,53,384,385,389 did not report a difference. Based on these data, the SOE for this outcome was rated as low. This means there is limited confidence that the estimate of the effect is close to the true effect. The studies were heterogeneous in design and further evidence is needed to conclude whether the effect (outcome) is stable.
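This section does not state how the RR and its CI were computed for each study. As a rough illustration only, the sketch below uses the common log-based (Katz) interval for a risk ratio from a two-arm trial. The choice of that method, the function name, and the counts in the usage lines are all assumptions; none of the numbers come from the included studies.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of arm A versus arm B with a log-based (Katz) confidence interval.

    events_* are numbers of deaths and n_* are numbers of patients per arm.
    z=1.96 corresponds to a 95% interval. Assumes at least one event per arm.
    """
    if events_a <= 0 or events_b <= 0:
        raise ValueError("this simple formula needs at least one event in each arm")
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 10/100 deaths with guided therapy, 20/100 with usual care
rr, lower, upper = risk_ratio_ci(10, 100, 20, 100)
# An interval that spans 1 would be read as "no difference" between arms
spans_one = lower <= 1.0 <= upper
```

A wide interval of this kind is what the precision domain is judging when it scores a body of evidence as imprecise.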
Key Question 7. What is the biological variation of BNP and NT-proBNP in patients with HF and without HF?
Design Characteristics of Studies
Seven studies37,38,394-398 included data on biological variation for BNP and NT-proBNP (Table 42). Of these, five studies enrolled patients with stable HF,37,38,394-396 one study also included healthy individuals,397 and one study had only healthy individuals.398 No study reported on race but six37,38,394-396,398 of the seven studies were done in Europe suggesting individuals were mostly Caucasian. All study designs were prospective cohort studies, except for one which was a retrospective chart review.38 The diagnosis of HF was described in only three studies,394,395,397 but one did not refer to a standard guideline although criteria were appropriate for a clinical diagnosis of HF.397 Patients with HF were primarily selected from HF clinics, but also from a cardiologist's practice,394 and an unknown source.397 Patients were considered as having stable HF by various physical parameters (e.g., weight, blood pressure, heart rate, waist circumference), clinical status (e.g., heart function, NYHA class, AF, edema, palpitations, renal function), medications, and no hospitalization or death in all but one study,397 where no description was provided. The criteria used to assess stability varied across studies, as did the timing of the assessment of HF stability. Two studies394,396 assessed this before study inclusion at 1 month,396 2 months,394 and since last clinic visit.37 Four studies assessed stability during the collection period37,38,395,397 and one study also considered stability 6 months after the study period.37 The severity of HF was assessed by NYHA classification as mostly level II (58 to 79 percent).
Study duration varied in length from as short as 1 day to as long as 2 years. Overall, the number of patients or participants sampled was small (mean=32, range 5 to 78), as were the samples obtained to calculate biological variation (median=4, range 2 to 15). There were more males than females in the studies. The average age of participants was over 60 years except in the two studies397,398 that determined biological variation in younger healthy individuals, who are not representative of the age range of individuals who have HF.
Blood collection parameters and analytical protocols varied among studies and were inconsistently reported. Some studies considered diurnal rhythm of BNP and NT-proBNP and collected samples at specific times.37,394,395,397,398 Two studies required patients to fast overnight.38,398 A few studies also specified rest time before collection,37,395,396 as BNP and NT-proBNP are known to increase after exercise. Two studies sampled blood from an indwelling catheter.395,396 All studies but two38,396 stored aliquots of separated blood in the freezer prior to their analysis. Storage temperature was from −80°C to −20°C. The studies that did not store samples analyzed samples within 10 min,37 or 2 hours after collection.38 Attention was paid to how the samples were analyzed to reduce analytical variation. Samples were analyzed on the same day or in a batch on a different day; however, two studies did not report this information.396,397 Three assay methods were used for BNP and included Biosite Triage,396,397 Bayer Centaur,37 and Abbott (instrument type not specified).394 The Roche instruments were used for all NT-proBNP assays (Elecsys 1010 and 2010), and all studies assayed samples by this method.
Biological Variation Data
Tables 43 and 44 provide the biological variation data for patients with HF and healthy controls, respectively. The mean concentrations of BNP and NT-proBNP for the group of patients or participants were reported for all except one study.397 Five of the six studies with HF measured NT-proBNP and showed a wide range of concentrations. Three of these studies37,38,396 had mean or median NT-proBNP values which were more than double the other two studies.394,395
The analytical coefficient of variation (CVa) values were calculated by repeat analysis of patient or participant samples,37,396-398 a combination of patient samples and quality control material,395 or quality control material alone.394,396 One study did not specify the type of sample used and provided only an estimate of CVa.38 Of those that used patient or participant samples, two used data from all samples. There were differences in when these samples were tested: some performed the analyses in one run while others did analyses at different time points. The CVa values for BNP were lowest for the Bayer Centaur method (1.8%, 4%) and highest for the Biosite Triage (8.6%, 13.7%), reflecting the higher imprecision for point-of-care devices. Similar CVa values were obtained for NT-proBNP (1.4% to 3.0%). The study with the lowest CVa37 also had the highest number of samples for this estimate (n=80). Analytical variance may vary with analyte concentration, but in the study by Bruins et al.394 no relationship between CVa and BNP or NT-proBNP concentration was found.
Total variation (CVt) is the variance of differences between repeat measurements and is the combination of analytical and biological variation. This relationship provides the basis for calculating the within-individual biological variation (CVi), where $CV_i = (CV_t^2 - CV_a^2)^{1/2}$.
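As a worked illustration of this relationship, the short Python sketch below derives CVi from CVt and CVa. The function name and the input values are hypothetical and are not taken from any of the included studies.

```python
import math


def within_individual_cv(cv_total, cv_analytical):
    """Return CVi = sqrt(CVt^2 - CVa^2), with all values expressed in percent."""
    difference = cv_total ** 2 - cv_analytical ** 2
    if difference < 0:
        # Analytical variation cannot exceed total variation under this model
        raise ValueError("CVa is larger than CVt, so CVi is undefined")
    return math.sqrt(difference)


# Made-up example: total variation of 25% and analytical variation of 3%
cv_i = within_individual_cv(25.0, 3.0)
```

Because the terms combine as squares, a small CVa changes CVi very little, whereas a CVa close to CVt leaves only a small within-individual component.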
All studies except for two reported CVt.397,398 CVi values were reported for all studies, but between-individual variation (CVg) was reported in only three studies.37,397,398 Since CVg is also a derived value, calculated by nested analysis of variance (ANOVA) of the repeated measurement data, it is unclear why it was missing in most studies. Absence of CVg does not permit calculation of the index of individuality (IOI), which is a useful parameter to assess the degree of individuality for a biomarker. Review of the CVi values for BNP and NT-proBNP in patients with HF or healthy controls showed lower values (about one-half) for within-hour396 and within-day394 compared with within-week up to 12 weeks. The CVi values in studies of patients with HF for longer than 1 day were very similar and did not differ between BNP and NT-proBNP (mostly around 20%) except for one study.394 This study did not provide information on how patients were assessed for stability at each time point and therefore it is unknown if they were indeed stable. The patients were also recruited from a single cardiologist practice in a population of mostly Afro-Caribbeans. The ethnicity of the patients in the other five studies was not provided, but four were conducted in a European country and one in the United States.
Figure 22 compares the CVa and CVi values for BNP, and Figure 23 compares CVa and CVi values for NT-proBNP in all studies. These figures show that analytical variation values are much lower than intra-individual values, except for BNP at 1 hour and 10 hours where the opposite occurs. Also, the ratios of CVi/CVa are higher for NT-proBNP compared with BNP (Figures 24 and 25). This means CVa constitutes a larger portion of the total variation for BNP measurements compared with NT-proBNP. These differences were independent of the type of BNP method used, which included a point-of-care method with the highest CVa (Biosite Triage) and two automated methods (Abbott and Bayer Centaur). These data also suggest that variation increases over time. When the data were limited to only NT-proBNP from patients with HF, a plateau appeared at 1 week. There were two data points for the 1-week measurement, which were quite different from each other, but this is most likely a function of the higher CVa for the study using the point-of-care method.396 The smaller CVi at shorter time intervals is likely a function of autocorrelation in repeated measures.399
The relative change value (RCV) is a parameter derived from CVa and CVi values, which constitutes a clinically meaningful change in serial results.
The formula is $RCV = Z \times 2^{1/2} \times (CV_a^2 + CV_i^2)^{1/2}$, where Z is typically set at 1.96 for a probability of 0.05 for statistical significance. Four of the six studies that reported RCV used the Z value of 1.96; however, two studies did not report this value.37,398 The largest RCV values were found for healthy individuals for BNP (123% and 139% for two different methods) and NT-proBNP (92%).397 The only other study with RCV values on healthy individuals measured NT-proBNP and found a much lower value (26%).398 The large difference between RCV values for NT-proBNP is due in part to the log transformation of NT-proBNP data in one398 but not the other study.397 Other reasons for a smaller RCV include more participants (16 vs. 8), more samples (5 vs. 2), and overnight fast and early morning collection (lowest concentration is morning). For patients with HF, the RCV values were overall higher for BNP (32% to 113%) compared with NT-proBNP (16% to 55%). This span of values and pattern reflect the CVi values, as the CVa values were similar since the same method of measurement for NT-proBNP was used.
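The RCV formula can be sketched in Python as follows. The function names and the example inputs are hypothetical; the sketch uses the simple symmetric form given in the text and does not reproduce the log-transformed variant that one study applied.

```python
import math


def relative_change_value(cv_analytical, cv_within, z=1.96):
    """Return RCV (percent) = Z * sqrt(2) * sqrt(CVa^2 + CVi^2)."""
    return z * math.sqrt(2) * math.sqrt(cv_analytical ** 2 + cv_within ** 2)


def exceeds_rcv(previous, current, rcv_percent):
    """True if the percent change between two serial results is larger than the RCV."""
    percent_change = abs(current - previous) / previous * 100.0
    return percent_change > rcv_percent


# Made-up example: CVa of 3% and CVi of 20%
rcv = relative_change_value(3.0, 20.0)

# Made-up serial results in pg/mL, compared against that RCV
change_is_meaningful = exceeds_rcv(previous=1000.0, current=1700.0, rcv_percent=rcv)
```

With these inputs nearly all of the RCV comes from CVi, which matches the observation above that the span of RCV values mainly reflects differences in CVi.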
Four studies reported IOI values.37,395,397,398 This value is a ratio of CVi to CVg and the lower the ratio the greater the difference is between individual variances; the higher the ratio, the more similar individual variances are to each other. The implication is on the applicability of the RCV to individuals. The IOI for NT-proBNP in healthy individuals (0.64 and 0.90) was higher than for patients with HF (0.03 and 0.12). Similarly, the IOI for BNP was lower (0.14) for patients with HF than for healthy individuals (1.1 and 1.8; same patients but different methods). This means there is more individuality for BNP and NT-proBNP for patients with HF compared with healthy individuals.
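The ratio described above is simple enough to state in code. The sketch follows the definition used in this section (CVi divided by CVg); the function name and the input values are hypothetical, and other formulations of the index that also include CVa in the numerator are not shown.

```python
def index_of_individuality(cv_within, cv_between):
    """Return IOI = CVi / CVg, following the definition given in the text."""
    if cv_between <= 0:
        raise ValueError("CVg must be positive")
    return cv_within / cv_between


# Made-up example: within-individual CV of 20% and between-individual CV of 80%
ioi = index_of_individuality(20.0, 80.0)
```

A small ratio such as this one arises when the spread between individuals is much larger than the fluctuation within any one individual.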
Sources of Variation
Several studies investigated the sources of the variation using linear38 or multivariate regression analysis.37,395,396 In the study by Frankenstein et al.,396 the authors examined known confounders, including NYHA class, sex, age, weight, waist circumference, heart rate, hemoglobin, and ejection fraction, but none was significant. In another study,396 multivariate analysis, controlled for age and sex, did not identify any independent predictors of variance at any time interval. Variation was also not explained by mean arterial pressure, eGFR, plasma volume, weight, or heart rate.37
Agency for Healthcare Research and Quality (US), Rockville (MD)
Balion C, Don-Wauchope A, Hill S, et al. Use of Natriuretic Peptide Measurement in the Management of Heart Failure [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Nov. (Comparative Effectiveness Reviews, No. 126.) Results.