
Williams JW Jr, Cox CE, Hargett CW, et al. Noninvasive Positive-Pressure Ventilation (NPPV) for Acute Respiratory Failure [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Jul. (Comparative Effectiveness Reviews, No. 68.)

  • This publication is provided for historical reference only and the information may be out of date.



The methods for this comparative effectiveness review (CER) follow those suggested in the Agency for Healthcare Research and Quality (AHRQ) Methods Guide for Effectiveness and Comparative Effectiveness Reviews (hereafter referred to as the Methods Guide).27 All methods and analyses were guided by a protocol, which was developed as described immediately below.

Topic Refinement and Review Protocol

During the topic refinement stage, we solicited input from a group of Key Informants (KIs) representing medical professional societies/clinicians in the areas of pulmonology, critical/intensive care, and respiratory therapy; scientific experts; payers; and Federal agencies to help define the Key Questions (KQs). These KQs were posted on AHRQ's Web site for public comment for 4 weeks, beginning in late December 2010. The comments received were considered in the revision of the KQs and in the development of the research protocol. We next convened a Technical Expert Panel (TEP) to provide input on defining populations, interventions, comparisons, and outcomes, as well as for identifying particular studies or databases to search. The TEP members provided the same range of viewpoints and expertise as are described for the KI group, with the addition of a methodologist with experience in trial efficacy-effectiveness assessment. The KIs and members of the TEP were required to disclose any financial conflicts of interest greater than $10,000 and any other relevant business or professional conflicts of interest. Any potential conflicts of interest were balanced or mitigated. KIs and members of the TEP did not perform analyses of any kind or contribute to the writing of the report. All methods and analyses were guided by the protocol; certain methods map to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist.28

Literature Search Strategy

Search Strategy

We searched PubMed®, Embase®, and the Cochrane Database of Systematic Reviews to identify relevant published literature. Our search strategies used the National Library of Medicine's medical subject headings (MeSH) keyword nomenclature and text words for noninvasive positive-pressure ventilation (NPPV) and eligible study designs. We used validated search filters for randomized study designs where possible (the Cochrane Highly Sensitive Search Strategy for identifying randomized trials in MEDLINE: sensitivity-maximizing version [2008 revision] in PubMed, and the Cochrane search filter for identifying randomized trials in Embase29). We included studies conducted in adults and published in English from 1990 on. We limited studies to 1990 forward because standards of care have changed significantly since 1990. Search dates and the exact search strings used for each database are given in Appendix A. All searches were designed and conducted in collaboration with an experienced search librarian.

We supplemented the electronic searches with a manual search of citations from a set of key primary and review articles.25,30-49 All citations were imported into an electronic bibliographic database (EndNote® Version X4; Thomson Reuters, Philadelphia, PA).

As a mechanism to ascertain publication bias, we searched ClinicalTrials.gov to identify completed but unpublished studies (see Appendix A for search date and the exact search string used).

We used two approaches to identifying relevant grey literature: 1) a request for scientific information packets submitted to device manufacturers; and 2) a request submitted to the U.S. Food and Drug Administration for any unpublished randomized controlled trial (RCT) data available for devices used to provide noninvasive positive-pressure ventilation.

Inclusion and Exclusion Criteria

The PICOTS (population, interventions, comparators, outcomes, timing, and settings) criteria used to screen articles for inclusion/exclusion at both the title-and-abstract and full-text screening stages are detailed in Table 2. In general, our inclusion criteria were deliberately broad with respect to the definition of acute respiratory failure and etiologies of acute respiratory failure. We limited the setting to hospitals and emergency departments, settings where NPPV is a practical treatment option for acute respiratory failure. NPPV used for weaning was conceptualized broadly to include use for early extubation and to prevent or treat respiratory failure postextubation. We included studies of any duration, but required at least one of our specified final outcomes for inclusion.

Table 2. Inclusion/exclusion criteria.

Study Selection

Using the criteria described in Table 2, two investigators independently reviewed each title and abstract for potential relevance to the KQs; articles included by either investigator underwent full-text screening. At the full-text screening stage, two investigators independently reviewed the full text of each article and indicated a decision to “include” or “exclude” the article for data abstraction. When the paired reviewers arrived at different decisions about whether to include or exclude an article, or about the reason for exclusion, we reached a final agreement through review and discussion among investigators. Full-text articles meeting eligibility criteria were included for data abstraction.

For citations retrieved by searching the grey literature or ClinicalTrials.gov, these procedures were modified such that a single screener initially reviewed all citations; final eligibility for data abstraction was determined by duplicate screening review. All screening decisions were made and tracked in a DistillerSR database (Evidence Partners Inc., Manotick, ON, Canada).

Data Extraction

The investigative team created forms for abstracting the data elements for the KQs. Based on their clinical and methodological expertise, a pair of researchers was assigned to abstract data from the eligible articles. One researcher abstracted the data, and the second reviewed the completed abstraction form alongside the original article to check for accuracy and completeness. Disagreements were resolved by consensus or by obtaining a third reviewer's opinion if consensus could not be reached by the first two investigators.

To aid in both reproducibility and standardization of data collection, researchers received data abstraction instructions directly on each form created specifically for this project within the DistillerSR database. The abstraction form templates were pilot-tested with a sample of included articles to ensure that all relevant data elements were captured and that there was consistency and reproducibility across abstractors.

We designed the data abstraction forms for this project to collect the data required to evaluate the specified eligibility criteria for inclusion in this review, as well as demographic and other data needed for determining outcomes (intermediate, final, and adverse events outcomes). We paid particular attention to describing the details of the intervention (e.g., NPPV interface); the training, experience, and disciplines of the treating clinicians; patient characteristics (e.g., etiology of acute respiratory failure); and study design (e.g., efficacy-effectiveness spectrum using a 7-item instrument50) that might affect outcomes.

We used the efficacy-effectiveness instrument (Appendix B) to assess seven domains: population and setting, restrictiveness of eligibility criteria, health outcomes, flexibility of the intervention and study duration, assessment of adverse effects, adequate sample size for important health outcomes, and intention-to-treat approach to analyses. We developed definitions for each domain that were specific to the literature reviewed. We rated each of the seven items as effectiveness (score = 1) or efficacy (score = 0); scores were summed and could range from 0 to 7. Final efficacy-effectiveness scores were based on the mean of two independent ratings, after resolving any scoring disagreements ≥ 2.
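The scoring rules above (seven binary items summed, two independent raters averaged after resolving disagreements of 2 or more, and the 0–2/3–5/6–7 subgroup categories described later in this chapter) can be sketched in Python. This is an illustration only, not the review's actual tooling; the function names are invented for this sketch:

```python
def efficacy_effectiveness_score(items):
    """Sum seven binary item ratings: 1 = effectiveness, 0 = efficacy."""
    assert len(items) == 7 and all(i in (0, 1) for i in items)
    return sum(items)

def final_score(rater_a_items, rater_b_items):
    """Mean of two independent ratings. Per the protocol, scoring
    disagreements of >= 2 must be resolved before averaging."""
    a = efficacy_effectiveness_score(rater_a_items)
    b = efficacy_effectiveness_score(rater_b_items)
    if abs(a - b) >= 2:
        raise ValueError("scores differ by >= 2; resolve disagreement first")
    return (a + b) / 2

def categorize(score):
    """Subgroup categories used in the analyses (score range 0-7)."""
    if score <= 2:
        return "mostly efficacy"
    elif score <= 5:
        return "mixed efficacy-effectiveness"
    return "mostly effectiveness"
```

For example, two raters scoring a study 7 and 6 would yield a final score of 6.5, placing it in the "mostly effectiveness" category.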

We classified the etiology of acute respiratory failure based on study inclusion criteria (e.g., acute respiratory failure secondary to COPD) and the description of included patients. When the etiology was mixed, we classified the study by a single condition if at least 70 percent of the sample had that condition; otherwise, the sample was described as “mixed.”
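The 70 percent classification rule can be expressed compactly; the following is a minimal sketch (the function name and data representation are hypothetical, not from the review):

```python
from collections import Counter

def classify_etiology(patient_conditions, threshold=0.70):
    """Classify a study's acute respiratory failure etiology.

    Assigns a single condition if at least 70% of the sample shares it;
    otherwise the sample is labeled "mixed".
    """
    counts = Counter(patient_conditions)
    condition, n = counts.most_common(1)[0]
    if n / len(patient_conditions) >= threshold:
        return condition
    return "mixed"
```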

We prioritized abstraction of clinical outcomes reported for the duration of the ICU or hospital stay, along with any longer term outcomes. Some outcomes were reported only in figures. In these instances, we used the software Engauge Digitizer (digitizer.sourceforge.net/) to convert graphical displays to numerical data. In addition, we described comparators (especially supportive therapy) as carefully as possible given the sometimes limited information provided in the study publications, as treatment standards may have changed during the period covered by this review. The safety outcomes were framed to help identify adverse events, including hospital-acquired pneumonia and facial ulcerations. We also abstracted the data necessary for assessing quality and applicability, as described in the Methods Guide.27 Appendix C provides a detailed listing of the data elements abstracted.

Quality (Risk of Bias) Assessment of Individual Studies

To assess the risk of bias/methodological quality of individual studies, we used the key criteria for RCTs described in the Methods Guide27 and adapted for this specific topic. These criteria include adequacy of randomization and allocation concealment, comparability of groups at baseline, blinding, completeness of followup and differential loss to followup, whether incomplete data were addressed appropriately, validity of outcome measures, and conflict of interest. These general criteria were customized for each major outcome (see part Appendix C, section VII, for details).

To indicate the summary judgment of the quality of the individual studies, we used the summary ratings of good, fair, and poor, based on the studies' adherence to well-accepted standard methodologies and the adequacy of the reporting (Table 3). For each study, one investigator assigned a summary quality rating, which was then reviewed by a second investigator; disagreements were resolved by consensus or by a third investigator if agreement could not be reached.

Table 3. Definitions of overall quality ratings.

Data Synthesis

We began by summarizing key features of the included studies for each KQ. To the degree that data were available, we abstracted information on study design; patient characteristics; medical settings; clinician disciplines, experience, or training; type of NPPV, including the interface; and intermediate, final, and adverse events outcomes.

We then determined the feasibility of completing a quantitative synthesis (i.e., meta-analysis). Feasibility depended on the volume of relevant literature, the conceptual homogeneity of the studies, and the completeness of the reporting of results. Based on the frequency of reported outcomes and the relative importance of these outcomes, we determined that quantitative syntheses were indicated for: mortality, intubation or reintubation, myocardial infarction, and hospital-acquired pneumonia; other outcomes were summarized using descriptive statistics. Length of stay was analyzed qualitatively because the data reported for this outcome were often highly skewed, and because this outcome is biased due to the mortality benefit associated with NPPV treatment. For this qualitative synthesis, we focused our analysis on the larger studies that had greater power to detect a clinically and statistically significant difference in length of stay. We did not synthesize physiological outcomes because there were sufficient data to draw conclusions based on final outcomes and more clinically relevant intermediate outcomes. Other clinical outcomes that were reported infrequently, such as rates of sinusitis, facial ulceration and discontinuation due to intolerance, are summarized descriptively.

For the outcomes selected for meta-analysis, we used random-effects models to synthesize the evidence quantitatively using the Comprehensive Meta-Analysis software (Version 2; Biostat, Englewood, NJ). When outcomes were reported at multiple time points, we used the longest in-hospital followup duration (e.g., in-hospital mortality instead of ICU mortality). The majority of outcomes considered in this report were binary or categorical; we therefore summarized these outcomes using a weighted effect measure for proportions (e.g., odds ratio [OR]) and 95 percent confidence intervals (CIs). When we found statistically significant effects, we calculated the risk difference (RD) by using the summary OR and median odds of events from the comparator arms of the included studies (presented in the relevant strength of evidence summary table in the Discussion chapter). If the summary OR varied substantially by study quality, we used the OR from the good-quality studies for this calculation. We tested for heterogeneity using graphical displays and test statistics (Q and I2 statistics), while recognizing that the ability of statistical methods to detect heterogeneity may be limited. When there were sufficient studies (n ≥ 10), we assessed for publication bias using funnel plots and test statistics (Appendix D).51 If these analyses suggested significant publication bias, we computed an adjusted summary estimate using Duval's trim-and-fill method.52
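The risk difference calculation described above — converting a summary OR and the median odds of events in the comparator arms into an absolute difference in risk — can be sketched as follows. This is an illustration of the arithmetic, not the review's actual code:

```python
def odds_to_prob(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)

def risk_difference(summary_or, comparator_odds):
    """Risk difference implied by a summary odds ratio and the median
    odds of events in the comparator arms.

    The treatment-arm odds are the comparator odds multiplied by the
    summary OR; both odds are converted to probabilities and subtracted.
    """
    p_control = odds_to_prob(comparator_odds)
    p_treatment = odds_to_prob(summary_or * comparator_odds)
    return p_treatment - p_control
```

For instance, with hypothetical values of a summary OR of 0.5 and median comparator odds of 0.25 (a 20 percent event rate), the implied risk difference is roughly −8.9 percentage points.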

We anticipated that intervention effects may be heterogeneous. We hypothesized that the methodological quality of individual studies, efficacy-effectiveness score, the training or experience of the interventionists, the characteristics of the comparator, and patients' etiology of acute respiratory failure would be associated with the intervention effects. When there were sufficient studies, we performed subgroup analyses to examine these hypotheses. For the efficacy-effectiveness analyses, we categorized studies as mostly efficacy (score of 0–2), mixed efficacy-effectiveness (score of 3–5), and mostly effectiveness (score of 6–7). Gartlehner et al. reported a sensitivity of 72 percent and specificity of 83 percent using a threshold of 6 or higher to identify effectiveness studies.50 Since European countries have a longer experience with NPPV, and few studies reported the training or experience of clinicians delivering NPPV, we used geographic region as a proxy for experience. In addition to these analyses, we summarized qualitatively any relevant subgroup analyses reported in the primary studies.

We conducted a secondary, mixed-treatment meta-analysis to address the effects of CPAP, BPAP (KQ 2), and invasive ventilation compared with supportive therapy by using both direct and indirect comparisons. Mortality is a dichotomous outcome and was fitted using multiple logistic regression analysis. Dummy variables were used for study differences, and treatment variables were used for the three-treatment effects (CPAP, BPAP, and supportive care). A random-effects model was fitted using the EGRET® software (EGRET for Windows, 1999; Cytel Software Corporation, Cambridge, MA) which estimates both fixed-effect and random-effects parameters and automatically generates the dummy variables for each study (“Logistic-Normal Regression Model” option). Hasselblad 199853 describes the application of this methodology to meta-regression problems.
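The review fitted a logistic regression model in EGRET to combine direct and indirect evidence. As a simpler illustration of the indirect-comparison idea underlying such a model (not the review's method itself), two treatments that each have trials against a common comparator can be contrasted on the log odds ratio scale, with variances adding because the estimates come from separate trials. All numbers in the usage example are hypothetical:

```python
import math

def indirect_log_or(log_or_a_vs_c, log_or_b_vs_c):
    """Indirect log odds ratio of A vs. B from trials comparing A with C
    and B with C (Bucher-style adjusted indirect comparison)."""
    return log_or_a_vs_c - log_or_b_vs_c

def indirect_variance(var_a_vs_c, var_b_vs_c):
    """Variances add because the two estimates come from separate trials."""
    return var_a_vs_c + var_b_vs_c

# Hypothetical usage: if CPAP vs. supportive care has OR 0.5 and BPAP vs.
# supportive care has OR 0.7, the indirect OR of CPAP vs. BPAP is 0.5/0.7.
indirect_or = math.exp(indirect_log_or(math.log(0.5), math.log(0.7)))
```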

Strength of the Body of Evidence

We graded the strength of evidence for each outcome assessed using the approach described in the Methods Guide.27,54 In brief, this approach requires assessment of four domains: study quality (risk of bias), consistency, directness, and precision. Additional domains considered were strength of association (magnitude of effect) and publication bias. For risk of bias we considered basic (e.g., RCT) and detailed study design (e.g., adequate randomization). For directness, we considered whether the interventions of interest were compared directly (i.e., head-to-head) and the directness of the specific outcome vis-à-vis our Key Questions. For example, we considered ICU length of stay to be an indirect outcome because it does not capture overall resource utilization, including the time and personnel required to implement NPPV. We used results from meta-analyses when evaluating consistency (forest plots, tests for heterogeneity), precision (CIs), strength of association (OR), and publication bias (funnel plots and test statistics). Optimal information size (a method that considers whether the number of events is sufficient to protect against random error) and consideration of whether the CI crossed the clinical decision threshold for using a therapy were also used when evaluating precision.55 These domains were considered qualitatively, and a summary rating of “high,” “moderate,” or “low” strength of evidence was assigned after discussion by two reviewers. In some cases, high, moderate, or low ratings were impossible or imprudent to make, for example, when no evidence was available or when evidence on the outcome was too weak, sparse, or inconsistent to permit any conclusion to be drawn. In these situations, a grade of “insufficient” was assigned. This four-level rating scale consists of the following definitions:

  • High—High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
  • Moderate—Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
  • Low—Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate.
  • Insufficient—Evidence either is unavailable or does not permit estimation of an effect.


Applicability

Systematic evidence reviews are conducted to summarize knowledge and to support clinicians, patients, and policymakers in making informed decisions. “Does this information apply?” is the core question for decisionmakers weighing the usefulness and value of a specific intervention or choosing among interventions. Interventions that work well in one context may not work well in another. The primary aim of assessing applicability is to determine whether the results obtained under research conditions are likely to reflect the results that would be expected in broader populations under “real-world” conditions. In this particular instance, we focused on application to populations cared for in hospital settings.

We assessed applicability directly in KQ 4 (effect of setting, experience, and patient characteristics) and by using the methods described in the Methods Guide.27,56 In brief, the latter methods use the PICOTS (Population, Intervention, Comparator, Outcome, Timing, Setting) format as a way to organize information relevant to applicability. The most important issue with respect to applicability is whether the outcomes are different across studies that recruit different populations (e.g., age groups, exclusions for obesity) or use different methods to implement the intervention (e.g., strict clinical or training protocols). That is, important characteristics are those that affect baseline (control-group) rates of events, intervention-group rates of events, or both. We used a checklist to guide the assessment of applicability (Appendix C, section VIII). We used these data to evaluate the applicability to clinical practice, paying special attention to study eligibility criteria, demographic features of the enrolled population in comparison with the target population, characteristics of the intervention used in comparison with care models currently in use, and clinical relevance and timing of the outcome measures. We summarize issues of applicability qualitatively.

Peer Review and Public Commentary

The peer review process was our principal external quality-monitoring device. Nominations for peer reviewers were solicited from several sources, including the TEP and interested Federal agencies. Experts in pulmonology and critical care, along with individuals representing stakeholder and user communities, were invited to provide external peer review of the draft report; AHRQ and an associate editor also provided comments. The draft report was posted on AHRQ's Web site for public comment for 4 weeks, from January 20, 2012, to February 17, 2012. We have addressed all reviewer comments, revising the text as appropriate, and have documented everything in a disposition of comments report that will be made available 3 months after the Agency posts the final report on AHRQ's Web site. A list of peer reviewers submitting comments on the draft report is provided in the front matter of this report.
