This comparative effectiveness review (CER) follows the methods suggested in the Agency for Healthcare Research and Quality (AHRQ) Methods Guide for Effectiveness and Comparative Effectiveness Reviews.29 All methods were determined a priori.

Input From Stakeholders

The key questions for this CER were developed with input from key informants representing clinicians, wound care researchers, and patient advocates, who helped refine the key questions, identify important methodological and clinical issues, and define parameters for the review of evidence. The revised key questions were then posted to the AHRQ public Web site for a 4-week public comment period. The AHRQ and our Evidence-based Practice Center (EPC) agreed upon the final key questions after reviewing the public comments, receiving additional input from a Technical Expert Panel (TEP) convened for this report, and revising the key questions accordingly. We then drafted a protocol for the CER, which was reviewed by the TEP. The TEP consisted of experts in pressure ulcer treatment and research from the geriatrics, primary care, hospital medicine, and nursing disciplines.

Prior to participation in this report, the TEP members disclosed all financial or other conflicts of interest. The AHRQ Task Order Officer and the authors reviewed the disclosures and determined that the panel members had no conflicts of interest that precluded participation.

With input from the TEP, the final protocol was developed prior to initiation of the review, and is available at http://effectivehealthcare.ahrq.gov/ehc/products/309/926/Pressure-Ulcer-Prevention_Protocol_20120110.pdf.

Literature Search Strategy

A research librarian conducted searches in MEDLINE (Ovid) from 1946 through July 2012; CINAHL (EBSCOhost) from 1988 through July 2012; and the Cochrane Central Register of Controlled Trials and Cochrane Database of Systematic Reviews using EBM Reviews (Ovid) through July 2012 (see Appendix A for full search strategies). The search strategies were peer reviewed by another information specialist and revised prior to finalization. We also hand-searched the reference lists of relevant studies. In addition, we requested scientific information packets (SIPs) from identified drug and device manufacturers of pressure ulcer treatments, who had the opportunity to submit data through the SIP submission portal on the Effective Health Care Program Web site.

Study Selection

We developed criteria for inclusion and exclusion of studies based on the key questions and the populations, interventions, comparators, outcomes, timing, types of studies, and setting (PICOTS) approach. Inclusion and exclusion criteria, summarized below, are described in more detail by key question in Appendix B. Papers were selected for review if they addressed the prevention of pressure ulcers, were relevant to a key question, and met the predefined inclusion criteria. We excluded studies of nonhuman subjects and studies with no original data. Two investigators reviewed abstracts for inclusion for each key question, and full-text articles were obtained for all studies that either investigator identified as potentially meeting inclusion criteria. Two investigators then independently reviewed all full-text articles for final inclusion, with discrepancies resolved through discussion and consensus and a third investigator making the final decision if necessary. We restricted inclusion to English-language articles; titles and abstracts of potentially relevant non-English-language articles are listed in Appendix E. A list of the included studies can be found in Appendix C; excluded studies, with the primary reasons for exclusion, can be found in Appendix D.

Population and Conditions of Interest

The target population was adult patients (>18 years of age) without pressure ulcers at baseline. For studies of risk prediction instruments, we excluded studies that enrolled >10 percent of patients with ulcers at baseline, since the presence of ulcers is in itself a marker of high risk. For studies of preventive interventions, we included studies that reported incident (new) pressure ulcers and in which fewer than 20 percent of subjects had stage 2 or higher ulcers at baseline. We did not restrict inclusion to studies that enrolled only people at higher risk for ulcers, though most studies focused on higher risk populations. We evaluated patient subgroups defined by age, race or skin tone, physical impairment, body weight, or specific medical comorbidities (e.g., urinary incontinence, diabetes, and peripheral vascular disease). We excluded studies of children and adolescents.

Interventions and Comparisons

For Key Question 1, we included studies that compared effects of using a risk assessment instrument, primarily the Braden Scale, Norton Scale, or Waterlow Scale, with clinical judgment or another risk assessment instrument. We excluded studies that evaluated individual risk factors outside of a risk assessment instrument. For Key Question 2, we included studies that reported the diagnostic accuracy of validated risk assessment tools for predicting incident pressure ulcers. For Key Questions 3 and 4, we included studies that compared interventions to prevent pressure ulcers with usual care or no treatment, or that compared one preventive intervention with another.

Outcomes

For Key Questions 1 and 3, included outcomes were pressure ulcer incidence and severity, as well as resource utilization (such as duration of hospital stay or cost). For Key Question 2, we included outcomes related to the predictive validity of the risk assessment tools, including diagnostic accuracy (sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio), measures of risk (hazard ratios, odds ratios, and relative risks), and discrimination (area under the receiver operating characteristic [AUROC] curve). For Key Question 4, we included harms (such as dermatologic reactions, discomfort, and infection).

Timing

We did not restrict inclusion of studies based on duration of followup.

Types of Studies

For Key Questions 1 and 4, we included controlled clinical trials and cohort studies. For Key Question 3, we included controlled clinical trials; we amended our protocol to exclude observational studies for this question because over 50 clinical trials were available. For Key Question 2, we included prospective studies that reported the diagnostic accuracy of risk prediction instruments. No systematic reviews met inclusion criteria (because they did not directly address a key question, were otherwise outside the scope of the review, or were not rated high quality), though we reviewed the reference lists of systematic reviews for potentially relevant citations. We also excluded studies published only as conference abstracts.

Setting

We did not exclude studies based on setting. Settings of interest included acute care hospitals, long-term care facilities, rehabilitation facilities, operative and postoperative settings, and non-health care settings (e.g., home care and wheelchair users in the community).

Data Extraction

We extracted the following information from included trials into evidence tables: study design, setting, inclusion and exclusion criteria, population characteristics (including sex, age, ethnicity, prevalent ulcers, risk for ulcers), sample size, duration of followup, attrition, intervention characteristics, method for assessing ulcers, and results. Data extraction for each study was performed by two investigators: the first investigator extracted the data, and the second investigator independently reviewed the extracted data for accuracy and completeness.

For studies of diagnostic accuracy, we attempted to create two-by-two tables from information provided (sample size, prevalence, sensitivity, and specificity) and compared calculated measures of diagnostic accuracy based on the two-by-two tables with reported results. We noted discrepancies between calculated and reported results when present. When reported, we also extracted relative measures of risk (relative risk [RR], odds ratio [OR], hazard ratio [HR]) and the AUROC. The AUROC, which is based on sensitivities and specificities across a range of test results, is a measure of discrimination, or the ability of a test to distinguish people with a condition from people without the condition.30,31 An AUROC of 1.0 indicates perfect discrimination, and an AUROC of 0.5 indicates complete lack of discrimination. Interpretation of AUROC values between 0.5 and 1.0 is somewhat arbitrary, but a value of 0.90 to 1.0 has been classified as excellent, 0.80 to <0.90 as good, 0.70 to <0.80 as fair, and <0.70 as poor.
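
For reference, the following are the standard formulas underlying these calculations (included here for clarity, with TP, FP, FN, and TN denoting the true positive, false positive, false negative, and true negative counts in the two-by-two table):

\[
\text{sensitivity} = \frac{TP}{TP+FN}, \qquad \text{specificity} = \frac{TN}{TN+FP}, \qquad \text{PPV} = \frac{TP}{TP+FP}, \qquad \text{NPV} = \frac{TN}{TN+FN}
\]
\[
\text{LR}^{+} = \frac{\text{sensitivity}}{1-\text{specificity}}, \qquad \text{LR}^{-} = \frac{1-\text{sensitivity}}{\text{specificity}}
\]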

For studies of interventions, we calculated relative risks and associated 95 percent confidence intervals for pressure ulcers based on the information provided (sample sizes and incidence in each intervention group). We noted discrepancies between calculated and reported results when present.
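
As a brief illustration of this calculation (standard formulas shown with generic notation, not data from any included study), if $a$ of $n_1$ patients in the intervention group and $c$ of $n_2$ patients in the control group developed incident ulcers, then

\[
RR = \frac{a/n_1}{c/n_2}, \qquad 95\%\ \text{CI} = \exp\!\left(\ln RR \pm 1.96\sqrt{\frac{1}{a} - \frac{1}{n_1} + \frac{1}{c} - \frac{1}{n_2}}\right)
\]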

Assessing Quality

We assessed the quality of each study based on predefined criteria (Appendix F). We adapted criteria from methods proposed by Downs and Black (observational studies),32 the United States Preventive Services Task Force (USPSTF),33 and the Quality Assessment of Diagnostic Accuracy Studies-2 Group.34 The criteria used are consistent with the approach recommended by AHRQ in the Methods Guide for Comparative Effectiveness Reviews.29 We used the term “quality” rather than the alternate term “risk of bias”; both refer to internal validity. Two investigators independently assessed the quality of each study. Discrepancies were resolved through discussion and consensus, with a third investigator making the final decision if necessary.

We rated the quality of each randomized trial based on the methods used for randomization, allocation concealment, and blinding; the similarity of compared groups at baseline; maintenance of comparable groups; adequate reporting of dropouts, attrition, crossover, adherence, and contamination; loss to followup; the use of intent-to-treat analysis; and ascertainment of outcomes.33 For cluster randomized trials, we also evaluated whether the study evaluated cluster effects.35

We rated the quality of each cohort study based on whether it used nonbiased selection methods to create an inception cohort; whether it evaluated comparable groups; whether rates of loss to followup were reported and acceptable; whether it used accurate methods for ascertaining exposures, potential confounders, and outcomes; and whether it performed appropriate statistical analyses of potential confounders.33

We rated the quality of each study evaluating the diagnostic accuracy or predictive value of risk prediction instruments based on whether it evaluated a representative spectrum of patients, whether it enrolled a random or consecutive sample of patients meeting predefined criteria, whether it used a credible reference standard, whether the same reference standard was applied to all patients, whether the reference standard was interpreted independently from the test under evaluation, and whether thresholds were predefined.33,34 In addition, unblinded use of a risk prediction instrument (as was typical in the studies) could result in differential use of preventive interventions depending on assessed risk, alter the likelihood of the predicted outcome, and compromise measures of diagnostic accuracy (e.g., if more intense and effective interventions are used in higher-risk patients). Therefore, we also assessed whether studies on diagnostic accuracy reported use of subsequent interventions, and whether risk estimates (when reported) were adjusted for potential confounders.

Following assessment of individual quality criteria, individual studies were rated as “good,” “fair,” or “poor” quality, as defined below.29

Good-quality studies are considered likely to be valid. Good-quality studies clearly describe the population, setting, interventions, and comparison groups; use a valid method for allocation of patients to interventions; clearly report dropouts and have low dropout rates; use appropriate methods for preventing bias; assess outcomes blinded to intervention status; and appropriately measure outcomes and fully report results.

Fair-quality studies have some methodological deficiencies, but no flaw or combination of flaws judged likely to cause major bias. The study may be missing information, making it difficult to assess its methods or to assess limitations and potential problems. The fair-quality category is broad, and studies with this rating vary in their strengths and weaknesses: the results of some fair-quality studies are likely to be valid, while others are only possibly valid.

Poor-quality studies have significant flaws that may invalidate the results. They have a serious or “fatal” flaw in design, analysis, or reporting; large amounts of missing information; or discrepancies in reporting. The results of these studies are judged to be at least as likely to reflect flaws in the study design as true effects of the interventions under investigation. We did not exclude studies rated poor-quality a priori, but they were considered to be the least reliable studies when synthesizing the evidence, particularly when discrepancies between studies were present. For detailed quality assessment methods see Appendix F.

Assessing Research Applicability

Applicability is defined as the extent to which the effects observed in published studies are likely to reflect the expected results when a specific intervention is applied to the population of interest under “real-world” conditions.36 It is an indicator of the extent to which research included in a review might be useful for informing clinical and/or policy decisions in specific situations. Applicability depends on the particular question and the needs of the user of the review. There is no generally accepted universal rating system for applicability. In addition, applicability depends in part on context. Therefore, we did not assign a rating of applicability (such as “high” or “low”) because applicability may differ based on the user of this report. Rather, we recorded factors important for understanding the applicability of studies, such as whether the publication adequately described the study population, how similar patients were to populations likely to be targeted by screening, whether differences in outcomes were clinically (as well as statistically) significant, and whether the interventions and tests evaluated were reasonably representative of standard practice.37 We also recorded the funding source and role of the sponsor.

We specifically assessed applicability as related to subpopulations directly addressed by the key questions.

Evidence Synthesis and Rating the Body of Evidence

We did not attempt to pool studies on preventive interventions due to methodological limitations in the studies and substantial clinical diversity with respect to the populations, settings, comparisons, and outcomes evaluated (i.e., how pressure ulcers were assessed and graded). We also did not quantitatively pool results on diagnostic accuracy (such as creating summary receiver operating characteristic curves) due to differences across those studies in the populations evaluated, differences in how pressure ulcers were assessed and graded, and methodological limitations in the studies. Instead, we created descriptive statistics with the median sensitivity and specificity at specific cutoffs and reported AUROCs, along with associated ranges, and calculated positive and negative likelihood ratios based on the median sensitivities and specificities. Although studies varied in which cutoffs they evaluated, and some evaluated a range of cutoffs without a prespecified threshold, we focused on cutoffs for the most common risk instruments (Braden, Norton, and Waterlow) based on recommended thresholds, which may vary depending on the setting and timing of assessments: ≤15 to 18 for the Braden scale,14,22,38-40 ≤12 to 16 for the Norton scale,23,41,42 and ≥10 to 15 for the Waterlow scale.23,43 On the less commonly used Cubbin and Jackson scale, a score of ≤29 has been used to identify people at increased risk.25 We reported the total range across studies for the various measures of diagnostic accuracy, rather than the interquartile range, because the full range better conveys the variability and uncertainty in the estimates.
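
For example (an illustrative calculation using hypothetical values rather than results from the review), a median sensitivity of 0.75 and a median specificity of 0.70 would yield

\[
\text{LR}^{+} = \frac{0.75}{1-0.70} = 2.5, \qquad \text{LR}^{-} = \frac{1-0.75}{0.70} \approx 0.36
\]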

We assessed the overall strength of evidence for each body of evidence in accordance with the AHRQ Methods Guide for Comparative Effectiveness Reviews.44 We considered the quality of the studies; the consistency of results within and between study designs; the directness of the evidence linking the intervention and health outcomes; and the precision of the estimate of effect (based on the number and size of studies and confidence intervals for the estimates). We were not able to formally assess for publication bias in studies of interventions due to the small number of studies, methodological shortcomings, and differences across studies in designs, measured outcomes, and other factors. We rated the strength of evidence for each key question using the four categories recommended in the AHRQ Methods Guide:44 A “high” grade indicates high confidence that the evidence reflects the true effect and that further research is very unlikely to change our confidence in the estimate of effect. A “moderate” grade indicates moderate confidence that the evidence reflects the true effect and that further research may change our confidence in the estimate of effect and may change the estimate. A “low” grade indicates low confidence that the evidence reflects the true effect and that further research is likely to change our confidence in the estimate of effect and is likely to change the estimate. An “insufficient” grade indicates that evidence either is unavailable or does not permit a conclusion. See Appendix G for the strength of evidence tables.

Peer Review and Public Commentary

Experts in prevention and management of pressure ulcers, geriatric medicine, wound care research, and epidemiology, as well as individuals representing important stakeholder groups, were invited to provide external peer review of this CER. The AHRQ Task Order Officer and a designated EPC Associate Editor also provided comments and editorial review. To obtain public comment, the draft report was posted on the AHRQ Web site for 4 weeks. A disposition of comments report detailing the authors’ responses to the peer and public review comments will be made available 3 months after the AHRQ posts the final CER on the public Web site.