PubMed Health. A service of the National Library of Medicine, National Institutes of Health.

Peterson K, McDonagh M, Thakurta S, et al. Drug Class Review: Nonsteroidal Antiinflammatory Drugs (NSAIDs): Final Update 4 Report [Internet]. Portland (OR): Oregon Health & Science University; 2010 Nov.


Inclusion Criteria



Effectiveness outcomes

  • Pain
  • Functional status
  • Discontinuations due to lack of effectiveness


Study Designs

Literature Search

We searched Ovid MEDLINE® (1996 to June week 2, 2010), the Cochrane Database of Systematic Reviews® (2005 to May 2010), the Cochrane Central Register of Controlled Trials® (2nd Quarter 2010), and the Database of Abstracts of Reviews of Effects (2nd Quarter 2010) using included drugs, indications, and study designs as search terms. (See Appendix D for complete search strategies.) We attempted to identify additional studies through hand searches of reference lists of included studies and reviews. In addition, we searched the US Food and Drug Administration Center for Drug Evaluation and Research website for medical and statistical reviews of individual drug products. Finally, we requested dossiers of published and unpublished information from the relevant pharmaceutical companies for this review. All received dossiers were screened for studies or data not found through other searches. All citations were imported into an electronic database (Endnote® XI, Thomson Reuters). Other databases and websites, including Embase, Canadian Agency for Drugs and Technology in Health, and Bandolier, were searched during the production of the original report and previous updates.

Study Selection

Selection of included studies was based on the inclusion criteria created by the Drug Effectiveness Review Project participants, as described above. Two reviewers independently assessed titles and abstracts of citations identified through literature searches for inclusion using the criteria above. Full-text articles of potentially relevant citations were retrieved and again were assessed for inclusion by both reviewers. Disagreements were resolved by consensus. Results published only in abstract form were not included because inadequate details were available for quality assessment. Inclusion of randomized controlled trials was limited to those of at least 4 weeks’ duration that compared celecoxib to an NSAID or 2 or more NSAIDs to one another.

Data Abstraction

The following data were abstracted from included trials: study design; setting; population characteristics, including sex, age, ethnicity, and diagnosis; population; interventions (dose and duration); comparisons; number randomized, number withdrawn, and number lost to follow-up; and results for each outcome. We recorded intention-to-treat results when reported. If true intention-to-treat results were not reported, but loss to follow-up was very small, we considered these results to be intention-to-treat results. In cases where only per-protocol results were reported, we calculated intention-to-treat results if the data for these calculations were available.
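As an illustration only, the recalculation of an intention-to-treat result from per-protocol data can be sketched for a dichotomous outcome. The report does not state how the calculation was done; this sketch assumes that every randomized patient stays in the denominator and that patients missing from the per-protocol analysis count as non-responders. The function name and the numbers are hypothetical.

```python
def itt_response_rate(responders: int, randomized: int) -> float:
    """Proportion of responders using all randomized patients as denominator.

    Assumption (not stated in the report): patients who are missing from
    the per-protocol analysis are counted as non-responders.
    """
    if randomized <= 0:
        raise ValueError("randomized must be positive")
    if not 0 <= responders <= randomized:
        raise ValueError("responders must be between 0 and randomized")
    return responders / randomized


# Hypothetical trial arm: 100 randomized, 80 analyzed per protocol,
# 40 responders among those analyzed.
per_protocol_rate = 40 / 80              # denominator = patients analyzed
itt_rate = itt_response_rate(40, 100)    # denominator = patients randomized
```

Under this assumption the intention-to-treat rate can never exceed the per-protocol rate, which is why it is usually regarded as the more conservative estimate.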

Validity Assessment

We assessed the internal validity (quality) of trials based on predefined criteria. These criteria are based on those developed by the US Preventive Services Task Force and the National Health Service Centre for Reviews and Dissemination (United Kingdom).12, 13 We rated the internal validity of each trial based on the methods used for randomization, allocation concealment, and blinding; the similarity of compared groups at baseline; maintenance of comparable groups; adequate reporting of dropouts, attrition, crossover, adherence, and contamination; loss to follow-up; and the use of intention-to-treat analysis. Trials that had a fatal flaw in 1 or more categories were rated poor quality; trials that met all criteria were rated good quality; the remainder were rated fair quality. As the “fair quality” category is broad, studies with this rating vary in their strengths and weaknesses: the results of some fair-quality studies are likely to be valid, while others are only probably valid. A “poor quality” trial is not valid: the results are at least as likely to reflect flaws in the study design as the true difference between the compared drugs. A fatal flaw is reflected by failure to meet combinations of items of the quality assessment checklist. A particular randomized trial might receive 2 different ratings: one for effectiveness and another for adverse events.
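The three-level trial rating described above can be written as a small decision rule. This is a minimal sketch: the function name and its inputs are hypothetical, and the judgment of what counts as a fatal flaw (a combination of failed checklist items) is left to the reviewer and passed in as a flag.

```python
def rate_trial(has_fatal_flaw: bool, criteria_met: int, total_criteria: int) -> str:
    """Return 'poor', 'good', or 'fair' for a trial's internal validity.

    has_fatal_flaw is the reviewer's judgment; the report defines a fatal
    flaw as failure to meet combinations of checklist items, which this
    sketch does not try to encode.
    """
    if not 0 <= criteria_met <= total_criteria:
        raise ValueError("criteria_met must be between 0 and total_criteria")
    if has_fatal_flaw:
        return "poor"
    if criteria_met == total_criteria:
        return "good"
    return "fair"
```

Because a trial may be rated separately for effectiveness and for adverse events, the function would be called once per outcome type with the corresponding assessment.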

The criteria used to rate observational studies of adverse events reflect aspects of the study design that are particularly important for assessing adverse event rates. We rated observational studies as good quality for adverse event assessment if they adequately met 6 or more of the 7 predefined criteria, fair quality if they met 3 to 5 criteria, and poor quality if they met 2 or fewer criteria.
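The thresholds for observational studies translate directly into code. The sketch below uses a hypothetical function name; the cut-offs (6 or more of 7 for good, 3 to 5 for fair, 2 or fewer for poor) are those given in the paragraph above.

```python
def rate_observational_study(criteria_met: int) -> str:
    """Map the number of adequately met criteria (0 to 7) to a rating."""
    if not 0 <= criteria_met <= 7:
        raise ValueError("criteria_met must be between 0 and 7")
    if criteria_met >= 6:
        return "good"
    if criteria_met >= 3:
        return "fair"
    return "poor"
```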

Included systematic reviews were also rated for quality. We rated the internal validity based on a clear statement of the question(s); reporting of inclusion criteria; methods used for identifying literature (the search strategy), validity assessment, and synthesis of evidence; and details provided about included studies. Again, these studies were categorized as good when all criteria were met.

Grading the Strength of Evidence

We graded strength of evidence based on the guidance established for the Evidence-based Practice Center Program of the Agency for Healthcare Research and Quality.14 Developed to grade the overall strength of a body of evidence, this approach incorporates 4 key domains: risk of bias (includes study design and aggregate quality), consistency, directness, and precision of the evidence. It also considers other optional domains that may be relevant for some scenarios, such as a dose-response association, plausible confounding that would decrease the observed effect, strength of association (magnitude of effect), and publication bias.

Table 3 describes the grades of evidence that can be assigned. Grades reflect the strength of the body of evidence to answer key questions on the comparative effectiveness, efficacy, and harms of NSAIDs. Grades do not refer to the general efficacy or effectiveness of pharmaceuticals.

Table 3. Definitions of the Grades of Overall Strength of Evidence.

Data Synthesis

We constructed evidence tables showing the study characteristics, quality ratings, and results for all included studies. We reviewed studies using a hierarchy of evidence approach, where the best evidence is the focus of our synthesis for each question, population, intervention, and outcome addressed. Studies that evaluated one NSAID against another provided direct evidence of comparative effectiveness and adverse event rates. Where possible, these data are the primary focus. Direct comparisons were preferred over indirect comparisons; similarly, effectiveness and long-term safety outcomes were preferred to efficacy and short-term tolerability outcomes.

In theory, trials that compare NSAIDs with other drug classes or with placebos can also provide evidence about effectiveness. This is known as an indirect comparison and can be difficult to interpret for a number of reasons, primarily heterogeneity of trial populations, interventions, and outcomes assessment. Data from indirect comparisons are used to support direct comparisons, where they exist, and are used as the primary comparison where no direct comparisons exist. Indirect comparisons should be interpreted with caution.

Quantitative analyses were conducted using meta-analyses of outcomes reported by a sufficient number of studies that were homogeneous enough that combining their results could be justified. These analyses were created using StatsDirect (Cam Code, Altrincham UK) software. To determine whether meta-analysis could be meaningfully performed, we considered the quality of the studies and the heterogeneity among studies in design, patient population, interventions, and outcomes. When meta-analysis could not be performed, the data were summarized qualitatively.

Random-effects models were used to estimate pooled effects.16 If necessary, indirect meta-analyses were done to compare interventions for which there were no head-to-head comparisons and where there was a common comparator intervention across studies.17 Forest plots graphically summarize results of individual studies and of the pooled analysis.18
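To make the two techniques in this paragraph concrete, the sketch below pools study effects with a random-effects model and then forms an indirect comparison through a common comparator. The report does not say which estimators were used; the DerSimonian-Laird method-of-moments estimator and the simple difference-of-effects indirect comparison shown here are assumptions chosen because they are widely used. Function names and inputs are hypothetical, and effects are assumed to be on an additive scale (for example, log relative risks).

```python
import math


def random_effects_pool(effects, variances):
    """Pool study effects with DerSimonian-Laird random-effects weights.

    Returns (pooled_effect, standard_error, tau_squared).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def indirect_comparison(effect_ac, var_ac, effect_bc, var_bc):
    """Estimate A vs B from A vs C and B vs C (common comparator C).

    The variances add, so the indirect estimate is less precise than
    either of the comparisons it is built from.
    """
    return effect_ac - effect_bc, var_ac + var_bc
```

The added variance in the indirect estimate is one numerical reason indirect comparisons call for cautious interpretation, separate from the heterogeneity concerns the report raises.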

The Q statistic and the I² statistic (the proportion of variation in study estimates due to heterogeneity) were calculated to assess heterogeneity in effects between studies.19, 20 Potential sources of heterogeneity were examined by analysis of subgroups of study design, study quality, patient population, and variation in interventions. Meta-regression models were used to formally test for differences between subgroups with respect to outcomes.16, 21
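For readers unfamiliar with these statistics, the following sketch computes Cochran's Q and I² from study effects and their variances using the standard definitions: Q is the inverse-variance-weighted sum of squared deviations from the fixed-effect pooled estimate, and I² = max(0, (Q − df) / Q) × 100, with df equal to the number of studies minus 1. The function name and the example numbers are hypothetical and are not taken from the report.

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees_of_freedom, I2_percent) for a set of studies."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Two hypothetical studies with effects 0.0 and 1.0 and equal variances 0.1.
q, df, i2 = heterogeneity([0.0, 1.0], [0.1, 0.1])
```

In this hypothetical case Q is 5 on 1 degree of freedom and I² is 80%, meaning most of the observed variation would be attributed to between-study differences rather than chance.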

Public Comment

This report was posted to the Drug Effectiveness Review Project website for public comment. We received comments from two pharmaceutical companies.

Copyright © 2010, Oregon Health & Science University.