
NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Chou R, Helfand M, Peterson K, et al. Comparative Effectiveness and Safety of Analgesics for Osteoarthritis [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2006 Sep. (Comparative Effectiveness Reviews, No. 4.)


Comparative Effectiveness and Safety of Analgesics for Osteoarthritis [Internet].



Topic Development

The topic for this report was nominated in a public process. The key questions were developed by investigators from the Oregon EPC with input from a Technical Expert Panel (TEP) formed for this project. Contacted via teleconference, the TEP served in an advisory capacity for this report, helping to refine key questions, identify important issues, and define parameters for the review of evidence.

Search Strategy

A comprehensive search of the scientific literature was conducted to identify relevant studies addressing the key questions. Results from previously conducted meta-analyses and systematic reviews on these topics were sought and used where appropriate and updated when necessary. To identify systematic reviews, in addition to MEDLINE, we searched the Cochrane Database of Systematic Reviews and the websites of the Canadian Coordinating Office for Health Technology Assessment (CCOHTA), Bandolier, and the NHS Health Technology Assessment Programme.

To identify articles relevant to each key question, we searched the Cochrane Database of Systematic Reviews (through 3rd Quarter 2005), the Cochrane Central Register of Controlled Trials (through 3rd Quarter 2005), and Ovid MEDLINE® (1966 to July 2005). We used relatively broad searches, combining terms for drug names with terms for relevant research designs, limiting to those studies that focused on osteoarthritis and rheumatoid arthritis (see Appendix D for the complete search strategy). Other sources included reference lists of review articles and unpublished materials from the US Food and Drug Administration (FDA). Pharmaceutical manufacturers were invited to submit scientific information packets, including citations if applicable. All 2,665 citations from these sources were imported into an electronic database (EndNote® 9.0) and considered for inclusion.

Study Selection

Systematic reviews and controlled trials pertinent to the key questions were included. We retrieved any blinded or open, parallel or crossover randomized controlled trial that compared one included drug with another included drug, another active comparator, or placebo. We also included cohort and case-control studies with at least 1,000 cases or participants that evaluated serious gastrointestinal and cardiovascular endpoints that were inadequately addressed by randomized controlled trials.

Data Extraction

The following data were extracted from included trials: study design, setting, population characteristics (including sex, age, ethnicity, diagnosis), eligibility and exclusion criteria, interventions (dose and duration), method of outcome ascertainment if available, and results for each outcome, focusing on efficacy and safety. We recorded intention-to-treat results if available.

Quality Assessment

Assessing Research Quality

We assessed the internal validity (quality) of systematic reviews and randomized trials based on the predefined criteria listed in Appendix E. These criteria are based on those developed by the US Preventive Services Task Force and the National Health Service Centre for Reviews and Dissemination (UK).39 We rated the internal validity of each trial based on the methods used for randomization, allocation concealment, and blinding; the similarity of compared groups at baseline; maintenance of comparable groups; adequate reporting of dropouts, attrition, crossover, adherence, and contamination; loss to follow-up; and the use of intention-to-treat analysis. Trials that had a fatal flaw in one or more categories were rated poor quality; trials that met all criteria were rated good quality; the remainder were rated fair quality. As the “fair quality” category is broad, studies with this rating vary in their strengths and weaknesses: the results of some fair-quality studies are likely to be valid, while others are only probably valid. A “poor quality” trial is not valid: the results are at least as likely to reflect flaws in the study design as the true difference between the compared drugs.
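The three-level rating rule described above can be sketched in a few lines of Python. This is only an illustration of the stated logic (any fatal flaw gives poor, all criteria met gives good, everything else is fair). The names `Criterion` and `rate_trial` and the criterion labels are hypothetical and do not come from Appendix E.

```python
from enum import Enum


class Criterion(Enum):
    """Hypothetical outcome of assessing one internal-validity criterion."""
    MET = "met"
    NOT_MET = "not met"
    FATAL_FLAW = "fatal flaw"


def rate_trial(assessments: dict[str, Criterion]) -> str:
    """Return 'good', 'fair', or 'poor' following the rule in the text."""
    if not assessments:
        raise ValueError("at least one criterion must be assessed")
    results = list(assessments.values())
    # A fatal flaw in one or more categories makes the trial poor quality.
    if any(r is Criterion.FATAL_FLAW for r in results):
        return "poor"
    # Trials that meet every criterion are good quality.
    if all(r is Criterion.MET for r in results):
        return "good"
    # The remainder are fair quality.
    return "fair"


example = {
    "randomization": Criterion.MET,
    "allocation concealment": Criterion.NOT_MET,
    "blinding": Criterion.MET,
}
print(rate_trial(example))
```

Because the fatal-flaw check runs first, a trial with one fatal flaw is rated poor even if every other criterion is met.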

Included systematic reviews were also rated for quality based on predefined criteria (see Appendix E) assessing whether they had a clear statement of the question(s), reported inclusion criteria, used an adequate search strategy, assessed validity, reported adequate detail of included studies, and used appropriate methods to synthesize the evidence. We included systematic reviews and meta-analyses that included unpublished data inaccessible to the public, but because the results of such analyses are not verifiable, we considered this a methodological shortcoming.

For assessing the internal validity of observational studies, we evaluated whether they used nonbiased selection methods; whether rates of loss to follow-up were acceptable; whether predefined outcomes were specified; whether they used appropriate methods for ascertaining exposures, potential confounders, and outcomes; and whether they performed appropriate statistical analyses of potential confounders. Although many tools exist for quality assessment of nonrandomized trials, there is no consensus on optimal quality rating methods.40 We therefore did not use a formal scoring system to rate the quality of the observational studies included in this review, but noted methodological deficiencies in any of the above areas when present.

Assessing Research Applicability

The applicability of trials and other studies was assessed based on whether the publication adequately described the study population, how similar patients were to the target population in whom the intervention will be applied, whether differences in outcomes were clinically (as well as statistically) significant, and whether the treatment received by the control group was reasonably representative of standard practice. We also recorded the funding source and role of the sponsor.

Rating a Body of Evidence

Overall quality ratings for an individual study were based on ratings of the internal and external validity of the trial. A particular randomized trial might receive two different ratings: one for efficacy and another for adverse events. The overall strength of evidence for a particular key question reflects the quality, consistency, and power of the set of studies relevant to the question.

We assessed the overall strength of evidence for a body of literature about a particular key question by examining the type, number, and quality of studies; the strength of association; the consistency of results within and between study designs; and the possibility of publication bias. Consistent results from good-quality studies across a broad range of populations suggest a high degree of certainty that the results of the studies were true (that is, the entire body of evidence would be considered “good-quality”). For a body of fair-quality studies, however, consistent results may indicate that similar biases are operating in all the studies. Unvalidated assessment techniques or heterogeneous reporting methods for important outcomes may weaken the overall body of evidence for that particular outcome or make it difficult to accurately estimate the true magnitude of benefit or harm.

Data Synthesis

Effectiveness Versus Efficacy

Throughout this report, we highlight effectiveness studies conducted in primary care or office-based settings that use less stringent eligibility criteria, assess health outcomes of most importance to patients, and have longer follow-up periods than most efficacy studies. The results of effectiveness studies are more applicable to the “average” patient than results from highly selected populations in efficacy studies. Examples of “effectiveness” outcomes include quality of life, global measures of successful treatment, and the ability to work or function in social activities. These outcomes are more important to patients, family, and care providers than surrogate or intermediate measures such as scores based on psychometric scales. Further discussion of these issues is available at http://effectivehealthcare.ahrq.gov/reference/purpose.cfm.

Data Presentation

We constructed evidence tables showing study characteristics, quality ratings, and results for all included studies. We also performed two quantitative analyses for this review. An important limitation of observational studies of NSAIDs is that none simultaneously assessed the risk for serious cardiac and GI events. We therefore re-analyzed data from a set of observational studies that reported rates of three different serious adverse events in the same population. We assumed that the adverse events occurred independently and that the logarithm of the rate ratios was distributed normally. After estimating the effect (number of events prevented or caused) for each of the three adverse events, we estimated the net effects on all three serious adverse events using Monte Carlo simulation.
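A minimal sketch of this kind of simulation follows, assuming that each adverse event is summarized by a baseline rate, a rate ratio, and a 95% confidence interval. The function name `simulate_net_effect`, the input layout, and all numbers are hypothetical illustrations and are not data or code from this report. The standard error of each log rate ratio is recovered from the confidence interval, which is an assumption of the sketch; the two assumptions stated in the text (independent events, normally distributed log rate ratios) are kept.

```python
import math
import random
import statistics


def simulate_net_effect(events, n_draws=10_000, seed=0):
    """Monte Carlo estimate of net excess events across several outcomes.

    events: list of (baseline_rate, rate_ratio, ci_lower, ci_upper), where
    baseline_rate is events per 1,000 person-years in the comparison group.
    Returns (median, 2.5th percentile, 97.5th percentile) of net excess
    events per 1,000 person-years.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_draws):
        net = 0.0
        for baseline_rate, rr, lower, upper in events:
            # Assumption: SE of log(RR) derived from the 95% CI width.
            se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
            # Log rate ratio drawn from a normal distribution.
            log_rr = rng.gauss(math.log(rr), se)
            # Events caused (positive) or prevented (negative).
            net += baseline_rate * (math.exp(log_rr) - 1.0)
        # Independence: each outcome is drawn separately, then summed.
        totals.append(net)
    totals.sort()
    low = totals[int(0.025 * n_draws)]
    high = totals[int(0.975 * n_draws) - 1]
    return statistics.median(totals), low, high


# Hypothetical inputs for three serious adverse events (not report data).
hypothetical_events = [
    (5.0, 1.3, 1.0, 1.7),
    (8.0, 0.6, 0.4, 0.9),
    (3.0, 1.1, 0.8, 1.5),
]
median, low, high = simulate_net_effect(hypothetical_events)
print(f"net excess events per 1,000 person-years: {median:.2f} ({low:.2f} to {high:.2f})")
```

Summing the per-outcome draws within each iteration is what gives an interval for the net effect rather than three separate intervals.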

We also pooled clinical success rates and withdrawal due to adverse events from head-to-head trials of topical versus oral NSAIDs using a random effects model (DerSimonian-Laird method, using RevMan® statistical software). We performed standard chi-square tests for heterogeneity. Because only four trials were available for pooling, we did not attempt meta-regression analyses to evaluate potential sources of heterogeneity.
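For readers unfamiliar with the DerSimonian-Laird approach, the sketch below shows the standard moment-based calculation in plain Python. It is not RevMan's implementation and was not used for this report. It assumes each trial contributes an effect estimate on an additive scale (for example a log relative risk) with its variance. The function name `dersimonian_laird` and the four input values are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool effect estimates with the DerSimonian-Laird random effects method.

    Returns (pooled_effect, standard_error, tau_squared, q_statistic, df).
    Q is the chi-square heterogeneity statistic with df = k - 1.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching effects and variances for >= 2 trials")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q statistic for heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Moment estimate of between-trial variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random effects weights add tau^2 to each within-trial variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, se, tau2, q, df


# Hypothetical log relative risks and variances for four trials.
log_rr = [0.05, -0.10, 0.20, 0.00]
var = [0.04, 0.03, 0.06, 0.05]
pooled, se, tau2, q, df = dersimonian_laird(log_rr, var)
print(f"pooled RR = {math.exp(pooled):.2f}, tau^2 = {tau2:.3f}, Q = {q:.2f} on {df} df")
```

When Q does not exceed its degrees of freedom, the between-trial variance is set to zero and the result coincides with the fixed-effect estimate.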
