NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Donahue KE, Gartlehner G, Jonas DE, et al. Comparative Effectiveness of Drug Therapy for Rheumatoid Arthritis and Psoriatic Arthritis in Adults [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2007 Nov. (Comparative Effectiveness Reviews, No. 11.)

  • This publication is provided for historical reference only and the information may be out of date.


Comparative Effectiveness of Drug Therapy for Rheumatoid Arthritis and Psoriatic Arthritis in Adults [Internet].


Topic Development

The topic of this report and preliminary key questions arose through a public process involving the public, the Scientific Resource Center (SRC) for the Effective Health Care program of the Agency for Healthcare Research and Quality (AHRQ), and various stakeholder groups. Investigators from the RTI International-University of North Carolina Evidence-based Practice Center (RTI-UNC EPC) then refined the original questions, in consultation with AHRQ and the SRC through multiple conference calls, into the final set of key questions cited in the introduction.

Literature Search

To identify articles relevant to each key question we searched MEDLINE®, Embase, the Cochrane Library, and the International Pharmaceutical Abstracts. The full search strategy is presented in Appendix B. We used either Medical Subject Headings (MeSH or MH) as search terms when available or key words when appropriate. We combined terms for selected indications (rheumatoid arthritis [RA], psoriatic arthritis [PsA]), drug interactions, and adverse events with a list of nine corticosteroids (betamethasone, budesonide, cortisone, dexamethasone, hydrocortisone, methylprednisolone, prednisone, prednisolone, and triamcinolone), four synthetic disease-modifying antirheumatic drugs (DMARDs, including methotrexate [MTX], leflunomide, sulfasalazine, and hydroxychloroquine), and six biologic DMARDs (abatacept, adalimumab, anakinra, etanercept, infliximab, and rituximab). We limited the electronic searches to “human” and “English language.” Sources were searched from 1980 to September 2006 to capture literature relevant to the scope of our topic.

We used the National Library of Medicine (NLM) publication type tags to identify reviews, randomized controlled trials (RCTs), and meta-analyses. We also manually searched reference lists of pertinent review articles and letters to the editor. We imported all citations into an electronic database (EndNote 8.0). Additionally, we handsearched the Center for Drug Evaluation and Research (CDER) database to identify unpublished research submitted to the U.S. Food and Drug Administration (FDA).

The SRC contacted pharmaceutical manufacturers and invited them to submit dossiers, including citations. We received dossiers from five pharmaceutical companies (Abbott, Amgen, Bristol-Myers Squibb, Centocor, and Genentech).

Our searches found 1,957 citations, unduplicated across databases. Additionally, we identified 166 articles from manually reviewing the reference lists of pertinent review articles. Twenty-eight other studies came from pharmaceutical dossiers, and two additional studies came from peer review or public comments. The total number of citations in our database was 2,153.
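The citation counts above can be checked with a short tally. This is only an illustrative sketch: the dictionary keys are descriptive labels chosen here, and the numbers are those reported in the preceding paragraph.

```python
# Citation counts by source, as reported in the paragraph above.
# The key names are illustrative labels, not terms used by the report.
citations_by_source = {
    "electronic database searches (unduplicated)": 1957,
    "reference lists of review articles": 166,
    "pharmaceutical dossiers": 28,
    "peer review or public comments": 2,
}

# Sum the four sources to get the size of the citation database.
total_citations = sum(citations_by_source.values())
print(f"Total citations: {total_citations:,}")
```

The sum matches the reported database total of 2,153.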

Study Selection

We developed eligibility criteria with respect to study design or duration, patient population, interventions, outcomes, and comparisons to medications inside our scope of interest. Table 5 in the introduction describes the criteria in more detail. Because multiple large RCTs had been conducted in this drug class, we adopted a minimum sample size requirement (N ≥ 100) to be able to focus on the best available evidence.

Two persons independently reviewed abstracts. If both reviewers agreed that a study did not meet eligibility criteria, we excluded it. We obtained the full text of all remaining articles and used the same eligibility criteria to determine which, if any, to exclude at this stage. We did not include studies that met eligibility criteria but were reported as an abstract only. These studies are listed in Appendix F.

For this review, results from well-conducted, valid head-to-head trials provide the strongest evidence to compare drugs with respect to efficacy, effectiveness, and harms. We defined head-to-head trials as those comparing one drug of interest with another. RCTs or prospective cohort studies of at least 3 months’ duration and an adult study population with a sample size of at least 100 participants were eligible for inclusion.
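As a rough illustration, the core eligibility thresholds stated above (design, duration, adult population, sample size) could be expressed as a simple filter. This is a minimal sketch under stated assumptions: the `Study` record, its field names, and the design labels are hypothetical, and the full criteria in Table 5 cover more than these four checks.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the design labels are hypothetical.
ELIGIBLE_DESIGNS = {"rct", "prospective_cohort"}
MIN_DURATION_MONTHS = 3
MIN_SAMPLE_SIZE = 100


@dataclass
class Study:
    """Hypothetical minimal record for one candidate study."""
    design: str
    duration_months: float
    sample_size: int
    adult_population: bool


def is_eligible(study: Study) -> bool:
    """Return True if the study passes the four thresholds named in the text."""
    return (
        study.design in ELIGIBLE_DESIGNS
        and study.duration_months >= MIN_DURATION_MONTHS
        and study.sample_size >= MIN_SAMPLE_SIZE
        and study.adult_population
    )
```

For example, `is_eligible(Study("rct", 6, 250, True))` passes, whereas a 2-month trial or a trial with 80 participants does not.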

For harms (i.e., evidence pertaining to safety, tolerability, and adverse events), we examined data from both experimental studies and observational studies (prospective and retrospective). We included RCTs and observational studies with large sample sizes (≥100 patients), lasting at least 3 months, that reported an outcome of interest.

Initially, we reviewed studies with health outcomes as primary outcome measures. Outcomes for efficacy or effectiveness, for example, were clinical response to treatment, remission, functional capacity, and quality of life. In addition, we included radiographic outcomes as intermediate outcome measures. For harms, we looked for both overall and specific outcomes ranging in severity (e.g., serious infections, malignancies, hepatotoxicity, hematological adverse events, infusion and injection reactions, nausea), withdrawals attributable to adverse events, and drug interactions.

We included meta-analyses in our evidence report if we found them to be relevant for a key question and of good or fair methodological quality.22 We did not abstract individual studies if they had been used in an included meta-analysis; studies in this group that met eligibility criteria are cited in Appendix C. However, we reviewed them to determine whether any other outcomes of interest were reported. Appendix D summarizes reasons for exclusion of studies that were reviewed as full text articles but did not meet eligibility criteria.

Data Extraction

We designed and used a structured data abstraction form to ensure consistency of appraisal for each study. Trained reviewers abstracted data from each study and assigned an initial quality rating. A senior reviewer read each abstracted article, evaluated the completeness of the data abstraction, and confirmed the quality rating.

We abstracted the following data from included articles: study design, eligibility criteria, intervention (drugs, dose, and duration), additional medications allowed, methods of outcome assessment, population characteristics (such as age, sex, race or ethnicity, or mean disease duration), sample size, loss to followup, withdrawals because of adverse events, results, and adverse events reported. We recorded intention-to-treat (ITT) results if available. All data abstraction was performed in SRS 3.0 (TrialStat™ Corporation). Evidence tables containing all abstracted data of included studies are presented in Appendix E.

Quality Assessment

To assess the quality (internal validity) of trials, we used predefined criteria based on those developed by the U.S. Preventive Services Task Force (ratings: good, fair, poor)23 and the National Health Service Centre for Reviews and Dissemination.24 Elements of quality assessment included randomization and allocation concealment, similarity of compared groups at baseline, use of ITT analysis (i.e., all patients were analyzed as randomized with missing values imputed), adequacy of blinding, and overall and differential loss to followup.

In general terms, a “good” study has the least bias and results are considered to be valid. A “fair” study is susceptible to some bias, but probably not sufficient to invalidate its results. The fair-quality category is likely to be broad, so studies with this rating will vary in their strengths and weaknesses. A “poor” rating indicates significant bias (stemming from, e.g., serious errors in design, analysis, or reporting; large amounts of missing information; or discrepancies in reporting) that may invalidate the study’s results.

To assess the quality of observational studies, we used criteria outlined by Deeks et al.25 Items assessed included selection of cases or cohorts and controls, adjustment for confounders, methods of outcomes assessment, length of followup, and statistical analysis.

Two independent reviewers assigned quality ratings. They resolved any disagreements by discussion and consensus or by consulting a third, independent party. Appendix G details the predefined criteria used for evaluating the quality of all included studies.

Studies that met all criteria were rated good quality. The majority of studies received a quality rating of fair. This category includes studies that presumably fulfilled all quality criteria but did not report their methods to an extent that answered all our questions. Time constraints precluded our contacting study authors for clarification of methodological questions. Thus, the fair-quality category includes studies with quite different strengths and weaknesses. Studies that had a fatal flaw (defined as a methodological shortcoming that leads to a very high probability of bias) in one or more categories were rated poor quality and, generally, excluded from our analyses. If no other evidence on an outcome of interest was available, we comment on findings from poor studies. Poor-quality studies and reasons for that rating are presented in Appendix H.
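The rating logic described in this section can be summarized in a few lines. This is a simplified sketch, not the reviewers' actual procedure, which relied on judgment and consensus: the function name, the criterion labels, and the single `has_fatal_flaw` flag are assumptions made for illustration. A criterion marked `False` here may mean either "not met" or "not reported clearly enough to confirm".

```python
def rate_quality(criteria_met: dict[str, bool], has_fatal_flaw: bool) -> str:
    """Assign a good/fair/poor rating following the rules sketched in the text.

    criteria_met maps a (hypothetical) criterion label to whether the study
    was confirmed to meet it.
    """
    if not criteria_met:
        raise ValueError("at least one quality criterion is required")
    if has_fatal_flaw:
        # A fatal flaw in any category leads to a poor rating.
        return "poor"
    if all(criteria_met.values()):
        # Studies meeting all criteria are rated good.
        return "good"
    # Everything else falls into the broad fair category.
    return "fair"
```

Under this sketch, a study with adequate randomization but unreported blinding would be rated "fair", which reflects why the fair category mixes studies with quite different strengths and weaknesses.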

Applicability Assessment

Throughout this report, we highlight effectiveness studies conducted in primary care or office-based settings that use less stringent eligibility criteria, assess health outcomes, and have longer follow-up periods than most efficacy studies.26 We deemed studies that met at least six of seven predefined criteria to be effectiveness studies (Table 6). The results of effectiveness studies are more applicable to the spectrum of patients that will use a drug, have a test, or undergo a procedure than results from highly selected populations in efficacy studies.

Table 6. Criteria for effectiveness studies
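The "at least six of seven" rule for classifying a study as an effectiveness study can be written as a simple count. This is a minimal sketch: the function name and the representation of the criteria as a list of booleans are assumptions, and the seven criteria themselves are those defined in Table 6.

```python
TOTAL_CRITERIA = 7     # number of predefined criteria (Table 6)
MIN_CRITERIA_MET = 6   # threshold stated in the text


def is_effectiveness_study(criteria_met: list[bool]) -> bool:
    """Return True if at least six of the seven criteria are met."""
    if len(criteria_met) != TOTAL_CRITERIA:
        raise ValueError(f"expected {TOTAL_CRITERIA} criteria, got {len(criteria_met)}")
    # Booleans sum as 0/1, so this counts the criteria that were met.
    return sum(criteria_met) >= MIN_CRITERIA_MET
```

A study meeting six criteria and missing one would be classified as an effectiveness study; a study meeting only five would not.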

Rating Strength of a Body of Evidence

We rated the strength of the available evidence in a three-part hierarchy based on an approach devised by the GRADE working group.27 Developed to grade the quality of evidence and the strength of recommendations, this approach incorporates four key elements: study design, study quality, consistency, and directness. It also considers the presence of imprecise or sparse data, high probability of publication bias, evidence of a dose gradient, and magnitude of the effect.

As shown in Table 7, we used three grades: high, moderate, and low (combining the GRADE category of very low with low).28 Grades reflect the strength of the body of evidence to answer key questions on the comparative efficacy, effectiveness, and harms of drugs to treat RA and PsA. The critical element is the extent to which new evidence might alter the confidence we would have in our findings. Grades do not refer to the general efficacy or effectiveness of pharmaceuticals.

Table 7. Definitions of the grades of overall strength of evidence

This approach does not incorporate other factors, such as funding sources and comparable dosing, that might be relevant to reliably assessing comparative efficacy, effectiveness, and harms. We have assessed these additional factors and highlighted issues that could potentially bias our assessments (e.g., all studies funded by the same manufacturer).
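The collapse of the four GRADE categories into the three grades used in this report can be illustrated as a lookup. This is only a sketch: the mapping and function names are hypothetical, and it covers just the relabeling step, not the judgments about design, quality, consistency, and directness that determine a grade in the first place.

```python
# GRADE category -> grade used in this report ("very low" is merged into "low").
GRADE_TO_REPORT_GRADE = {
    "high": "high",
    "moderate": "moderate",
    "low": "low",
    "very low": "low",
}


def report_grade(grade_category: str) -> str:
    """Map a GRADE category name to the three-level scale used in the report."""
    key = grade_category.strip().lower()
    if key not in GRADE_TO_REPORT_GRADE:
        raise ValueError(f"unknown GRADE category: {grade_category!r}")
    return GRADE_TO_REPORT_GRADE[key]
```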

Data Synthesis

Throughout this report we synthesized the literature qualitatively. Comparisons of the drugs that had not yet been quantitatively analyzed in any of the meta-analyses or indirect comparisons that we included either were limited to fewer than three good or fair RCTs or had noncomparable study populations. Therefore, we did not attempt any quantitative analyses of such comparisons.

As is customary for all comparative effectiveness reviews done for AHRQ, the SRC requested review of this report from three outside rheumatology experts. Peer reviewers were charged with commenting on the content, structure, and format of the evidence report, providing additional relevant citations, and pointing out issues related to how we had conceptualized and defined the topic and key questions. Our peer reviewers (listed in Appendix A) gave us permission to acknowledge their review of the draft. We compiled all comments and addressed each one individually, revising the text as appropriate. AHRQ and the SRC also requested review from their own staff. In addition, the SRC placed the draft report on the AHRQ website and compiled the comments for our review. Twenty-four public reviewers submitted comments. They represented advocacy groups, the pharmaceutical industry, and practicing physicians. Based on these comments, we revised the text where appropriate.

