NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Smith B, Peterson K, Fu R, et al. Drug Class Review: Drugs for Fibromyalgia: Final Original Report [Internet]. Portland (OR): Oregon Health & Science University; 2011 Apr.



Inclusion Criteria


Included were adult outpatient populations with fibromyalgia or fibromyalgia syndrome as diagnosed by the 1990 or 2010 American College of Rheumatology diagnostic criteria for fibromyalgia.2, 30 Studies of patients with fibromyalgia, fibromyalgia syndrome, or fibrositis based on diagnostic criteria other than those established by the American College of Rheumatology (1990 or 2010 versions) were also included, with planned sensitivity analyses to investigate whether variation in diagnostic criteria contributed to differences in outcomes.


Table 1 below lists the interventions that are included in this report. Black box warnings for the included interventions are listed in Appendix C.

Table 1. Included interventions.


  • Direct comparisons of included drugs in head-to-head trials were preferred
  • For indirect comparisons, only placebo-controlled trials were considered.

Effectiveness/Efficacy Outcomes

  • Pain – primary outcome, including tender points, as based on all types of assessments and at all time points
  • Functional capacity (e.g., work productivity, days missed from work)
  • Health-related quality of life
  • Response (e.g., proportion achieving response, speed of response, duration of response)
  • Fatigue, depressiveness, sleep, global status.

Harms


  • Overall adverse events
  • Withdrawals due to adverse events
  • Specific adverse events (e.g., hepatic, renal, hematologic, dermatologic, sedation/drowsiness, and other neurologic side effects).

Study Designs

  1. For effectiveness, controlled clinical trials and good-quality systematic reviews
  2. For harms, in addition to controlled clinical trials, observational studies were included
    1. Observational studies were defined as comparative cohort and case-control studies with a well-defined fibromyalgia population
    2. Noncomparative observational studies were included only if the duration of follow-up was 1 year or longer, and if serious harms were reported. A serious harm is one that results in long-term health effects or mortality.

Literature Search

We searched Ovid MEDLINE® (1947 to September Week 3 2010), the Cochrane Database of Systematic Reviews® (2005 to September 2010), the Cochrane Central Register of Controlled Trials® (3rd Quarter 2010), and the Database of Abstracts of Reviews of Effects (3rd Quarter 2010) using included drugs, indications, and study designs as search terms (see Appendix D for complete search strategies). We attempted to identify additional studies through hand searches of reference lists of included studies and reviews. In addition, we searched the US Food and Drug Administration Center for Drug Evaluation and Research website for medical and statistical reviews of individual drug products. Finally, we requested dossiers of published and unpublished information from the relevant pharmaceutical companies for this review. All received dossiers were screened for studies or data not found through other searches. All citations were imported into an electronic database (EndNote® X2, Thomson Reuters).

Study Selection

Selection of included studies was based on the inclusion criteria created by the Drug Effectiveness Review Project participants, as described above. Titles and abstracts were first assessed by one reviewer for inclusion using the criteria described above and then checked by a second reviewer. Full-text articles of potentially relevant citations were retrieved and again were assessed for inclusion by one reviewer and checked by a second reviewer. Disagreements were resolved by consensus. Results published only in abstract form were not included because inadequate details were available for quality assessment.

Data Abstraction

The following data were abstracted from included trials: eligibility criteria; interventions (dose and duration); population characteristics, including sex, age, ethnicity, and diagnosis; numbers randomized, withdrawn, lost to follow-up and analyzed; and results for each included outcome. We recorded intention-to-treat results when reported. If true intention-to-treat results were not reported, but loss to follow-up was very small, we considered these results to be intention-to-treat results. In cases where only per protocol results were reported, we calculated intention-to-treat results if the data for these calculations were available. Data abstraction was performed by one reviewer and was independently checked by a second reviewer. Differences were resolved by consensus.

Validity Assessment

We assessed the internal validity (quality) of trials based on predefined criteria. These criteria are based on the US Preventive Services Task Force and the National Health Service Centre for Reviews and Dissemination (United Kingdom) criteria.31, 32 We rated the internal validity of each trial based on the methods used for randomization, allocation concealment, and blinding; the similarity of compared groups at baseline; maintenance of comparable groups; adequate reporting of dropouts, attrition, crossover, adherence, and contamination; loss to follow-up; and the use of intention-to-treat analysis. Trials that had a fatal flaw were rated poor quality; trials that met all criteria were rated good quality; the remainder were rated fair quality. As the fair-quality category is broad, studies with this rating vary in their strengths and weaknesses: the results of some fair-quality studies are likely to be valid, while others are only possibly valid. A poor-quality trial is not valid; the results are at least as likely to reflect flaws in the study design as a true difference between the compared drugs. A fatal flaw is reflected by failure to meet combinations of items of the quality assessment checklist. A particular randomized trial might receive 2 different ratings, one for effectiveness and another for adverse events.

The criteria used to rate observational studies of adverse events reflect aspects of the study design that are particularly important for assessing adverse event rates. We rated observational studies as good quality for adverse event assessment if they adequately met 6 or more of the 7 predefined criteria, fair quality if they met 3 to 5 criteria, and poor quality if they met 2 or fewer criteria.
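
The thresholds in the preceding paragraph can be written as a small lookup. The following Python sketch is purely illustrative; the function name and argument are hypothetical and are not part of the report, which does not describe any software used for quality rating.

```python
def rate_observational_study(criteria_met: int, total_criteria: int = 7) -> str:
    """Map the number of adequately met criteria to a quality rating.

    Thresholds follow the text: 6 or more of 7 is good, 3 to 5 is fair,
    2 or fewer is poor. The function name is a hypothetical illustration.
    """
    if not 0 <= criteria_met <= total_criteria:
        raise ValueError("criteria_met must be between 0 and total_criteria")
    if criteria_met >= 6:
        return "good"
    if criteria_met >= 3:
        return "fair"
    return "poor"


if __name__ == "__main__":
    for met in range(8):
        print(met, rate_observational_study(met))
```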

Included systematic reviews were also rated for quality. We rated the internal validity based on a clear statement of the question(s); reporting of inclusion criteria; methods used for identifying literature (the search strategy), validity assessment, and synthesis of evidence; and details provided about included studies. Again, these studies were categorized as good when all criteria were met.

Quality assessment was performed by one reviewer and independently checked by a second reviewer; differences were resolved by consensus.

Grading the Strength of Evidence

We graded strength of evidence based on the guidance established for the Evidence-based Practice Center Program of the Agency for Healthcare Research and Quality.33 Developed to grade the overall strength of a body of evidence, this approach incorporates 4 key domains: risk of bias (includes study design and aggregate quality), consistency, directness, and precision of the evidence. It also considers other optional domains that may be relevant for some scenarios, such as a dose-response association, plausible confounding that would decrease the observed effect, strength of association (magnitude of effect), and publication bias.

Table 2 describes the grades of evidence that can be assigned. Grades reflect the strength of the body of evidence to answer key questions on the comparative effectiveness, efficacy, and harms of drugs for fibromyalgia. Grades do not refer to the general efficacy or effectiveness of pharmaceuticals. Grading the strength of the evidence was first performed by one reviewer and independently checked by a second reviewer; differences were resolved by consensus.

Table 2. Definitions of the grades of overall strength of evidence.

Among the multitude of outcomes assessed in trials of drugs for fibromyalgia, we focused on rating the strength of evidence for only a subset of 6 that we judged to represent the most clinically important and reliable: pain, fatigue, proportion of patients with a 50% or greater improvement in symptoms, mean change in Fibromyalgia Impact Questionnaire Total Score, overall adverse events, and withdrawals due to adverse events.

Data Synthesis

We constructed evidence tables showing the study characteristics, quality ratings, and results for all included studies. We reviewed studies using a hierarchy of evidence approach, where the best evidence is the focus of our synthesis for each question, population, intervention, and outcome addressed. Studies that evaluated one drug for fibromyalgia against another provided direct evidence of comparative effectiveness and adverse event rates. Where possible, these data were the primary focus. Direct comparisons were preferred over indirect comparisons; similarly, effectiveness and long-term safety outcomes were preferred to efficacy and short-term tolerability outcomes.

In theory, trials that compare an included drug for fibromyalgia with any other nonincluded treatment or with placebos can also provide evidence about effectiveness. This is known as an indirect comparison and can be difficult to interpret for a number of reasons, primarily heterogeneity of trial populations, interventions, and outcomes assessment. Data from indirect comparisons are used to support direct comparisons, where they exist, and are used as the primary comparison where no direct comparisons exist. Indirect comparisons should be interpreted with caution.

Meta-analyses were conducted to summarize data and obtain more precise estimates on outcomes for which studies were homogeneous enough to provide a meaningful combined estimate. In order to determine whether meta-analysis could be meaningfully performed, we considered the quality of the studies and the heterogeneity among studies in design, patient population, interventions, and outcomes. When meta-analysis could not be performed, the data were summarized qualitatively.

For continuous outcomes, we used the mean difference between treatment and placebo groups as the effect measure, which we estimated based on mean change scores and standard errors from baseline to follow-up for each group from each study. Hedges' g, one of the measures for standardized mean differences, was used if different instruments (scales) were used by different studies for the same outcome. For dichotomous outcomes, relative risk was used as the effect measure. All combined effects were estimated using random-effects models.35 The Q statistic and the I² statistic (the proportion of variation in study estimates due to heterogeneity) were calculated to assess heterogeneity in effects between studies.36, 37 Due to the small number of studies, it was not feasible to use subgroup analysis and meta-regression to explore heterogeneity. We conducted sensitivity analyses to check the impact of dosage, length of follow-up, and definitions of outcome on the results.
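
As a rough illustration of the pooling and heterogeneity statistics described above, the Python sketch below computes a random-effects estimate together with Q and I². It assumes the DerSimonian-Laird estimator of between-study variance, which is a common choice but is not named in the text; the report's own analyses were run in Stata, and the function name and inputs here are hypothetical.

```python
import math


def random_effects_pool(effects, std_errors):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    effects: per-study effect estimates (e.g., mean differences, or log
             relative risks for dichotomous outcomes).
    std_errors: matching standard errors.
    The choice of DerSimonian-Laird is an assumption of this sketch.
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # Between-study variance (tau^2), truncated at zero
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: percentage of variation attributable to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se_pooled = math.sqrt(1.0 / sum_re)

    return {"pooled": pooled, "se": se_pooled, "q": q, "tau2": tau2, "i2": i2}


if __name__ == "__main__":
    # Made-up inputs for illustration only; not data from the report
    print(random_effects_pool([0.0, 2.0], [1.0, 1.0]))
```

For relative risks, pooling would normally be done on the log scale and the result exponentiated afterward.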

Because head-to-head evidence was sparse, we used the method described by Bucher et al. to perform indirect comparison meta-analysis to evaluate the difference between drugs based on data from placebo-controlled trials, as the trials were generally comparable in patient population and clinical and methodological characteristics. The magnitude of difference was characterized using the relative risk ratio for relative risks and the difference of mean differences for mean differences. Negative (−) differences of mean differences were interpreted as suggesting that drug A is associated with a greater reduction in fibromyalgia symptoms than drug B. Relative risk ratios greater than 1.0 were interpreted as suggesting that drug A is associated with a higher relative benefit compared to drug B for efficacy outcomes and higher relative risk for adverse events. All analyses were performed using Stata 11.0 (StataCorp, College Station, TX, 2009).
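
The adjusted indirect comparison described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each drug's pooled effect versus placebo and its standard error are already available and that the two estimates are independent; the function names are hypothetical, the 95% interval uses a normal approximation, and the report's actual analyses were run in Stata.

```python
import math

Z_95 = 1.96  # normal-approximation multiplier for a 95% confidence interval


def indirect_mean_difference(md_a, se_a, md_b, se_b):
    """Difference of mean differences (drug A vs. placebo minus drug B vs. placebo).

    The variances of the two placebo-controlled estimates are summed,
    which assumes the estimates are independent.
    """
    diff = md_a - md_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se, (diff - Z_95 * se, diff + Z_95 * se)


def indirect_relative_risk_ratio(rr_a, se_log_rr_a, rr_b, se_log_rr_b):
    """Relative risk ratio (RR of A vs. placebo divided by RR of B vs. placebo).

    The comparison is made on the log scale and then exponentiated.
    """
    log_rrr = math.log(rr_a) - math.log(rr_b)
    se = math.sqrt(se_log_rr_a ** 2 + se_log_rr_b ** 2)
    ci = (math.exp(log_rrr - Z_95 * se), math.exp(log_rrr + Z_95 * se))
    return math.exp(log_rrr), se, ci


if __name__ == "__main__":
    # Made-up inputs for illustration only; not data from the report
    print(indirect_mean_difference(-1.0, 0.3, -0.4, 0.4))
    print(indirect_relative_risk_ratio(2.0, 0.3, 1.0, 0.4))
```

Because the two variances add, the indirect estimate is always less precise than either placebo-controlled estimate, which is one reason such comparisons call for cautious interpretation.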

Peer Review

We requested and received peer review of the report from 4 content experts. Their comments were reviewed and, where possible, incorporated into the final document. All comments and the authors’ proposed actions were reviewed by representatives of the participating organizations of the Drug Effectiveness Review Project before finalization of the report. Names of peer reviewers for the Drug Effectiveness Review Project are listed at

Public Comment

This report was posted to the Drug Effectiveness Review Project website for public comment. We received comments from 3 persons, representing 2 pharmaceutical companies.

Copyright © 2011, Oregon Health & Science University.
Bookshelf ID: NBK55553

