The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-Based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.
To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.
AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.
We welcome comments on this evidence report. They may be sent by mail to the Task Order Officer named below at: Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850, or by e-mail to epc@ahrq.gov.
Carolyn M. Clancy, M.D.
Director
Agency for Healthcare Research and Quality
Beth A. Collins Sharp, Ph.D., R.N.
Director, EPC Program
Agency for Healthcare Research and Quality
Jean Slutsky, P.A., M.S.P.H.
Director, Center for Outcomes and Evidence
Agency for Healthcare Research and Quality
Capt. Ernestine Murray, R.N., M.A.S.
EPC Program Task Order Officer
Agency for Healthcare Research and Quality
The research team would like to acknowledge the efforts of Maxine A. Gere, M.S., for general editorial assistance and program support; Carol Gold-Boyd for administrative support; and Tina Murray, R.N., M.A.S., Senior Health Policy Analyst, Agency for Healthcare Research and Quality Center for Outcomes and Evidence, for advice as our Task Order Officer.
Objectives: Systematic review of outcomes of three treatments for osteoarthritis (OA) of the knee: intra-articular viscosupplementation; oral glucosamine, chondroitin or the combination; and arthroscopic lavage or debridement.
Data Sources: We abstracted data from 42 randomized, controlled trials (RCTs) of viscosupplementation, all but one synthesized among six meta-analyses; 21 RCTs of glucosamine/chondroitin, 16 synthesized among six meta-analyses; and 23 articles on arthroscopy. The search included foreign-language studies and relevant conference proceedings.
Review Methods: The review methods were defined prospectively in a written protocol. We sought systematic reviews, meta-analyses, and RCTs published in full or in abstract. Where randomized trials were few, we sought other study designs. We independently assessed the quality of all primary studies.
Results: Viscosupplementation trials generally report positive effects on pain and function scores compared to placebo, but the evidence on clinical benefit is uncertain, due to variable trial quality, potential publication bias, and unclear clinical significance of the changes reported.
The Glucosamine/Chondroitin Arthritis Intervention Trial (GAIT), a large (n=1,583), high-quality, National Institutes of Health-funded, multicenter RCT, showed no significant difference compared to placebo. Glucosamine sulfate has been reported to be more effective than glucosamine hydrochloride, which was used in GAIT, but the evidence is not sufficient to draw conclusions. Clinical studies of the effect of glucosamine on glucose metabolism are short term, or if longer (e.g., 3 years), excluded patients with metabolic disorders.
The best available evidence for arthroscopy, a single sham-controlled RCT (n=180), showed that arthroscopic lavage with or without debridement was equivalent to placebo. The main limitations of this trial are the use of a single surgeon and enrollment of patients at a single Veterans Affairs Medical Center.
No studies reported separately on patients with secondary OA of the knee. The only comparative study was an underpowered, poor-quality trial comparing viscosupplementation to arthroscopy with debridement.
Conclusions: Osteoarthritis of the knee is a common condition. The three interventions reviewed in this report are widely used in the treatment of OA of the knee, yet the best available evidence does not clearly demonstrate clinical benefit. Uncertainty regarding clinical benefit can be resolved only by rigorous, multicenter RCTs. In addition, given the public health impact of OA of the knee, research on new approaches to prevention and treatment should be given high priority.
Osteoarthritis (OA) affects about 21 million people in the United States. By age 65, the majority of the population has radiographic evidence of osteoarthritis and 11 percent have symptomatic OA of the knee. This is a systematic review of three treatments for OA of the knee: intra-articular injections of viscosupplements; oral glucosamine, chondroitin, or the combination; and arthroscopic lavage and debridement. The key questions are: (1) effectiveness and harms in primary OA of the knee, (2) in secondary OA of the knee, (3) in subpopulations, and (4) comparison of the three interventions.
The review methods were defined prospectively in a written protocol. A technical expert panel provided consultation. The draft report was also reviewed by other experts and stakeholders.
We sought systematic reviews, meta-analyses, and RCTs published in full or in abstract that reported on one or more of the interventions among patients with primary or secondary osteoarthritis of the knee and that reported at least one outcome of interest. Primary outcomes were pain, function, quality of life, and adverse effects.
Our search had no language restrictions and used these electronic databases:
MEDLINE® (through March 29, 2007)
EMBASE (through March 16, 2006)
Cochrane Controlled Trials Register (through November 27, 2006).
EMBASE was updated with abbreviated searches through November 27, 2006. Additional sources were 2004-2006 conference proceedings of the American Association of Orthopedic Surgeons (AAOS), American College of Rheumatology (ACR) and the Osteoarthritis Research Society International (OARSI). Product inserts of U.S.-marketed viscosupplements were consulted.
There were few RCTs on arthroscopy or comparative outcomes, so we also sought nonrandomized comparative trials and, for arthroscopy, administrative database analyses and case series (n>50). Because several comprehensive systematic reviews with meta-analyses on viscosupplementation and glucosamine/chondroitin had been published, we focused on detailed review of existing meta-analyses, supplemented by primary studies where necessary.
Of 1,842 citations, 451 articles were retrieved and 98 selected for inclusion:
Six meta-analyses (N=41 trials) and one additional trial of viscosupplementation
Six meta-analyses (N=16 trials) and five additional trials of glucosamine/chondroitin
23 articles on arthroscopy.
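The tally above can be checked with a short calculation. The sketch below shows one reading of how the 98 included articles are reached; it assumes that each meta-analysis and each trial counts as one article, which the text does not state explicitly. All names in the code are illustrative.

```python
# Counts copied from the list above; the dictionary and function names are
# illustrative and do not come from the report.
META_ANALYSES = {"viscosupplementation": 6, "glucosamine/chondroitin": 6}
TRIALS_IN_META_ANALYSES = {"viscosupplementation": 41, "glucosamine/chondroitin": 16}
ADDITIONAL_TRIALS = {"viscosupplementation": 1, "glucosamine/chondroitin": 5}
ARTHROSCOPY_ARTICLES = 23


def total_trials(intervention):
    """Trials pooled in meta-analyses plus trials reviewed individually."""
    return TRIALS_IN_META_ANALYSES[intervention] + ADDITIONAL_TRIALS[intervention]


def total_included():
    """Assumes each meta-analysis and each trial counts as one included article."""
    trials = sum(total_trials(name) for name in META_ANALYSES)
    return sum(META_ANALYSES.values()) + trials + ARTHROSCOPY_ARTICLES


print(total_trials("viscosupplementation"))     # 42
print(total_trials("glucosamine/chondroitin"))  # 21
print(total_included())                         # 98
```

Under that assumption, the per-intervention totals (42 and 21 trials) also agree with the trial counts reported elsewhere in this summary.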
A single reviewer screened citations for article retrieval; citations judged as uncertain were reviewed by a second reviewer. The same procedure was used to select articles for inclusion in the review. A single reviewer performed data abstraction and a second reviewed the evidence tables for accuracy. However, study quality was appraised by dual independent review. All disagreements were resolved by consensus.
The quality of RCTs and quasi-experimental studies was assessed using the general approach developed by the U.S. Preventive Services Task Force (Harris, Helfand, Woolf, et al. 2001). Assessment of the quality of systematic reviews and meta-analyses was guided by a quality rating method reported by Oxman and Guyatt (1991). The framework proposed by Carey and Boden (2003) was used to assess the quality of case series.
Effectiveness and Harms in Primary OA of the Knee. Results from 42 trials (N=5,843), all but one synthesized in various combinations in six meta-analyses, generally show positive effects of viscosupplementation on pain and function scores compared to placebo. However, the evidence on viscosupplementation is accompanied by considerable uncertainty due to variable trial quality, potential publication bias, and unclear clinical significance of the changes reported.
The pooled effects from poor quality trials were as much as twice those obtained from higher quality ones. Pooled results from small trials (≤100 patients) showed effects up to twice those of larger trials, a finding consistent with selective publication of underpowered positive trials. Among trials of viscosupplementation, those that have not been published in full text comprise approximately 25 percent of the total patient population.
Most RCTs reported results as mean changes in pain and function. Interpreting the clinical significance of pooled mean effects from the meta-analyses is difficult; mean changes do not quantify proportions responding. Numbers needed to treat cannot be calculated from mean changes. It would be more informative to report response rate, i.e., comparison of the proportion of patients achieving a clinically important improvement.
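The distinction between mean change and response rate can be illustrated with a minimal sketch. All numbers below are hypothetical and are not taken from any trial in this review; the function names and the improvement threshold are likewise assumptions made for illustration.

```python
from statistics import mean


def response_rate(changes, threshold):
    """Proportion of patients whose improvement meets or exceeds the threshold."""
    return sum(change >= threshold for change in changes) / len(changes)


def number_needed_to_treat(rate_treated, rate_control):
    """NNT is the reciprocal of the absolute difference in response rates."""
    absolute_benefit = rate_treated - rate_control
    if absolute_benefit <= 0:
        raise ValueError("no absolute benefit; NNT is undefined")
    return 1.0 / absolute_benefit


# Hypothetical improvement scores: both groups have the same mean change.
uniform_group = [10, 10, 10, 10]
mixed_group = [20, 20, 0, 0]
THRESHOLD = 15  # hypothetical clinically important improvement

print(mean(uniform_group), mean(mixed_group))   # 10 10
print(response_rate(uniform_group, THRESHOLD))  # 0.0
print(response_rate(mixed_group, THRESHOLD))    # 0.5

# Hypothetical response rates: 60 percent on treatment, 50 percent on placebo.
print(round(number_needed_to_treat(0.60, 0.50), 1))  # 10.0
```

Two groups with identical mean changes can have very different proportions of responders, which is why the number needed to treat requires response rates rather than means.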
Trials of hylan G-F 20, the highest molecular weight cross-linked product, generally reported larger effects than other trials.
Minor adverse events accompanying intra-articular injections are common, but the relative risk accompanying hyaluronan injections over placebo appears to be small. Pseudoseptic reactions associated with hyaluronans appear relatively uncommon, but can be severe.
Differences in Outcomes Among Subpopulations. Four RCTs examined one or more of the specified subgroups. None examined race/ethnicity, disease duration, or prior treatment. In one trial, randomization was stratified by disease severity; all other subgroup results were obtained in post-hoc analyses. There was no evidence for differential effects according to subgroups defined by age, sex, primary/secondary disease, body mass index/weight, or disease severity. One positive post-hoc subgroup analysis found greater efficacy among older individuals with more severe disease, but this finding was not confirmed in a subsequent trial.
Effectiveness and Harms in Primary OA of the Knee. The best evidence comes from the Glucosamine/Chondroitin Arthritis Intervention Trial (GAIT; Clegg, Reda, Harris, et al., 2006), a large (n=1,583), good quality, NIH-funded, multicenter RCT. GAIT compared glucosamine hydrochloride, chondroitin sulfate, or the combination of these agents, with placebo or celecoxib in patients with primary osteoarthritis of the knee. After 24 weeks of treatment, intention-to-treat analysis showed no significant difference in symptomatic relief between glucosamine hydrochloride, chondroitin sulfate, or glucosamine hydrochloride plus chondroitin sulfate compared to placebo. Substantiating this result was that celecoxib, the active control, was effective.
Six study-level meta-analyses (MAs) assessed glucosamine or chondroitin in OA of the knee. All but one of the MAs reported statistically significant differences between treatment and placebo. However, these MAs had limitations in the quality of the primary studies that were pooled. Limitations of the primary literature included small study size, inclusion of studies that assessed joints other than knee, and failure to report intent to treat analysis. In general, the MAs did not perform adequate quality appraisal of the primary studies.
Glucosamine sulfate has been reported to be more effective than glucosamine hydrochloride; however, the evidence is not sufficient to draw conclusions. A subgroup analysis in the largest MA (Towheed, Maxwell, Anastassiades, et al., 2006) significantly favored glucosamine sulfate. GUIDE (Herrero-Beaumont, Roman, Trabado, et al., 2007), a European placebo-controlled RCT (n=318) sponsored by Rotta, a glucosamine sulfate manufacturer, reported favorable results for glucosamine sulfate. While the overall results of GAIT show no benefit, in the subgroup of knee OA patients with moderate-to-severe pain at baseline, the combination of glucosamine hydrochloride and chondroitin sulfate significantly improved pain. Together, this evidence suggests an independent trial of glucosamine sulfate would be useful to definitively establish whether there is benefit.
In general, adverse events with glucosamine or chondroitin treatment were no greater than placebo. There has been some concern from in vitro and preclinical studies that glucosamine supplementation could have a deleterious effect on glucose metabolism and glycemic control. However, available clinical studies are short-term, or if longer (e.g., 3 years), excluded patients with metabolic disorders.
Differences in Outcomes Among Subpopulations. GAIT found that glucosamine plus chondroitin produced a statistically and clinically significant improvement of pain in patients with moderate-to-severe OA of the knee. Although the effect of celecoxib treatment in a similar group of patients was not statistically significant, the magnitude and direction of the response were consistent with clinical benefit. The nonsignificant statistical result in the celecoxib arm may be a function of insufficient power due to the small number of patients. Although this subgroup analysis was not explicitly prespecified in the GAIT protocol, the stratified randomization by disease severity yields statistically valid comparisons. A trial of glucosamine sulfate would be useful to definitively establish whether there is benefit.
Effectiveness and Harms in Patients With Primary OA. The best available evidence, a single placebo-controlled RCT, found arthroscopic lavage with or without debridement was not superior to placebo. The evidence base does not definitively show that arthroscopy is no more effective than placebo. However, additional high-quality RCTs would be necessary to refute the existing trial, which suggests equivalence between placebo and arthroscopy.
No other study besides Moseley, O'Malley, Petersen, et al. (2002) addressed the potential contribution of placebo effects to apparent improvement in outcome after arthroscopy. The primary limitations of the Moseley, O'Malley, Petersen, et al. (2002) trial are lack of details describing the patient sample, the use of a single surgeon, and enrollment of patients at a single Veterans Affairs Medical Center. These concerns call into question the generalizability of this trial's findings. Since OA of the knee affects a large population, uncertainty about arthroscopy's effectiveness should be resolved with further well-conducted and well-reported RCTs.
Major methodologic shortcomings in non-placebo RCTs, an administrative database analysis and case series preclude resolution of uncertainties raised by the trial of Moseley, O'Malley, Petersen, et al. (2002).
Evidence on the harms after arthroscopic lavage and debridement comes primarily from an administrative database analysis and case series reports. Potential harms include infection, prolonged drainage from arthroscopic portals, effusion, hemarthrosis and deep vein thrombosis. To determine whether the risk of such harms is acceptable, it is important to establish whether the effectiveness of arthroscopic lavage and debridement surpasses placebo.
Differences in Outcome Among Subpopulations. Subgroup analyses for mechanical symptoms, alignment and OA stage were performed in the Moseley placebo-controlled RCT. No differences in results were observed within subgroups. Subgroup analyses were also performed in a quasi-experimental study, an administrative database and several case series. In these studies, different outcomes were observed according to age, presence of mechanical symptoms, and severity of OA. However, since these studies did not include placebo controls, it cannot be concluded that arthroscopy has greater effectiveness in specific patient subgroups.
Effectiveness and Harms in Secondary OA of the Knee. We identified no studies that enrolled patients with only secondary OA of the knee, or that reported separately on secondary OA of the knee.
Comparison of Interventions. We did not find any direct comparative studies in which glucosamine, chondroitin, or glucosamine plus chondroitin were compared with arthroscopy or viscosupplementation to treat OA of the knee. A single, small, underpowered, poor quality trial found no difference in outcome measures comparing intra-articular hyaluronan to arthroscopy and debridement over a 1-year followup.
OA of the knee is a common condition and the three interventions reviewed in this report are widely used in the treatment of OA of the knee. Yet the best available evidence reports that glucosamine/chondroitin and arthroscopic surgery are no more effective than placebo. The Glucosamine/Chondroitin Arthritis Intervention Trial (GAIT) (n=1,583) found that neither glucosamine hydrochloride, chondroitin sulfate nor the combination was superior to placebo and that all were inferior to celecoxib. The double blind randomized controlled trial by Moseley, O'Malley, Petersen, et al. (2002, n=180) found that arthroscopic lavage with or without debridement was not superior to sham arthroscopy. Results from 42 RCTs, all but one of which were synthesized in various combinations in six meta-analyses, generally show positive effects of viscosupplementation on pain and function scores compared to placebo. However, the evidence on viscosupplementation is accompanied by considerable uncertainty due to variable trial quality, potential publication bias, and unclear clinical significance of the changes reported.
For viscosupplementation, higher-quality trials are in the minority and show smaller effects; there are numerous patients lost to follow-up, and a substantial portion of studies (25 percent of total patients) have not been published as full articles. The clinical significance of reported changes in pain and function scores is uncertain, as almost all studies compare only mean difference between arms. Although the overall pooled estimate suggests that hylan G-F 20 may have a larger effect than other hyaluronans, whether this represents a meaningful clinical effect or limitations in the quality and completeness of study reporting is unknown. A rigorous RCT that showed strong evidence of improvement in pain and function would be necessary to conclude that viscosupplementation is beneficial.
While the overall results of GAIT show no benefit, a subgroup analysis found that the combination of glucosamine hydrochloride and chondroitin sulfate significantly improved pain in patients with moderate-to-severe OA of the knee. Although this subgroup analysis was not explicitly prespecified in the GAIT protocol, the stratified randomization by disease severity yields statistically valid comparisons. The nonsignificant statistical result in the celecoxib arm in the same patient subgroup may be a function of insufficient power. Given the small number of patients in the moderate-to-severe subgroup, and the large number of such patients in the general population, a further trial can be justified. However, these subgroup results do not override the overall results of GAIT, which must stand unless confirmed in a rigorous RCT.
The existing evidence does not definitively show that arthroscopic lavage with or without debridement is no more effective than placebo. However, additional placebo-controlled RCTs showing clinically significant advantage for arthroscopy would be necessary to refute the Moseley results, which show equivalence between placebo and arthroscopy. The recently published Spine Patient Outcomes Research Trial (SPORT) offers an alternative study design that could be informative: a rigorous RCT comparing surgery to conservative management, rather than sham (Weinstein, Tosteson, Lurie, et al., 2006).
Overall, our recommendations for future research reach beyond the specific treatments addressed in this report, and are intended broadly to improve the quality of research and reporting on interventions for osteoarthritis of the knee. However, our population is aging, there is increasing prevalence of obesity, and increasing burden of knee osteoarthritis, together with inconsistent evidence regarding disease treatments. Given the public health impact, research on new approaches to prevention and treatment should be given high priority.
Minimal Clinically Important Improvement in Pain and Function Should be the Measure of Success for All Trials. Clinically meaningful results require outcome measures establishing that patients experience improvement that is important to them, that is, a minimal clinically important improvement. The range of magnitude of improvement clinically important to patients has been estimated for VAS pain and WOMAC measures and, to a lesser degree, for the Lequesne Index (see Methods). Common measures and intervals for measurement will produce a more robust body of cumulative evidence and improve the ability to compare and pool results among trials.
Unpublished Studies Should be Made Available as Full Text Publications. Among RCTs of viscosupplementation, those that have not been published in full text comprise approximately 25 percent of the total patient population. Several meta-analyses of glucosamine report that trials of the Rotta product, glucosamine sulfate, show outcomes superior to trials of glucosamine hydrochloride; yet key trials have not been published as full-text studies. Existing studies should be published in full. And all trials should be registered at inception at ClinicalTrials.gov along with anticipated date for full release of results.
The Pitfalls of Meta-Analysis Should be More Widely Recognized and Acknowledged. Our evidence report draws heavily on six study-level meta-analyses of glucosamine/chondroitin and six of viscosupplementation. While we used a validated instrument to appraise the quality of the systematic reviews, the instrument does not address the question of when meta-analysis is appropriate to a systematic review. Meta-analysis is a technique with underlying assumptions that may or may not hold when a particular collection of results is pooled. Furthermore, meta-analyses may fail to convey the real uncertainty and potential bias accompanying pooled estimates.
Uncertainty in the magnitude of effects pooled is influenced by factors intrinsic to the underlying trials. Among these are variable patient characteristics, trial characteristics, and the indication that a few trial results were outliers and influential on pooled estimates. The meta-analyses frequently reported high inter-trial heterogeneity. Random effects models were used in the face of high heterogeneity, but a consequence is to increase the influence of smaller trials on the pooled results. The meta-analyses did not address a threshold question, one that has not been clearly resolved by practitioners of meta-analysis: when is heterogeneity too high to justify pooling trial results? A related concern is the practice of reporting on multiple outcome measures and time intervals, which may be represented by a small portion of studies, thus potentially introducing bias.
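The effect of random effects models on the influence of smaller trials can be illustrated with a minimal sketch. It assumes inverse-variance weighting and the DerSimonian-Laird estimate of between-trial variance, a common choice that is not necessarily the method used in the meta-analyses reviewed here. The effect sizes and variances are hypothetical.

```python
def normalized_inverse_variance_weights(variances):
    """Relative weight of each trial when weighting by 1/variance."""
    raw = [1.0 / v for v in variances]
    total = sum(raw)
    return [w / total for w in raw]


def dersimonian_laird_tau2(effects, variances):
    """Between-trial variance estimated from Cochran's Q (truncated at zero)."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    degrees_of_freedom = len(effects) - 1
    return max(0.0, (q - degrees_of_freedom) / c)


def random_effects_weights(effects, variances):
    """Random-effects weights add the between-trial variance to each trial."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    return normalized_inverse_variance_weights([v + tau2 for v in variances])


# Hypothetical data: two large trials with small effects and one small trial
# (large variance) with a large effect.
effects = [0.1, 0.2, 0.9]
variances = [0.01, 0.01, 0.09]

fixed = normalized_inverse_variance_weights(variances)
random_ = random_effects_weights(effects, variances)
print("small-trial weight, fixed effect:  ", round(fixed[2], 3))
print("small-trial weight, random effects:", round(random_[2], 3))
```

Because the same between-trial variance is added to every trial, the weights become more nearly equal, and the small trial's share of the total weight rises relative to the fixed-effect analysis.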
Osteoarthritis of the knee is a common condition. The three interventions reviewed in this report are widely used in the treatment of OA of the knee, yet the best available evidence does not clearly demonstrate clinical benefit. Uncertainty over clinical benefit can be resolved only by rigorous, multicenter RCTs. In addition, given the public health impact of OA of the knee, research on new approaches to prevention and treatment should be given high priority.
This is a systematic review of three treatments for osteoarthritis (OA) of the knee: intra-articular injections of viscosupplements; oral glucosamine, chondroitin, or the combination; and arthroscopic lavage and debridement. The key questions are: (1) effectiveness and harms in primary OA of the knee, (2) in secondary OA of the knee, (3) in subpopulations, and (4) comparison of the three interventions. This section outlines the burden of illness and clinical management of osteoarthritis of the knee, describes the interventions of interest and their uncertainties, and gives an overview of the key questions to be addressed.
According to the U.S. Centers for Disease Control and Prevention (CDC), an estimated 22 percent of adults (46 million) in the United States have doctor-diagnosed arthritis (Centers for Disease Control and Prevention, 2006). Earlier figures suggest approximately 11 percent of the population 64 years and older has symptomatic OA of the knee (Manek and Lane, 2000). Symptoms of OA typically begin after age 40 and progress slowly, with radiographic evidence of the disease present in the majority of the population by 65 years of age and in approximately 80 percent of the population age 75 years and older. OA of the knee is more common in women than in men, with risk factors that include obesity, previous knee injury or surgery, and occupational bending and lifting (Felson, 2006; Centers for Disease Control and Prevention, 2005).
Loss of joint function as a result of OA overall is a major cause of work disability and reduced quality of life. The CDC estimates that osteoarthritis and related arthritic conditions cost the U.S. economy nearly $81 billion per year in direct medical care, with indirect expenses of about $47 billion that include lost wages and production (Centers for Disease Control and Prevention, 2004). CDC figures further estimate the total annual direct cost of OA and related conditions per person is approximately $1,752.
The term “osteoarthritis” refers to a heterogeneous group of joint disorders, usually signaled by symptoms of pain and stiffness. It involves both destructive and reparative metabolic processes, with a variety of biochemical triggers in addition to mechanical injury of the joint (Mandelbaum and Waddell, 2005). It is thought that inflammation does not play a primary role in osteoarthritis, although it may be present. When inflammation occurs, it is generally mild (Hochberg, Altman, Brandt, et al., 1995b). The pathogenesis of OA is not fully understood, although multiple contributing factors are recognized including genetic, environmental, metabolic, and biomechanical factors (Kraus, 1997).
Although OA eventually involves all joint structures, it begins with damage and progressive degradation of articular hyaline cartilage structure and function (chondropenia), typically in a nonuniform, focal manner (Felson, 2006). As chondropenia progresses in localized areas, stress increases across the entire joint, further damaging and eroding cartilage. In areas with full-thickness cartilage loss, abnormal remodeling and attrition of subarticular bone commences, typically accompanied by growth of osteophytes. Synovitis, ligamentous laxity, and periarticular muscle weakness may also occur, eventually leading to joint tilting and malalignment. Malalignment is a risk factor for joint failure, hastening structural deterioration of the joint by increasing local loading forces.
The symptoms of OA result from abnormal stresses on the weight-bearing joints or normal stresses on weakened joints, becoming progressively worse and more frequent with age. The typical joints involved with osteoarthritis include the large, weight-bearing joints such as the hip and knee, as well as selected smaller joints in the hands, feet, and spine.
Osteoarthritis may be broadly categorized as primary (idiopathic) or secondary. According to the American Academy of Orthopaedic Surgeons, primary OA of the knee can be defined as a process in which articular degeneration occurs in the absence of an obvious underlying abnormality (American Academy of Orthopaedic Surgeons, 2004). Secondary OA of the knee is often the result of injury (trauma) or repetitive motion such as found in certain occupations. It can also result from congenital conditions and underlying diseases, including systemic metabolic diseases, endocrine diseases, bone dysplasias, and calcium crystal deposition diseases. Secondary OA is more likely to manifest itself at an earlier age than primary OA, and may be an initial clue to the presence of a potentially dangerous and treatable systemic disease. While there is rationale for identifying two separate categories of OA, making a distinction between them does not alter clinical practice and therapeutic choices.
The diagnosis of osteoarthritis is established using a combination of clinical information derived from history, physical examination, radiologic, and laboratory evaluation. An algorithm of diagnostic criteria for osteoarthritis of the knee has been proposed by the American College of Rheumatology (ACR) (Altman, Asch, Bloch, et al., 1986). A diagnosis of OA of the knee is defined as presenting with pain, and meeting at least five of the following criteria:
Patient older than 50 years of age
Less than 30 minutes of morning stiffness
Crepitus (noisy, grating sound) on active motion
Bony tenderness
Bony enlargement
No palpable warmth of synovium
Erythrocyte sedimentation rate (ESR) <40 mm/hr
Rheumatoid factor <1:40
Noninflammatory synovial fluid.
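The decision rule described above (pain plus at least five of the nine listed criteria) can be sketched as follows. This is an illustration of the rule as stated in this section, not a clinical tool. The field names are hypothetical, and the rheumatoid factor is represented as a reciprocal titer (for example, 20 for 1:20), which is an assumption made for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass
class KneeFindings:
    """Hypothetical record of the findings used by the criteria listed above."""
    knee_pain: bool
    age_years: int
    morning_stiffness_minutes: float
    crepitus_on_active_motion: bool
    bony_tenderness: bool
    bony_enlargement: bool
    palpable_warmth: bool
    esr_mm_per_hr: float
    rheumatoid_factor_reciprocal_titer: float  # assumption: 20 means 1:20
    noninflammatory_synovial_fluid: bool


def criteria_met(f):
    """Count how many of the nine listed criteria are satisfied."""
    return sum([
        f.age_years > 50,
        f.morning_stiffness_minutes < 30,
        f.crepitus_on_active_motion,
        f.bony_tenderness,
        f.bony_enlargement,
        not f.palpable_warmth,
        f.esr_mm_per_hr < 40,
        f.rheumatoid_factor_reciprocal_titer < 40,
        f.noninflammatory_synovial_fluid,
    ])


def meets_knee_oa_criteria(f, required=5):
    """Pain is mandatory; at least `required` of the nine criteria must hold."""
    return f.knee_pain and criteria_met(f) >= required


# Hypothetical patient who satisfies exactly five of the nine criteria.
patient = KneeFindings(
    knee_pain=True, age_years=62, morning_stiffness_minutes=15,
    crepitus_on_active_motion=True, bony_tenderness=True,
    bony_enlargement=False, palpable_warmth=False, esr_mm_per_hr=55,
    rheumatoid_factor_reciprocal_titer=80, noninflammatory_synovial_fluid=False,
)
print(criteria_met(patient))                                       # 5
print(meets_knee_oa_criteria(patient))                             # True
print(meets_knee_oa_criteria(replace(patient, knee_pain=False)))   # False
```

Keeping the count separate from the final decision makes it easy to see that pain alone, or five criteria without pain, does not satisfy the rule.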
The presence of clinical symptoms of OA does not always correlate well with the degree of abnormality seen on radiographs. It has been noted that approximately 40 percent of patients who have severe X-ray findings report no symptoms, and conversely, patients with clinical symptoms may show no significant radiological changes (Balint and Szebenyi, 1996; Davis, Ettinger, Neuhaus, et al., 1992; Claessens, Schouten, van den Ouweland, et al., 1990).
Treatment for OA of the knee aims to alleviate pain and improve function in order to mitigate reduction in activity (American College of Rheumatology, 2000; Felson, 2006). However, most treatments do not modify the natural history or progression of OA, and thus are not considered curative. Nonsurgical modalities include education, exercise, weight loss, and various supportive devices; acetaminophen or nonsteroidal anti-inflammatory drugs (NSAIDs) such as ibuprofen; nutritional supplements (glucosamine and chondroitin); and, intra-articular viscosupplements.
Guidelines for the medical management of osteoarthritis emphasize the role of both nonpharmacologic and pharmacologic therapies (American College of Rheumatology, 2000; Jordan, Arden, Doherty, et al., 2003). Initial management involves nonpharmacologic therapies, including education, exercise, various appliances and braces, and weight reduction. Acetaminophen is recommended as first-line pharmacologic therapy. If pain relief is inadequate with acetaminophen, analgesic-dose NSAIDs may be used (e.g., ibuprofen, naproxen). If symptom response to a lower NSAID dosage is inadequate, higher, anti-inflammatory, doses may be used. Intra-articular corticosteroid injection may be considered when relief from NSAIDs is insufficient or the patient is at risk from gastrointestinal adverse effects. Injection of corticosteroids is frequently limited to three to four times per year per joint because of concern about the possibility of progressive cartilage damage through repeated injection in the weight-bearing joints (Neustadt, 1992).
If symptom relief is inadequate with conservative measures, invasive treatments may be considered. Operative treatments for symptomatic OA of the knee include arthroscopic lavage and cartilage debridement, osteotomy, and, ultimately, total joint arthroplasty (Day, 2005). Surgical procedures intended to repair or restore articular cartilage in the knee, including abrasion arthroplasty, microfracture techniques, autologous chondrocyte implantation, and others, are appropriate only for younger patients with focal cartilage defects secondary to injury (Clarke and Scott, 2003).
| Product | Regarding Treatment Course | Regarding Minimum # of Injections | Regarding Other Joints | Regarding Repeat Treatments |
|---|---|---|---|---|
| Hyalgan® (sodium hyaluronate); Fidia Pharmaceutical; Original PMA date: 5/28/97; MW: 0.5–0.73 million Da | “A treatment cycle consists of five injections given at weekly intervals. Some patients may experience benefit with three injections given at weekly intervals.” Hyalgan® is the only hyaluronan with demonstrated safety in a 30-month, repeat use, open-label trial in which 75 patients received a cycle of 5 weekly injections of Hyalgan® every 6 months. | “The effectiveness of a single treatment cycle of less than 3 injections has not been established.” | “The safety and effectiveness of the use of Hyalgan® in joints other than the knee have not been established.” | “Adverse experience data from the literature contain no evidence of increased risk relating to retreatment with Hyalgan®. The frequency and severity of adverse events occurring during repeat treatment cycles did not increase over that reported for a single treatment cycle....” |
| Synvisc® (hylan G-F 20); Genzyme Corporation; Original PMA date: 8/08/97; MW: 6 million Da (hylan A) | “Synvisc® is administered by intraarticular injection once a week (one week apart) for a total of three injections.” | “The effectiveness of a single treatment cycle of less than three injections of Synvisc® has not been established.” | “The safety and effectiveness of Synvisc® in locations other than the knee and for conditions other than osteoarthritis have not been established.” | “The reactions seemed to occur more often when Synvisc® was injected into the knee as a repeat set of injections than when Synvisc® was injected as a first set of injections.” |
| Supartz® (sodium hyaluronate); Seikagaku Corporation; Original PMA date: 1/24/01; MW: 0.62–1.17 million Da | “Supartz® is administered by intraarticular injection once a week (one week apart) for a total of 5 injections.” | “The effectiveness of a single treatment cycle of less than 5 injections has not been established.” | “The safety and effectiveness of the use of Supartz® in joints other than the knee have not been established.” | “The safety and effectiveness of repeat treatment cycles of Supartz® have not been established.” |
| Orthovisc® (sodium hyaluronate); Anika Therapeutics, Inc.; Original PMA date: 2/04/04; MW: 1–2.9 million Da | “Orthovisc® is injected into the knee joint in a series of intraarticular injections one week apart for a total of three or four injections.” | The effectiveness of a single treatment cycle of less than 3 injections has not been established. Pain relief may not be seen until after the third injection. | “The safety and effectiveness of the use of Orthovisc® in joints other than the knee have not been established.” | “The effectiveness has not been established for more than one course of treatment.” |
| Euflexxa® (sodium hyaluronate); Ferring Pharmaceuticals; Original PMA date: (approved under the name Nuflexxa); MW: 2.4–3.6 million Da | “A dose of 2 ml is injected intraarticularly into the affected knee at weekly intervals for three weeks, for a total of three injections.” | N/R | “Safety and effectiveness of injection in conjunction with other intraarticular injectables, or into joints other than the knee has not been studied.” | “The safety and effectiveness of repeated treatment cycles of EUFLEXXA™ have not been established.” |
Da: Daltons; MW: molecular weight; PMA: premarket approval
Glucosamine and Chondroitin. Glucosamine is an aminomonosaccharide that is the principal component of O-linked and N-linked glycosaminoglycans, which comprise the matrix of all connective tissues, including cartilage (Biggee and McAlindon, 2004; Matheson and Perry, 2003; Hauselmann, 2001; Deal and Moskowitz, 1999). This compound historically has been derived by extraction of chitin, a component of crustacean shells, although it is also produced through fermentation of a vegetarian source. Chondroitin sulfate is a glycosaminoglycan with a polymerized disaccharide base linked to a sulfate moiety, and is a component of proteoglycans of articular cartilage. It is usually derived from bovine trachea, although other sources such as ovine or porcine trachea and shark cartilage are used. The mechanisms of action of these compounds are unknown, but it is speculated they may promote maintenance and repair of cartilage.
In the United States, glucosamine hydrochloride or sulfate and chondroitin sulfate are considered dietary supplements available in over-the-counter (OTC) products, which may vary substantially in content and purity from what is stated on the label (McAlindon, 2003). In European Union countries, glucosamine sulfate and chondroitin sulfate are regulated as prescription drugs. A number of clinical trials with positive outcomes either used glucosamine sulfate manufactured by an Italian firm, Rotta Research Laboratorium, or were financially supported by Rotta. It has been hypothesized that Rotta glucosamine sulfate has greater efficacy than the hydrochloride salt, and that the formulation is a key factor in trial outcome (Altman, Abramson, Bruyere, et al., 2006; Hochberg, 2006; McAlindon, 2003). Oral administration of glucosamine sulfate can increase serum and synovial fluid sulfate levels, whereas sodium sulfate does not. Absorbed sulfate is then used in the synthesis of proteoglycans and metabolic intermediates like coenzyme A and glutathione that are important for chondrocyte metabolism.
Arthroscopy. The term “arthroscopy” is often used collectively in reference to individual minimally invasive surgical procedures, joint lavage and articular debridement, which are performed using fine needles and an arthroscope (Gidwani and Fairbank, 2004; Gunther, 2001). Arthroscopic lavage is a palliative measure in which intra-articular fluid is aspirated and the joint is washed out, removing inflammatory mediators, debris, or small loose bodies from the osteoarthritic knee. Articular debridement involves removal of cartilage or meniscal fragments, but also can include cartilage abrasion, excision of osteophytes and synovectomy. Debridement is intended to improve symptoms and joint function in patients with mechanical symptoms such as locking or catching of the knee. Because lavage and debridement are often performed at the same time, it is difficult to attribute the success or failure of arthroscopy to a specific procedure.
This systematic review of the literature will address the following questions regarding managing patients with OA of the knee with three interventions: intra-articular injections of viscosupplements; oral glucosamine and chondroitin; and, arthroscopic lavage and debridement.
Key Question 1: What are the clinical effectiveness and harms of each intervention in patients with primary OA of the knee?
Key Question 2: What are the clinical effectiveness and harms of each intervention in patients with secondary OA of the knee?
Key Question 3: How do the short-term and long-term outcomes of each intervention differ by the following subpopulations: age, race/ethnicity, gender, primary or secondary OA, disease severity and duration, weight (body mass index), and prior treatments?
Key Question 4: How do the short-term and long-term outcomes of each intervention compare in the treatment of primary OA of the knee and of secondary OA of the knee?
This report is a systematic review of the effectiveness of three technologies to treat osteoarthritis (OA) of the knee: intra-articular hyaluronan injections (viscosupplements), enteral glucosamine and chondroitin given alone or in combination, and arthroscopic lavage and debridement. This chapter describes the search strategies used to identify literature; criteria and methods used for selecting eligible articles; methods for data abstraction; methods for quality assessment; and, finally, the process for technical expert advice and peer review.
The methods of this review are generally applicable to all Key Questions. However, as noted, there were variations in specific aspects of the methods as necessary to satisfy requirements of each question.
A technical expert panel provided consultation for the systematic review and reviewed the draft report. The draft report was also reviewed by 12 external reviewers, including invited clinical experts and stakeholders (Appendix D *). Revisions were made to the draft report based on reviewers' comments.
This Evidence Report takes a tiered approach to evidence of the effectiveness of the three key interventions. The primary focus is on whether interventions have beneficial effects exceeding those of a comparative placebo. We first determined whether existing systematic reviews and meta-analyses adequately addressed the Key Questions and whether they identified all relevant primary studies. If additional primary studies are found, this Evidence Report integrates their findings with systematic reviews and meta-analyses. If evidence from randomized, placebo-controlled trials (RCTs) clearly shows benefits beyond placebo, then comparisons between these interventions and other active interventions would be relevant.
The diagram in Figure 1
We sought systematic reviews, meta-analyses, and RCTs (including abstracts of unpublished placebo-controlled RCTs) that examined the clinical effectiveness of one or more of the interventions of interest among patients with primary or secondary OA of the knee and that reported at least one outcome of interest.
RCTs had to be published either as articles in any language or English-language abstracts (if the study was only presented as an abstract). No minimum number of patients per study arm was required for RCTs. Because there were few RCTs available to address arthroscopy and Key Question 4 (comparative outcomes), we sought additional study designs. For arthroscopy, we also sought English-language articles of nonrandomized comparative trials (i.e., quasi-experimental studies), administrative database analyses, and case series with samples of 50 or more. For Key Question 4, we sought randomized and nonrandomized comparative studies.
Studies were excluded if no outcome of interest to this review was reported. Studies were also excluded if the patient population of interest made up fewer than 80 percent of included patients and results for that population were not reported separately. When multiple reports were available for the same study, the study was counted as a single trial and outcome data from the report with the longest followup were used.
The populations of interest are patients with primary or secondary OA of the knee, as defined by the American Academy of Orthopaedic Surgeons (American Academy of Orthopaedic Surgeons, 2004):
Primary osteoarthritis of the knee is a process in which articular degeneration occurs in the absence of any obvious underlying abnormality (unknown cause); and
Secondary OA is often the result of injury (trauma) or repetitive motion in certain occupations, but it can also result from congenital conditions and systemic metabolic diseases, endocrine diseases, bone dysplasias, and calcium crystal deposition diseases.
Subpopulations of interest are defined by: age, race or ethnicity, sex, disease severity and duration, weight (body mass index), and prior treatments.
Enteral (i.e., orally administered) glucosamine (sulfate or hydrochloride) given alone
Enteral chondroitin given alone
Enteral glucosamine and chondroitin given in combination.
Glucosamine is given orally at 1,500 mg daily, usually as a single dose, or divided into two or three doses. Chondroitin is administered orally, usually a total of 800 to 1,200 mg daily, or in divided doses. At minimum, treatment duration is 1 to 3 months, and may be continued indefinitely if the patient experiences improvement.
Intra-Articular Injections of Hyaluronan Preparations. Products derived from sodium hyaluronate are the most commonly used viscosupplements in RCTs, followed by hylan G-F 20 as the next most common class. Additionally, unapproved non-animal stabilized hyaluronic acid (NASHA) derived from streptococci has been used in two RCTs (Altman, Akermark, Beaulieu, et al., 2004; Pham, Le Henanff, Ravaud, et al., 2004). One trial (Petrella, DiSilvestro, Hildebrand, et al., 2002) administered a hyaluronan that is not approved by the U.S. Food and Drug Administration (FDA). Intra-articular injections performed in RCT protocols were most often weekly for 3 to 5 weeks, although different schedules also were used.
Arthroscopy. Studies were selected if arthroscopic treatment of OA involved lavage with or without debridement, and debridement was not specifically required to include procedures beyond nonabrasion chondroplasty and removal of loose bodies. Thus, studies were excluded if they focused only on arthroscopic meniscectomy or abrasion chondroplasty, for example.
Primary Outcomes. The primary outcomes of interest are:
Pain severity or intensity
Self-reported physical function
Patient global assessment
Quality of life.
Secondary Outcomes. Secondary outcomes of interest include:
Need for or time to total knee replacement or other surgeries.
Concomitant analgesic use.
Harms or Adverse Effects. Any adverse events reported, including:
Hyaluronan Preparations. Local: injection site redness, edema, pain, joint swelling, joint stiffness, worsened osteoarthritis, infection, pseudoseptic reactions. Systemic: severe acute inflammatory reaction or pseudosepsis, anaphylaxis, arthralgias, rash, urticaria, back pain, headache.
Glucosamine and Chondroitin. Alterations in blood glucose, hypersensitivity reactions, and local gastrointestinal toxicities.
Arthroscopy. Infection, prolonged drainage from arthroscopic portals, effusion, hemarthrosis and deep vein thrombosis.
Instruments. Pain and function should be measured by instruments with established validity and reliability. Although results are frequently reported as mean change in the intervention arm compared with the control arm, this is not the preferred method of measuring outcomes. More informative is a comparison of response, that is, the proportion of patients achieving an improvement that has been established as a minimal clinically important improvement (Tubach, Wells, Ravaud, et al., 2005).
Among established instruments, pain severity may be assessed by a visual analog scale (VAS) or a numeric rating scale (NRS) or from a subscale included in a knee-specific validated OA instrument. The horizontal 100-mm VAS has a left-hand or 0-mm endpoint labeled “no pain” and a right-hand or 100-mm endpoint usually labeled with a statement such as “extreme pain” or “pain as bad as it could possibly be.” While the amount of improvement required may not be definitively established (Tubach, Ravaud, Baron et al. 2005; Pham, van der Heijde, Altman, et al. 2004), the best available estimates for OA of the knee are improvements of between 20 and 40 percent, which have been used in hyaluronan and glucosamine/chondroitin trials (Neustadt et al. 2005, Altman et al. 2004, Clegg et al). A clinically significant change in VAS score depends on the baseline pain (Campbell and Patterson, 1998). For example, in knee OA an absolute 20 mm or 40 percent relative reduction in VAS pain score could be considered a minimal clinically important improvement (MCII) (Tubach, Wells, Ravaud, et al., 2005) and define a clinically meaningful response. Accordingly, a decrease of 10–12 mm may be clinically significant from a baseline of 25 mm, while a reduction of 20–31 mm may be necessary to achieve a clinically significant reduction for patients with high baseline pain (e.g., VAS 75–100 mm).
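As a minimal sketch of how such a response criterion might be applied to individual patient data, the Python functions below classify a patient as a responder when the VAS pain reduction reaches either an absolute 20 mm or a relative 40 percent, and then compute the proportion of responders in a trial arm. The thresholds come from the text; the function names and the decision to treat the two thresholds as an either/or rule are illustrative assumptions, not a method prescribed by this report.

```python
def is_vas_responder(baseline_mm, followup_mm, abs_mcii_mm=20.0, rel_mcii=0.40):
    """Return True if the VAS pain reduction meets the assumed MCII rule.

    Assumption: a patient responds if EITHER the absolute reduction is at
    least abs_mcii_mm OR the relative reduction is at least rel_mcii.
    """
    if not (0 <= baseline_mm <= 100 and 0 <= followup_mm <= 100):
        raise ValueError("VAS scores must lie between 0 and 100 mm")
    reduction = baseline_mm - followup_mm
    if reduction <= 0:
        return False  # no improvement, or worsening
    relative = reduction / baseline_mm  # baseline is > 0 whenever reduction > 0
    return reduction >= abs_mcii_mm or relative >= rel_mcii


def responder_proportion(pairs):
    """Proportion of (baseline, followup) pairs classified as responders."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one patient is required")
    return sum(is_vas_responder(b, f) for b, f in pairs) / len(pairs)
```

Under these assumptions, a patient moving from 25 mm to 14 mm counts as a responder through the relative threshold, whereas a patient moving from 80 mm to 65 mm does not meet either threshold.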
The first of 2 widely used OA instruments, the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC; McConnell, Kolopack, and Davis, 2001; Bellamy, Buchanan, Goldsmith, et al., 1988), evaluates 3 dimensions (pain, stiffness, and physical function) with 5, 2, and 17 questions, respectively. WOMAC assesses pain using either the sum of scores from 5 items or the VAS. WOMAC outcomes can be based on the total score or a subscale score. A 20- to 40-percent reduction in the WOMAC pain subscore is a positive response criterion for pain used in knee OA studies and represents achieving an MCII (Tubach, Wells, Ravaud, et al., 2005).
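The subscale structure described above can be sketched in a few lines of Python. The item counts (5, 2, and 17) are taken from the text; the 0 to 4 range for each item and all names are assumptions made only for illustration, since WOMAC is also administered in VAS format.

```python
# Item counts per WOMAC dimension, as stated in the text.
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}


def womac_subscores(responses, max_item_score=4):
    """Sum item responses per dimension and overall.

    `responses` maps each dimension to its list of item scores. The 0-4
    item range (max_item_score) is an assumption, not taken from the text.
    """
    scores = {}
    for dimension, n_items in WOMAC_ITEMS.items():
        items = responses[dimension]
        if len(items) != n_items:
            raise ValueError(f"{dimension} needs {n_items} items")
        if any(not 0 <= x <= max_item_score for x in items):
            raise ValueError(f"{dimension} item out of range")
        scores[dimension] = sum(items)
    scores["total"] = sum(scores.values())
    return scores


def percent_reduction(baseline, followup):
    """Relative reduction in a subscore, expressed in percent."""
    if baseline <= 0:
        raise ValueError("baseline subscore must be positive")
    return 100.0 * (baseline - followup) / baseline
```

For example, a pain subscore falling from 10 to 7 is a 30 percent reduction, which lies inside the 20- to 40-percent range cited as a response criterion.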
Another commonly used OA instrument is the Lequesne Index, a validated numerical scale in which points are assessed for various levels of pain, distance walking, and ability to perform activities of daily living (Lequesne, Mery, Samson, et al., 1987). It sums scores from 5 adjectival items, producing scores ranging from 1 to 24 points. The severity of handicap related to the knee can be categorized by point score: mild (1–4 points); moderate (5–7 points); severe (8–10 points); very severe (11–13 points); and extremely severe (≥14 points) (Bellamy, 1993). The MCII is likely approximately 20 percent (Bellamy, 1993).
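The severity bands above translate directly into a lookup, sketched here in Python. The function name is hypothetical, whole-point scores are assumed, and a score of exactly 14 is treated as extremely severe, which is an assumption about the intended lower bound of the top band.

```python
def lequesne_severity(score):
    """Map a Lequesne Index score (1-24, whole points) to a severity band."""
    if not 1 <= score <= 24:
        raise ValueError("score must lie between 1 and 24 points")
    if score <= 4:
        return "mild"
    if score <= 7:
        return "moderate"
    if score <= 10:
        return "severe"
    if score <= 13:
        return "very severe"
    return "extremely severe"  # assumption: 14 points and above
```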
Physical function may be appraised through reported difficulty performing specific daily activities affected by knee OA (Bellamy, Buchanan, Goldsmith, et al., 1988; Lequesne, Mery, Samson, et al., 1987). Patient global assessment (generally defined as the “patient's assessment of overall disease activity or improvement”) can be assessed by VAS, NRS, or other specific instruments (Pham, van der Heijde, Altman, et al. 2004). The MCII for patient global assessment on a 100 mm VAS has been suggested to be 18 mm, or a relative improvement of 40 percent.
Both generic and disease-specific quality of life (QOL) measures may be relevant for assessing disease impact (Salaffi, Carotti, and Grassi, 2005). The SF-36 and Arthritis Impact Measurement Scales (Meenan, 1986) are acceptable scales to assess the impact of osteoarthritis on QOL.
Pooled Outcome Measures. Meta-analyses may pool outcome measures using the metric of the original scale, or a metric related to it.
The “weighted mean difference” (WMD) combines (pools) differences between treatment and control from multiple trials on the scale of the original instrument. It can be reported as either a difference between treatment and control at some followup time or a difference in change scores. While intuitive to interpret as a difference or difference in change for some outcome measure, the WMD does not define proportions achieving an MCII or response (Senn 1997, page 226; Tubach, Ravaud, Giraudeau 2005).
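To make the pooling step concrete, the sketch below computes a WMD in Python under one common set of assumptions: a fixed-effect model with inverse-variance weights. The text does not say which weighting scheme the cited meta-analyses used, so the weighting, the function name, and the input layout are all illustrative.

```python
import math


def pooled_wmd(trials):
    """Fixed-effect, inverse-variance pooled mean difference.

    `trials` is a sequence of (mean_difference, standard_error) pairs, all
    measured on the same scale (for example, mm on a 100-mm VAS).
    Returns (pooled_difference, pooled_standard_error).
    """
    trials = list(trials)
    if not trials:
        raise ValueError("at least one trial is required")
    weights = [1.0 / se ** 2 for _, se in trials]
    total_weight = sum(weights)
    pooled = sum(w * md for w, (md, _) in zip(weights, trials)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)
```

Two trials with differences of -10 mm and -6 mm and equal standard errors of 2 mm pool to -8 mm. The result is still a mean difference and, as noted above, says nothing about how many patients reached an MCII.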
“Relative risks” (or odds ratios, which are approximately equal to them when the outcome is uncommon) can be pooled for dichotomous outcome measures (e.g., patient global assessment and adverse events). The relative risk is the ratio of the outcome probability in the treated group to that in the placebo group. It clearly conveys increased risk, but does not directly reflect clinical benefit in terms of response unless it compares clinically meaningful response rates.
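A short Python sketch shows how the two ratios are computed from the counts in a single trial and why they agree closely only when the outcome is uncommon. The function names and the example counts are hypothetical.

```python
def relative_risk(events_t, n_t, events_c, n_c):
    """Risk in the treated group divided by risk in the control group."""
    return (events_t / n_t) / (events_c / n_c)


def odds_ratio(events_t, n_t, events_c, n_c):
    """Odds in the treated group divided by odds in the control group."""
    odds_t = events_t / (n_t - events_t)
    odds_c = events_c / (n_c - events_c)
    return odds_t / odds_c


# Uncommon outcome (e.g., an adverse event): 2/100 treated vs. 1/100 control.
rare = (relative_risk(2, 100, 1, 100), odds_ratio(2, 100, 1, 100))

# Common outcome (e.g., a response): 60/100 treated vs. 40/100 control.
common = (relative_risk(60, 100, 40, 100), odds_ratio(60, 100, 40, 100))
```

With the uncommon outcome, the relative risk is 2.0 and the odds ratio is about 2.02. With the common outcome, the relative risk is 1.5 but the odds ratio is 2.25, so the two should not be read interchangeably for response outcomes.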
“Sums of differences” in outcome measures between treatment and placebo groups (e.g., pain and function) over the course of a study can also be pooled. The measure is expressed as a percentage reflecting how much greater relief is provided by treatment compared to placebo. Although commonly used in pain research, the measure does not have direct clinical meaning with respect to response.
“Standardized effect sizes,” expressed as differences or differences in change standardized by their variability (divided by the standard deviation), can also be pooled. Standardized effect sizes are typically used when the scales being pooled have different metrics (e.g., a 0- to 100-mm VAS and a 25-point WOMAC scale). The clinical meaning of standardized effect sizes is difficult to intuit when different scales are pooled and variability differs across studies. While small, medium, and large referents corresponding to 0.2, 0.5 and 0.8, respectively, were suggested by Cohen (1988), they pertain to sample size calculations, not clinical meaning, and were qualified substantially.* Others have pointed out problematic aspects of standardized effect sizes, including incomparability across studies (Rothman and Greenland, 1998) and the fact that studies with identical results may appear to differ (Greenland, Schlesselman, Criqui, 1986). Most importantly, one cannot infer individual response (Senn, 1997).†
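The dependence of a standardized effect size on variability can be illustrated with a small Python sketch. It assumes the familiar form in which the difference in means is divided by the pooled standard deviation of the two arms; the function name and example values are hypothetical.

```python
import math


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


# Two hypothetical trials with the same 10-mm advantage on a 100-mm VAS
# but different between-patient variability.
trial_a = standardized_mean_difference(50, 20, 100, 60, 20, 100)
trial_b = standardized_mean_difference(50, 25, 100, 60, 25, 100)
```

Both trials show the same 10-mm difference, yet the standardized effect sizes are -0.5 and -0.4, which is one way that studies with identical results can appear to differ.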
Electronic Databases. The following databases were searched for citations. The full search strategy is displayed in Appendix A *. The search was not limited to English-language references; however, foreign-language references without abstracts were disregarded.
MEDLINE® (through March 29, 2007)
EMBASE (through March 16, 2006)
Cochrane Controlled Trials Register (through November 27, 2006).
EMBASE was updated with abbreviated searches through November 27, 2006.
Additional Sources of Evidence. The Technical Expert Panel and individuals and organizations providing peer review were asked to inform the project team of any studies relevant to the key questions that were not included in the draft list of selected studies.
We examined the bibliographies of all retrieved articles for citations to any RCT that was missed in the database searches. In addition, we sought RCTs published in conference proceedings and abstracts from the American Academy of Orthopaedic Surgeons (AAOS), American College of Rheumatology (ACR) and the Osteoarthritis Research Society International (OARSI) over the past 2 years. We also consulted product inserts of U.S.-marketed viscosupplement products.
Search results were stored in a ProCite® database. Using the study selection criteria for screening titles and abstracts, a single reviewer marked each citation as either: (1) eligible for review as full-text articles, (2) ineligible for full-text review, or (3) uncertain. Citations marked as uncertain were reviewed by a second reviewer and resolved by consensus opinion, with a third reviewer to be consulted if necessary. Using the final study selection criteria, review of full-text articles was conducted in the same fashion to determine inclusion in the systematic review. Of 1,842 citations, 451 articles were retrieved and 98 selected for inclusion (Figure 2).
The data elements below were abstracted, or recorded as not reported, from intervention studies. Data elements to be abstracted were defined in consultation with the Technical Expert Panel.
Data elements from intervention studies (RCTs and quasi-experimental studies) include:
Critical features of the study design (for example, patient inclusion/exclusion criteria, number of participants, allocation method (including concealment), use of blinding)
Patient characteristics (age, gender, race/ethnicity, body weight, primary or secondary disease, disease duration)
Measures of disease severity
Treatment protocols (for example, dose, frequency, duration, extent of arthroscopic surgery, other prior and concurrent treatments)
Patient monitoring procedures (for example, followup duration and frequency, outcome assessment methods)
The specified key outcomes and data analysis methods
Results
Funding source.
Data elements from systematic reviews and meta-analyses include:
Use of a protocol
The study question (patients, interventions/comparisons, outcomes)
Literature search strategy
Study inclusion/exclusion criteria
Data extraction methods
Assessment of study quality
Methods of data synthesis/analysis
Funding source.
Data elements from case series include:
Clinical question
Enrollment of patients (consecutive or otherwise)
Whether a single-center or multicenter study
Patient selection criteria and sample characteristics
Intervention
Length of followup
Validated outcome measures and independence or blinding of outcome assessment
Statistical analyses
Results.
Templates for evidence tables were created in Microsoft Excel® and Microsoft Word®. One reviewer performed primary data abstraction of all data elements into the evidence tables, and a second reviewer reviewed articles and evidence tables for accuracy. Disagreements were resolved by discussion, and if necessary, by consultation with a third reviewer. When small differences occurred in quantitative estimates of data from published figures, the values obtained by the two reviewers were averaged.
In consultation with the AHRQ Task Order Officer and Technical Expert Panel, the general approach to grading evidence developed by the U.S. Preventive Services Task Force (Harris, Helfand, Woolf, et al. 2001) was applied to primary studies. The quality of the abstracted studies was assessed by two independent reviewers. Discordant quality assessments were resolved with input from a third reviewer, if necessary.
The quality of RCTs and quasi-experimental studies was assessed on the basis of the following criteria:
Initial assembly of comparable groups: adequate randomization, including concealment and whether potential confounders (e.g., other concomitant care) were distributed equally among groups
Maintenance of comparable groups (includes attrition, crossovers, adherence, contamination)
Important differential loss to followup or overall high loss to followup
Measurements: equal, reliable, and valid (includes masking of outcome assessment)
Clear definition of interventions
All important outcomes considered
Analysis: adjustment for potential confounders, intention-to-treat analysis.
Definition of Ratings Based on Above Criteria. The rating of intervention studies encompasses the three quality categories described here:
Good: Meets all criteria: Comparable groups are assembled initially and maintained throughout the study (followup at least 80 percent); reliable and valid measurement instruments are used and applied equally to the groups; interventions are spelled out clearly; all important outcomes are considered; and appropriate attention is given to confounders in analysis. In addition, for RCTs, intention-to-treat analysis is used.
Fair: Studies were graded “fair” if any or all of the following problems occur, without the fatal flaws noted in the “poor” category below: In general, comparable groups are assembled initially but some question remains whether some (although not major) differences occurred with followup; measurement instruments are acceptable (although not the best) and generally applied equally; some but not all important outcomes are considered; and some but not all potential confounders are accounted for. Intention-to-treat analysis is done for RCTs.
Poor: Studies were graded “poor” if any of the following fatal flaws exists: Groups assembled initially are not close to being comparable or maintained throughout the study; unreliable or invalid measurement instruments are used or not applied at all equally among groups (including not masking outcome assessment); and key confounders are given little or no attention. For RCTs, intention-to-treat analysis is lacking.
Assessment of the quality of systematic reviews and meta-analyses was guided by a quality rating method reported by Oxman and Guyatt (1991; Overview Quality Assessment Questionnaire).* The Oxman and Guyatt tool yields a quality score based on the answers to ten questions about how a review was conducted, as follows:
Were the search methods used to find evidence on the primary question(s) stated?
Was the search for evidence reasonably comprehensive?
Were the criteria used for deciding which studies to include in the overview reported?
Was bias in the selection of studies avoided?
Were the criteria used for assessing the validity of the included studies reported?
Was the validity of all the studies referred to in the text assessed using appropriate criteria?
Were the methods used to combine the findings of the relevant studies (to reach a conclusion) reported?*
Were the findings of the relevant studies combined appropriately relative to the primary question of the overview?
Were the conclusions made by the author(s) supported by the data and/or analysis reported in the overview?
What was the overall scientific quality of the overview? Use the following scoring scale:
The following guidelines are used to apply the Oxman and Guyatt rating:
| Question 1: | Literal interpretation. |
| Question 2: | For a search to be considered comprehensive the methods used to perform the search should include searching for unpublished material as well as multiple medical databases (at least EMBASE and MEDLINE®). If only published material was searched for, the search should be marked “partially.” A look through bibliographies, conference proceedings, or trial registries is deemed adequate as a search for unpublished literature. The search must not be limited to the English language. |
| Question 3: | The criteria should specify the population, intervention, principal outcomes, and study design to be scored “yes”; if only 2 or 3 of these are noted, it should be scored “partially” here. |
| Question 4: | Must be “yes” on 2 and 3 and dual review to be “yes” here; if “no” on 2 or 3 must be “no” here; if “partially” or “can't tell” on 2 and 3 then must be the same here. |
| Question 5: | Must use some cited validity tool for “yes” here. |
| Question 6: | Scales used must be appropriately applied to study type for “yes” here. |
| Question 7: | An appropriate pooling method and test for heterogeneity must be described for “yes” here; score “partially” if a pooling method but no heterogeneity testing method is specified. |
| Question 8: | If no attempt has been made to combine findings, and no statement is made regarding the inappropriateness of combining findings, check “no.” If a summary (general) estimate is given anywhere in the abstract, the discussion, or the summary section of the paper, and it is not reported how that estimate was derived, mark “no,” even if there is a statement regarding the limitations of combining the findings of the studies reviewed. If in doubt, mark “can't tell.” To determine whether it is appropriate to use random or fixed effects model, the study should address the question of how much heterogeneity would be considered (addressing clinical and statistical aspects of heterogeneity). |
| Question 9: | If 8 is “no,” 9 must be “no.” If 8 is “can't tell,” 9 must be “can't tell.” For an overview to be scored as “yes” on Question 9, data (not just citations) must be reported that support the main conclusions regarding the primary question(s) that the overview addresses. |
| Question 10: | The overall scientific quality should be based on the answers to the first 9 questions. The following guidelines can be used to assist with deriving a summary score: if the “can't tell” option is used one or more times on the preceding questions, a review is likely to have minor flaws at best, and it is difficult to rule out major flaws (i.e., a score ≤4). If the “no” option is used on Questions 2, 4, 6, or 8, the review is likely to have major flaws (i.e., a score of ≤3, depending on the number and degree of the flaws). |
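The dependencies among answers in these guidelines (Question 4 on Questions 2 and 3, and Question 9 on Question 8) can be checked mechanically. The Python sketch below is one hypothetical way to do so; the function name, the answer strings, and the dictionary layout are assumptions, and combinations the guidelines do not address are simply not flagged.

```python
def check_oxman_guyatt(answers, dual_review):
    """Return a list of violations of the Question 4 and Question 9 rules.

    `answers` maps question numbers to "yes", "partially", "no", or
    "can't tell". `dual_review` is True if study selection used two reviewers.
    """
    problems = []
    q2, q3, q4 = answers[2], answers[3], answers[4]
    q8, q9 = answers[8], answers[9]

    if "no" in (q2, q3):
        if q4 != "no":
            problems.append("Q4 must be 'no' when Q2 or Q3 is 'no'")
    elif q4 == "yes" and not (q2 == q3 == "yes" and dual_review):
        problems.append("Q4 'yes' requires 'yes' on Q2 and Q3 plus dual review")
    elif q2 == q3 and q2 in ("partially", "can't tell") and q4 != q2:
        problems.append("Q4 must match Q2 and Q3 when both are " + repr(q2))

    if q8 in ("no", "can't tell") and q9 != q8:
        problems.append("Q9 must be " + repr(q8) + " when Q8 is " + repr(q8))

    return problems
```

An empty list means that the answers are consistent with the stated rules; it does not replace reviewer judgment on the overall score for Question 10.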
It should be noted that a new quality assessment tool for systematic reviews and meta-analyses was recently developed (Shea, Grimshaw, Wells, et al., 2007). It was based, in part, on the work of Oxman and Guyatt, but differs in significant ways. In particular, the Oxman and Guyatt tool does not adequately address whether quality concerns of the underlying literature were incorporated into conclusions. The tool by Shea, Grimshaw, Wells, et al. (2007) more clearly assesses whether conclusions took appropriate account of the quality of included studies and the potential for publication bias. The recently developed tool was unavailable during the time when ratings of meta-analyses were performed for this evidence report.
| Clearly Defined Question | Well-Described Study Population* | Well-Described Intervention | Use of Validated Outcome Measures | Appropriate Statistical Analysis | Well-Described Results | Discussion/Conclusions Supported by Data | Funding/Sponsorship Source Acknowledged |
|---|---|---|---|---|---|---|---|
| Question should be appropriate to study design; | Case definition (diagnostic criteria); type of criteria (clinical, radiographic); whether criteria used before (reference); explicit inclusion/exclusion criteria; | Sufficiently clear that another center could replicate study; if not identified in detail, should provide references; | Reference to previous validation; | Statistical tests and power calculations aimed at improvement over time; prepost analysis should take into account paired nature of data; | Utilize only validated outcome measures; | Conclusion should be supported by the data in the article | Funding source should be disclosed in addition to consulting or board relationship with manufacturer |
| should not be stated in terms of effectiveness; | includes standard information (age; sex; socioeconomic status; stage and duration of disease; comorbidities; n; time to accrual; exclusions and reasons; loss to followup; refusal) | cointerventions should be described in reasonable detail | ideally individual assessing patient's outcome should be masked to specific intervention; alternatively, assessor who is not in direct employ of clinical office; | comparisons with historical controls should take into account differences in cointerventions between time periods; | description of adequacy of followup (number lost to followup, number who switch to another provider or pursue other treatments, number who die from other causes); | where other information is used to buttress conclusions, should be explicitly stated and referenced; | |
| best when focused; | standardized length and intervals of observation and of sufficient duration to be clinically meaningful; justification for the duration of followup | attention to nonspecific effects and inability to distinguish procedure's effect from spontaneous improvement; | [adaptation: inclusion of both potentially beneficial outcomes (symptom/function/quality of life) and adverse events] | limitations should be made explicit; | |||
| avoids over-reliance on those variables showing improvement; | description of specific next research steps (e.g., need for RCT, details of RCT) [adaptation: this element disregarded] | ||||||
| analysis should address multiple comparisons |
OA criteria noted; minimum set of characteristics: age, sex, disease duration and preop severity described.
Clearly defined question
Well-described study population
Well-described intervention
Use of validated outcome measures
Appropriate statistical analyses
Well-described results
Discussion and conclusion supported by data
Funding source acknowledged.
Five study-level meta-analyses comparing intra-articular hyaluronans with placebo (e.g., arthrocentesis and saline injection) for osteoarthritis (OA) of the knee have been published. One patient-level meta-analysis of a single product was also identified.* The quality of the meta-analyses was appraised with a validated tool (Oxman and Guyatt, 1991; Oxman, Guyatt, Singer, et al., 1991)—the Overview Quality Assessment Questionnaire.
These meta-analyses included outcome measures from 41 relevant randomized, controlled trials (RCTs). One additional placebo-controlled trial (Rolf, Engstrom, Ohrvik, et al., 2005) identified by our literature search† was not included in any meta-analysis (42 trials, therefore, included in this review). RCTs pooled by the meta-analyses overlap considerably; their quantitative results and limitations also overlapped. Owing to the broad scope of the meta-analyses, they were judged to effectively capture existing evidence and formed the primary basis for evaluating hyaluronans' effectiveness. Important details relevant to the evidence, or inconsistently reported in the meta-analyses, were abstracted from the primary literature (e.g., sample size and power calculations, use of intention-to-treat or per protocol analyses, industry involvement, quality appraised according to our protocol).
Outline. Because this chapter reports results from different perspectives, its organizational structure is outlined to guide the reader:
Study populations included in RCTs comprising the meta-analyses described
Application of the Overview Quality Assessment Questionnaire to the five study-level meta-analyses
Relevant detailed results from the meta-analyses
Trials not pooled or included in the meta-analyses
Adverse events
Supplementary analyses performed by the Evidence-based Practice Center
Sensitivity analyses
Publication bias
Hylan G-F 20
Summary and appraisal.
| Classification and Grade | RCTs |
|---|---|
| Kellgren-Lawrence 0–4 | 1 |
| Kellgren-Lawrence 1–2 | 1 |
| Kellgren-Lawrence 1–3 | 1 |
| Kellgren-Lawrence 1–4 | 3 |
| Kellgren-Lawrence 2–3 | 5 |
| Kellgren-Lawrence 2–4 | 7 |
| Ahlback 0–3 | 1 |
| Ahlback 1–2 | 2 |
| Altman 1–3 | 1 |
| Larsen 1–4 | 1 |
| Larsen 2–4 | 1 |
| Unreported or Unavailable | 18 |
| Total | 42 |
Mean baseline pain measured by visual analog scale (VAS) with movement was reported in 19 RCTs, ranging from 44 to 79 mm in hyaluronan study arms and 42 to 80 mm among placebo study arms. The variability of the baseline pain measurements in trials spanned standard deviations from 5.5 to 31. When reported, mean disease duration varied from 1.2 to 22 years.
Patient samples included in RCTs were therefore heterogeneous with respect to age, sex, knee radiographic grade, and baseline pain, reflecting varied patient selection among RCTs.
Quality ratings according to our protocol for 37 evaluable RCTs were “good” for nine, “fair” for 16, and “poor” for 12 (five were not evaluable).
Sample sizes ranged from 12 to 408 with a mean of 141 and median 102.
Power calculations were reported in 19 RCTs. Mean sample size in these RCTs was 204 compared to 60 for the 16 RCTs without those calculations in published manuscripts.
Trial duration ranged from 4 to 52 weeks with a mean of 23 and median 20 weeks; 11 were fewer than 10 weeks in duration.
Intention-to-treat results were the primary analytical results reported in 17 RCTs (40 percent); 16 (38 percent) reported per protocol analyses; the analytical approach was either unclear or not reported in 9 (21 percent)—e.g., some unpublished studies.
Losses to follow-up or drop-outs ranged from 0 to 50 percent with nine RCTs reporting 20 percent or greater loss to follow-up.
Blinding was reportedly double in 35 RCTs.
Reported industry involvement included funding of 23 RCTs, providing statistical analyses for eight, and in eight, an industry member was a co-author.
| | Trial | Sample Size* | Result (+/-) |
|---|---|---|---|
| Abstract only | Russel et al., 1992 | 210 | – |
| | Moreland et al., 1993 | 94 | – |
| | Cohen et al., 1994 | 39 | ?‡ |
| | Guler et al., 1996 | 30 | + |
| | Tsai et al., 2003† | 200 | + |
| | Subtotal (% of Total) | 573 (9.8) | |
| Unpublished | France, 1995 | 254 | – |
| | U.K., 1996 | 231 | ? |
| | Hizmetli et al., 1999 | 50 | + |
| | OAK 9801 | 382 | ?§ |
| | Subtotal (% of Total) | 917 (15.7) | |
| Published | All Participants (% of Total) | 4,353 (74.5) | |
| Total | | 5,843 (100) | |
Sample sizes reported here are patients (not knees) randomized.
Bellamy, Campbell, Robinson, et al. (2006) refer to this trial as Lin 2004, “in-house publication.”
As reported in Lo, LaValley, McAlindon, et al. (2003) 95% CI included unity; Wang, Chen, Huang, et al. (2004) suggested benefit; abstract notes no statistically significant difference at any time points for pain, WOMAC, or global assessment.
Results presumably negative given language in package insert (see footnote). Not mentioned by Bellamy, Campbell, Robinson, et al. (2006) who obtained a number of results from manufacturers.
In summary, there is variability in trial characteristics including study quality, sample size and power calculations, duration, use of intention-to-treat analysis, losses to follow-up, funding, and industry involvement. The known extent of unpublished data includes a large number of individuals. Results from at least one trial (OAK 9801) appear unreported in any form.
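The subtotals and percentage shares in the table of abstract-only and unpublished trials can be re-derived from the listed sample sizes. The short Python sketch below does this; the dictionary layout, variable names, and the `share` helper are illustrative assumptions, and the published-trial count is taken from the table as an aggregate rather than summed from individual trials.

```python
# Patients randomized, as listed in the table of abstract-only and
# unpublished trials. Keys are shorthand labels chosen for this sketch.
abstract_only = {"Russel 1992": 210, "Moreland 1993": 94, "Cohen 1994": 39,
                 "Guler 1996": 30, "Tsai 2003": 200}
unpublished = {"France 1995": 254, "U.K. 1996": 231, "Hizmetli 1999": 50,
               "OAK 9801": 382}
published_total = 4353  # given only as an aggregate in the table


def share(part, total):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * part / total, 1)


abstract_total = sum(abstract_only.values())
unpublished_total = sum(unpublished.values())
grand_total = abstract_total + unpublished_total + published_total

for label, n in [("Abstract only", abstract_total),
                 ("Unpublished", unpublished_total),
                 ("Published", published_total)]:
    print(f"{label}: {n} ({share(n, grand_total)}% of {grand_total})")
```

Taken together, the abstract-only and unpublished trials account for roughly one quarter of all randomized patients in this tabulation.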
| | Lo et al., 2003 | Wang et al., 2004 | Arrich et al., 2005 | Modawal et al., 2005 | Bellamy et al., 2006 | Strand et al., 2006 |
|---|---|---|---|---|---|---|
| Pain | X | X | X | X | X | |
| Physical Function | X | X | X | |||
| Patient Global Assessment | X | |||||
| WOMAC (Composite) | X | |||||
| Lequesne Index (Composite) | X | X | ||||
Shaded boxes indicate included in a meta-analysis, bolded RCTs are unpublished, italicized RCTs are abstracts not subsequently published; † or ‡ represent abstract and subsequent publications; although listed twice to reflect what was included in meta-analysis, they are the same studies and therefore included only once in the total.
Included for adverse events, but not in any pooled efficacy result.
Identified in search, but data “could not be used” for any outcome other than adverse events.
Included in systematic review, but data not used in a pooled by-class result.
Quality Assessment of the Study-Level Meta-Analyses. Methodologic quality is an important consideration in synthesizing evidence pooled by the meta-analyses. As outlined in the Methods chapter, the Overview Quality Assessment Questionnaire (Oxman and Guyatt, 1991; Oxman, Guyatt, Singer, et al., 1991) was used to appraise meta-analysis quality.* Descriptions of the ratings provide insight into their basis and potential implications. Although summaries are presented, they should not be interpreted as reflecting the potential validity of conclusions from any meta-analysis. Rather, the quality ratings are but one element of the overall evidence evaluation and synthesis.
| Item | Lo et al., 2003 | Wang et al., 2004 | Arrich et al., 2005 | Modawal et al., 2005 | Bellamy et al., 2006 |
|---|---|---|---|---|---|
| 1. Were the search methods used to find evidence (original research) on the primary question(s) stated? | •Clearly stated | •Clearly stated | •Clearly stated | •Clearly stated | •Clearly stated |
| 2. Was the search for evidence reasonably comprehensive? | ○Did not include EMBASE, but did search Cochrane Registry | ○English language only; did search Cochrane Registry | •Searched 4 electronic databases; Cochrane Registry; limited to English and German | ○Restricted to English, did not include EMBASE, but did search Cochrane Registry | •Comprehensive, no language restrictions; included multiple databases; hand searching |
| 3. Were the criteria used for deciding which studies to include in the overview reported? | •Clearly stated | •Clearly stated | [symbol not recovered] | •Defining populations, intervention, principal outcomes, and trial design specified | •Defining population, intervention, principal outcomes, and trial design specified |
| 4. Was bias in the selection of studies avoided? | ○Due to lack of EMBASE search (i.e., no on Q2) | ○Language and lack of unpublished literature (no on Q2) | [symbol not recovered] | ○English language restriction | •Clearly stated |
| 5. Were the criteria used for assessing the validity of the included studies reported? | •Applied stated criteria although minimal | •Used 28-point validated check list | •Employed stated criteria: reporting treatment allocation; blinding; intention-to-treat analysis | •Chalmers | •Jadad |
| 6. Was the validity of all studies referred to in the text assessed using appropriate criteria (either in selecting studies for inclusion or in analyzing the studies that are cited)? | •Each trial rated | •Each trial rated | •Each trial rated | •Each trial rated | •Each trial rated |
| 7. Were the methods used to combine the findings of the relevant studies (used to reach a conclusion) reported? | •Random-effects models | •Random-effects models when heterogeneity present | •Random-effects models | •Random-effects models | •When combined used fixed- and random-effects models |
| 8. Were the findings of the relevant studies combined appropriately relative to the primary question the overview addresses? | •Random effects models accounting for heterogeneity | •Random effects models accounting for heterogeneity | •Random effects models accounting for heterogeneity | •Random effects models accounting for heterogeneity | •Random effects models accounting for heterogeneity |
| 9. Were the conclusions made by the author(s) supported by the data and/or analysis reported in the overview? | [symbol not recovered] | [symbol not recovered] | •Generally cogent synthesis of results; well conducted meta-analysis | ○Due to no on Q2; incorrect Egger test interpretation | [symbol not recovered] |
| 10. How would you rate the scientific quality of the overview? (“Flaws”: 1 extensive, 3 major, 5 minor, 7 minimal) | 3 (due to Q2) | 3 (due to Q2) | 5 (due to Q3 and Q4) | 3 (due to Q2, Q9) | 6 (due to Q9) |
Rating: • Yes; ○ No. Cells marked “[symbol not recovered]” held a third rating symbol that appeared only as an image in the source.
In summary, based on the methodologic appraisal and quality, these meta-analyses form a substantive body of evidence and basis from which to evaluate the efficacy of hyaluronans for OA of the knee.
| | Lo et al., 2003 | Wang et al., 2004 | Arrich et al., 2005 | Modawal et al., 2005 | Bellamy et al., 2006 |
|---|---|---|---|---|---|
| General inclusion criteria | Single- or double-blind IA placebo-controlled RCTs, at least 3 injections, <50% dropout, ≥2 months f/u | Single or double blind placebo controlled RCTs | Single or double blind placebo controlled RCTs | Double blind placebo controlled RCTs | Single or double blind placebo (also other comparator controlled RCTs not considered here) |
| Pain and function outcome(s) compared to placebo | Pain: Global knee or walking or WOMAC pain or Lequesne or during non-walking activities | Pain with and without activities; Joint function | Pain at rest; Pain during or after exercise; Joint function | Knee pain (VAS) during activity or rest | VAS pain rest, weight bearing; WOMAC pain, function; Patient global assessment; Lequesne Index‡ |
| Pain effect measure | SMD Pain Change | Sum of Pain Intensity Differences | WMD Pain Difference at Follow-up | WMD Pain Change | WMD Pain Difference at Follow-up |
| Other pooled effect measures | | Sum of Functional Intensity Differences | SMD Joint Function | | Difference at follow-up in WMD, SMD; RR; Multiple outcomes |
| Time | “8 to 12 weeks” | All time points/area under the curve | 2–6, 10–14, 22–30 weeks | 1, 5–7, 8–12, 15–22 weeks | 1–4, 5–13, 14–26, 45–52 weeks |
| Model selection | random effects | random & fixed effects | random effects | random effects | random & fixed effects |
| Trial quality assessment | Intention-to-treat analysis/dropout rate | 28-point checklist (Downs and Black 1998) | Allocation concealment; intention-to-treat analysis; Blinding | Chalmers | Jadad |
| Comment on trial quality | 7/22 intention-to-treat data available; Mean dropout 12.4% (0–40.3) | Mean score 17 (9–25) (maximum possible 28) | Trial quality considered “unsatisfactory” | Mean .70/1 (.44–.80) | Mean 3.8/5 (2–5) |
| Heterogeneity | | | | | |
| Test used | Cochran's Q | Cochran's Q (only non-cross linked) | Cochran's Q; I² | Cochran's Q; Galbraith Plot | I² |
| Result(s) | p<.001 | Multiple values reported, all significant except for ASFID% | Pain at rest I² 94%; Pain after or during exercise I² 81%; Joint function I² 66% | Heterogeneity evident in plot; Q (p<.001) at time points examined | I² varied according to outcome; for pain and function generally 70–80% |
| Meta-regression | | | | | |
| Factors explored | None | Only for non-cross-linked: quality, publication year, molecular weight, mean age, trial duration, sample size | Allocation concealment; Blinded outcome assessment; intention-to-treat analysis | Pain type, medication (HA vs. hylan G-F 20), trial quality, week | None |
| Sensitivity analysis | Yes | Yes | Yes | Yes | No |
| Funnel plot/bias | Funnel Plot (asymmetric); Egger Test (p=.07) | Funnel Plots (symmetric) | Regression methods; Egger Test; “could not detect” | Egger Test (p=.096) | Not Performed |
| Included studies | 22 RCTs | 20 RCTs | 22 RCTs | 9 RCTs | 32/76 RCTs§ |
| Industry sponsored | 77% | 65% | not reported | 73% | 30%§ |
I²: A measure of overall variability ranging from 0% to 100%
Bellamy examined other outcomes not a part of this report's protocol
Based on notes reported for RCTs
ASFID: adjusted sum of function index differences; f/u: followup; HA: hyaluronic acid; IA: intra-articular; RR: relative risk; SMD: standardized mean difference (standardized effect size); VAS: visual analog scale; WMD: weighted mean difference; WOMAC: Western Ontario and McMaster Osteoarthritis Index
The treatment of time relative to the potential longitudinal nature of effects also differed among the study-level meta-analyses. Lo, LaValley, McAlindon, et al. (2003) examined effect at the time of likely maximum benefit (2 to 3 months post-injection) (Kirwan, 2001); Wang, Chen, Huang, et al. (2004) possible benefit over entire studies (discussed in detail later); Arrich, Piribauer, Mad, et al. (2005), Modawal, Ferrer, Choi, et al. (2005), and Bellamy, Campbell, Robinson, et al., (2006) pooled effects for various periods following administration. Pooling of functional differences, when reported, differed similarly.
Model selection was dictated by the degree of heterogeneity—random-effects models were generally used. Meta-regressions were performed in three meta-analyses (Wang, Chen, Huang, et al., 2004; Arrich, Piribauer, Mad, et al., 2005; Modawal, Ferrer, Choi, et al., 2005) exploring a variety of factors with study quality examined in each. Two of the five study-level meta-analyses reported funnel plot asymmetry (Lo, LaValley, McAlindon, et al., 2003; Modawal, Ferrer, Choi, et al., 2005), two did not (Wang, Chen, Huang, et al.; 2004; Arrich, Piribauer, Mad, et al., 2005), and Bellamy, Campbell, Robinson, et al. (2006) did not report those results (funnel plot asymmetry is later examined in supplementary analyses).
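To make the distinction between fixed- and random-effects pooling concrete, the following Python sketch pools study effects by inverse-variance weighting and then with a between-study variance added. The DerSimonian-Laird estimator is used here as an assumption, since the text does not say which random-effects estimator each meta-analysis applied; the function names and the two-study input are hypothetical.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed-effect estimate and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect estimate."""
    pooled, _ = pool_fixed(effects, variances)
    return sum((y - pooled) ** 2 / v for y, v in zip(effects, variances))


def pool_random_dl(effects, variances):
    """Random-effects estimate (DerSimonian-Laird); needs at least 2 studies."""
    weights = [1.0 / v for v in variances]
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at zero
    pooled, var = pool_fixed(effects, [v + tau2 for v in variances])
    return pooled, var, tau2


def ci95(estimate, variance):
    """Normal-approximation 95 percent confidence interval."""
    half = 1.96 * math.sqrt(variance)
    return estimate - half, estimate + half


# Hypothetical inputs: two trials with discordant effects, equal precision.
effects = [-0.5, -0.1]
variances = [0.01, 0.01]
fixed_est, fixed_var = pool_fixed(effects, variances)
random_est, random_var, tau2 = pool_random_dl(effects, variances)
print(fixed_est, ci95(fixed_est, fixed_var))
print(random_est, ci95(random_est, random_var), tau2)
```

In this illustration both models give the same point estimate, but the random-effects interval is much wider because the disagreement between trials is carried into the variance.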
Summary. The approaches and characteristics of the five study-level meta-analyses provide different perspectives of the evidence. Supplementing results by relevant elements of included RCTs, the meta-analyses permit broad synthesis of the evidence.
Individual Meta-Analyses. Lo, LaValley, McAlindon, et al., 2003. Only pain outcome measures were pooled in this meta-analysis. MEDLINE® and Cochrane Controlled Trials Registry were searched from 1966 through February 2003, supplemented by hand searches of trial bibliographies and abstracts from relevant scientific meetings. Randomized single- or double-blinded, placebo-controlled trials published in English and non-English languages were eligible for inclusion. RCTs were included if at least 3 intra-articular hyaluronan injections were administered, an intra-articular placebo was used, drop-out rate was less than 50 percent, and pain was reported using at least one of the following instruments (in order of decreasing precedence):
Global knee pain score (VAS or Likert scale)
Knee pain on walking (VAS or Likert scale)
WOMAC Index
Lequesne Index
Knee pain during activities other than walking (VAS or Likert scale).
Of the 57 RCTs identified, results from 22 were pooled. Because different outcome measures were combined, standardized mean differences in change* were pooled: the mean difference in pain change from baseline between treated and placebo groups divided by the pooled standard deviation. If pain was reported between 2 and 3 months following initial treatment, that measure was included. Otherwise, pain measures were obtained from assessments occurring at 1 to 2 or 3 to 4 months.
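As an illustration of the standardized mean difference in change, the Python sketch below divides the between-group difference in mean change by a pooled standard deviation. The pooled standard deviation formula (weighted by degrees of freedom) is the conventional one and is an assumption here, because the text does not spell it out; the function names and trial values are hypothetical.

```python
import math


def pooled_sd(sd_treat, n_treat, sd_placebo, n_placebo):
    """Pooled standard deviation of two groups, weighted by degrees of freedom."""
    numerator = (n_treat - 1) * sd_treat ** 2 + (n_placebo - 1) * sd_placebo ** 2
    return math.sqrt(numerator / (n_treat + n_placebo - 2))


def smd_change(change_treat, change_placebo, sd_treat, n_treat, sd_placebo, n_placebo):
    """Difference in mean change from baseline, in pooled-SD units."""
    return (change_treat - change_placebo) / pooled_sd(
        sd_treat, n_treat, sd_placebo, n_placebo)


# Hypothetical trial: VAS pain falls 20 mm with treatment and 10 mm with
# placebo; both arms have SD 20 mm and 50 patients.
example_smd = smd_change(-20.0, -10.0, 20.0, 50, 20.0, 50)
print(example_smd)
```

Because the result is expressed in standard deviation units, trials reporting pain on different instruments can be combined on a common scale; negative values favor treatment when lower scores mean less pain.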
Trial quality was characterized by reporting of an intention-to-treat analysis and drop-out rates. An intention-to-treat analysis was defined as “(1) it was characterized by its investigators as such and there was an attempt to analyze data from all randomized participants, or (2) there was no dropout (even if the analysis was not specifically described as intent-to-treat).” When intention-to-treat data were not published, the authors attempted to obtain them.
| Time | Week “8–12” |
|---|---|
| Standardized Mean Difference (Change) | -0.32 |
| 95% CI | -0.47 to -0.17 |
| Heterogeneity (Cochran Q) | p<.001 |
| Trials Included | 22 |
CI: confidence interval
When the three RCTs of hylan G-F 20 were excluded, the pooled standardized mean difference diminished to -0.19 (95 percent confidence interval (CI): -0.27 to -0.10) with no evidence of heterogeneity (Cochran Q p=.58). The authors judged two of these three RCTs outliers (Scale, Wobig, and Wolpert, 1994; Wobig, Dickhut, Maier, et al., 1998). With the possible exception of hylan G-F 20, there was no indication of an association between product molecular weight and effect magnitude.
The pooled effect estimate from unpublished RCTs (-0.07; 95 percent CI: -0.28 to 0.15) and the significant Egger test (p=.07) were interpreted as supporting publication bias. Nine of the RCTs were judged to have attempted an intention-to-treat analysis, and three other analyses were viewed as intention-to-treat owing to complete follow-up. Dropout rates in the pooled studies ranged from 0 to 40.3 percent.
Wang, Chen, Huang, et al., 2004. Pain (with or without activities) and functional outcome measures reported by VAS, WOMAC scores, Lequesne Index, or MODEMS (Musculoskeletal Outcomes Data Evaluation and Management Scale), and adverse events were pooled. MEDLINE®, EMBASE, and the Cochrane Controlled Trials Registry were searched from 1966 to December 2001 for randomized single- or double-blinded, placebo-controlled trials. Relevant publications were hand searched and bibliographies reviewed. Unpublished literature was not searched. Only English-language RCTs were considered. Reported outcome measures for pain or function were required. From 665 identified articles, results from 20 were pooled. Trial quality was appraised using a 28-point checklist developed by Downs and Black (1998).
A single outcome estimated over each trial's duration was pooled. The measure was intended to assess efficacy with respect to pain and functional outcomes—“efficacy scores.” The scores were obtained for pain and functional scales by:
Calculating the average difference between each consecutive time point
Multiplying the average difference by the time between those time points
Repeating the calculation for all consecutive time points and summing results.
The method estimates the area under the “pain intensity difference-versus-time curve.” Finally, the estimate is divided by the maximum scale of pain intensity multiplied by the trial duration and expressed as percentage—the SPID% or SFID% (sum of pain or functional intensity differences as a percentage). Two related estimates were also calculated and pooled as:
Averages: ASPID% and ASFID% (sum of pain or functional intensity differences divided by the baseline intensity multiplied by trial duration)
Peak differences: Peak PID% and Peak FID% (maximum pain or functional intensity differences divided by the maximum of the scale).
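The efficacy scores can be sketched in Python by reading the listed steps as a trapezoidal-rule area under the pain intensity difference-versus-time curve (the mean of each pair of consecutive differences times the interval between them). That reading, the function names, and the example values are assumptions for illustration, not a reproduction of the meta-analysts' code; differences are taken as positive when they favor treatment.

```python
def trapezoid_area(times, diffs):
    """Area under the difference-versus-time curve by the trapezoidal rule."""
    return sum((diffs[i] + diffs[i + 1]) / 2 * (times[i + 1] - times[i])
               for i in range(len(times) - 1))


def spid_percent(times, diffs, max_scale):
    """Area as a percentage of (maximum of the scale x trial duration)."""
    duration = times[-1] - times[0]
    return 100 * trapezoid_area(times, diffs) / (max_scale * duration)


def aspid_percent(times, diffs, baseline):
    """Area as a percentage of (baseline intensity x trial duration)."""
    duration = times[-1] - times[0]
    return 100 * trapezoid_area(times, diffs) / (baseline * duration)


def peak_pid_percent(diffs, max_scale):
    """Largest single difference as a percentage of the maximum of the scale."""
    return 100 * max(diffs) / max_scale


# Hypothetical trial: assessments at weeks 0, 4, 8, and 12 on a 100 mm VAS,
# with a 10 mm between-group difference from week 4 onward and a baseline
# pain of 50 mm.
weeks = [0, 4, 8, 12]
pain_diffs = [0, 10, 10, 10]
print(spid_percent(weeks, pain_diffs, 100))
print(aspid_percent(weeks, pain_diffs, 50))
print(peak_pid_percent(pain_diffs, 100))
```

The same functions apply to functional scales by passing functional intensity differences instead of pain differences.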
| Pooled Measure* | Pain with Activities: SPID% | Pain with Activities: ASPID% | Pain with Activities: Peak PID% | Function (Non-Cross-Linked): SFID% | Function (Non-Cross-Linked): ASFID% | Function (Non-Cross-Linked): Peak FID% |
|---|---|---|---|---|---|---|
| Estimate | 7.9% | 13.4% | 9.9% | 5.3% | 11.7% | 8.2% |
| 95% CI | 4.1 to 11.7 | 5.5 to 21.3 | 4.8 to 15.0 | 2.1 to 8.5 | 6.3 to 16.2 | 3.8 to 12.6 |
| Heterogeneity† | 84% (I2) | 83% (I2) | 91% (I2) | p=.33 (Q) | p=.23 (Q) | p<.001 (Q) |
| Trials included | 17 | 15 | 16 | NR | NR | NR |
See text for definitions of Pooled Measures
Q reported only for functional measures. I² calculated from data presented when possible.
(A)SFID: (adjusted) sum of function index differences; (A)SPID: (adjusted) sum of pain index differences; CI: confidence interval; FID: function index differences; NR: not reported; PID: pain index differences
Pooled estimates were higher for the 3 RCTs of hylan G-F 20 (Dickson and Hosie, 1998 [later published as Dickson, Hosie, and English, 2001]; Scale, Wobig, and Wolpert, 1994; Wobig, Dickhut, Maier, et al., 1998): SPID%, 23.6 percent; ASPID%, 34.8 percent; peak PID%, 27.1 percent; SFID% 21.9 percent; ASFID%, 38.3 percent; PEAK FID%, 26.8 percent (no confidence intervals accompanied estimates).
| Subgroup | Result | ||
|---|---|---|---|
| Blinding | Single | > | Double* |
| Centers | Single Center* | > | Multicenter |
| Intention-to-treat analyses | ITT Analyses* | ? | Per Protocol |
| Age | Mean Age ≤65* | > | Mean Age >65* |
| Disease stage | Less Advanced | > | Advanced |
| Effusion as inclusion criteria | Effusion | ? | No Effusion |
| Sample size | ≤100* | > | >100 |
| Escape analgesics allowed | Not Allowed | > | Allowed |
| Funding | Non Industry* | > | Industry |
Indicates significant Cochran Q for at least 2 of the 3 outcome measures—i.e., heterogeneity in pooled result
Symbols: > indicates effect larger in subgroup; ? indicates inconsistent results for the 3 outcome measures
Significant associations with trial results were found in meta-regressions for: (1) mean patient age for ASPID% without activities only; (2) publication year for SPID% functioning; and (3) trial quality, mean patient age, and sample size for ASFID% functioning. No association between molecular weight and outcome measures was found. Of the 54 regression coefficients tested, five were statistically significant.
Funnel plots using sample size for the ordinate (vertical axis) were not consistent with publication bias. The authors commented indirectly on the overall methodologic quality of the primary literature stating that allocation concealment was unclear in all RCTs and more high quality trials are needed. The mean quality score on the rating system used was 19 points (maximum 28) (Downs and Black, 1998, Pendleton, Arden, Dougados, et al., 2000).
Major adverse events were documented in three of 1002 knees treated with non G-F 20 hyaluronans (severe swelling, vasculitis, and a hypersensitivity reaction); one patient from 139 knees treated with hylan G-F 20 experienced an acute painful local reaction. The pooled relative risk of minor adverse events for all hyaluronan products was 1.2 (95 percent CI: 1.01 to 1.41).
Arrich, Piribauer, Mad, et al., 2005. Outcomes examined in this meta-analysis included pain at rest and during or after activities (VAS), joint function (WOMAC, Lequesne Index, subjective VAS rating, time for 40-meter walk), and adverse events. MEDLINE®, EMBASE, CINAHL, BIOSIS, and the Cochrane Controlled Trials Registry were searched from inception through April 2004 for randomized single- or double-blinded, placebo-controlled trials published with English or German abstracts. Either pain at rest, during or after movement, joint function, or adverse event reporting was required. From 1,159 articles identified, 22 were included: data from 17 trials reporting pain and/or joint function outcome measures were pooled, and outcomes from the 5 additional trials were included for adverse events.
Outcome measures were pooled separately for four time periods: weeks 2 to 6, 10 to 14, 22 to 30, and 44 to 60. VAS pain was pooled as a weighted mean difference for each period. Because different functional outcome measurement scales were reported, standardized effect sizes were pooled for function. Comparative adverse event risk was pooled as a relative risk. Trial quality was characterized by adequacy of allocation concealment, use of intention-to-treat analyses, and blinding.
| | Rest | During/After Exercise | During/After Exercise | During/After Exercise |
|---|---|---|---|---|
| Weeks | 2–6 | 2–6 | 10–14 | 22–30 |
| Weighted mean difference VAS (100mm) | -8.7 mm | -3.8 mm | -4.3 mm | -7.3 mm |
| 95% CI | -17.2 to -0.2 | -9.1 to 1.4 | -7.6 to -0.9 | -11.8 to -2.4 |
| Heterogeneity (I2) | 94% | 81% | 0% | 0% |
| Trials included | 9 | 9 | 5 | 4 |
When rest pain measures were pooled from trials not using intention-to-treat analyses or in which allocation concealment was absent or unclear, the weighted mean difference was 15.6 mm lower (i.e., greater effect magnitude favoring hyaluronans); in unblinded trials the weighted mean difference was 13.6 mm lower (favoring hyaluronans). The large value of I² for activity pain at 2 to 6 weeks was attributed to Henderson, Smith, Pegley, et al. (1994), in which pain increased among those with more advanced disease receiving hyaluronans. Excluding the trial diminished I² to 20 percent while yielding a similar pooled weighted mean difference (-4.2 mm, 95 percent CI: -7.5 to -0.8). The authors noted that trial quality did not influence the pooled estimates for pain during or after exercise, but only a single trial was judged high quality.
| Joint Function | |||
|---|---|---|---|
| Weeks | 2–6 | 10–14 | 22–30 |
| Standardized mean difference | 0.0 | -0.11 | -0.16 |
| 95% CI | -0.23 to 0.23 | -0.31 to 0.09 | -0.16 to 0.13 |
| Heterogeneity (I2) | 66% | 59% | 62% |
| Trials included | 9 | 7 | 5 |
Sensitivity analyses were performed for all pooled outcomes at weeks 2 to 6 and 10 to 14, including only RCTs reporting adequate allocation concealment, blinded outcome assessment, and intention-to-treat analyses. According to the report, “[N]o significant effect in favour of the intervention” was found. There was no association between molecular weight and effect size in meta-regressions. Adverse events, typically minor, were more common with hyaluronans than with placebo (pooled relative risk 1.08; 95 percent CI: 1.01 to 1.15). No evidence of publication bias was reported using regression methods, except possibly for the studies reporting adverse events (publication of trials reporting adverse events was more frequent).
Modawal, Ferrer, Choi, et al., 2005. The meta-analysis pooled only pain outcome measures reported on a VAS scale. MEDLINE® and the Cochrane Controlled Trials Registry were searched from 1965 to August 2004 for randomized, double-blind, placebo-controlled, English-language RCTs. Reference lists of included articles and reviews were also searched. Of 1,872 articles identified, 9 were included. Studies reporting pain as part of the WOMAC were excluded. Pain measures during activity or at rest were extracted and pooled (although which studies and at what time periods contributed activity or rest pain measures was not specified).
The mean difference between treatment and placebo in change from baseline pain was pooled for four time periods: weeks 1, 5 to 7, 8 to 12, and 15 to 22. Adverse event rates were not summarized. Trial quality was assessed using the method of Chalmers, Smith, Blackburn, et al. (1981) (maximum score of 1.0)—those scoring 0.75 or lower were considered low quality.
| Pain with activity or rest | ||||
|---|---|---|---|---|
| Weeks | 1 | 5–7 | 8–12 | 15–22 |
| Weighted mean difference VAS change (100mm) | -4.4 mm | -17.6 mm | -18.1 mm | -4.4 mm |
| 95% CI | -7.2 to -1.1 | -28.0 to -7.5 | -29.9 to -6.3 | -24.1 to 15.3 |
| Heterogeneity (I2*) | 92% | 92% | 95% | 94% |
| Trials Included | 9 | 6 | 6 | 3 |
I² calculated from Q and accompanying df (degrees of freedom).
| Pain with activity or rest | ||||
|---|---|---|---|---|
| Weeks | 1 | 5–7 | 8–12 | 15–22 |
| Weighted Mean Difference VAS Change (100mm) | 1.0 mm | -7.2 mm | -7.1 mm | -4.4 mm |
| 95% CI | -1.2 to 3.2 | -12.0 to -2.4 | -11.3 to -3.0 | -24.1 to 15.3 |
| Heterogeneity (I2*) | 83% | 0 | 9% | 94% |
| Trials Included | 7 | 2 | 6 | 3 |
I² calculated from Q and accompanying df (degrees of freedom).
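The heterogeneity percentages in these tables were derived from Cochran's Q and its degrees of freedom. A minimal Python sketch of that conversion follows, assuming the standard Higgins and Thompson definition, in which the statistic is 100 × (Q − df) / Q and negative values are set to zero; the function name and the example inputs are hypothetical and are not values taken from the meta-analyses.

```python
def i_squared(q, df):
    """Percentage of total variation attributed to heterogeneity, from Q and df."""
    if q <= 0:
        return 0.0
    # Floored at zero: Q at or below its degrees of freedom signals
    # no heterogeneity beyond chance.
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical inputs: 9 pooled trials give df = 8.
print(i_squared(100.0, 8))  # strong heterogeneity
print(i_squared(2.0, 5))    # Q below df, reported as 0
```

Because the statistic depends only on Q and df, it can be recovered from published meta-analyses that report Q without needing trial-level data.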
In meta-regressions, trial quality and hylan G-F 20 were associated with significantly better outcomes at 5 to 7 and 8 to 12 weeks; poor trial quality was associated with better outcomes at other time periods, although the association was statistically significant only at week 1. Potential publication bias was assessed using the Egger test (p=.096) (time period not specified), which the authors stated was “not statistically significant...suggesting that there is no publication bias.”
Bellamy, Campbell, Robinson, et al., 2006.* Outcomes examined relevant to our protocol included pain at rest and with activity, WOMAC function, Lequesne Index, patient global assessment, and adverse events. The literature search included MEDLINE® (to the first week of January 2006); EMBASE, PREMEDLINE, and Current Contents to July 2003; the Cochrane Central Register of Controlled Trials; specialized journals and reference lists of identified randomized controlled trials; and pertinent review articles to December 2005. Single- or double-blinded randomized controlled trials with placebo or other comparators were eligible; no language restrictions were imposed. From 76 trials identified, 32 in the meta-analysis were placebo-controlled comparisons. Outcome measures from 30 RCTs were pooled in some manner. Trial quality was assessed using the Jadad scale (Jadad, 1996).
Outcome measures were pooled separately for four time periods: weeks 1 to 4, 5 to 13, 14 to 26, and 45 to 52. Unadjusted post-test scores were pooled (Bellamy, Campbell, Robinson, et al., 2006; page 5)—the difference between treatment and placebo at follow-up. VAS pain and Lequesne Index scores were pooled as weighted mean differences; WOMAC pain and function as standardized mean differences; patient global assessment and adverse events as relative risks.
Both by-product and by-class results were reported. While Bellamy, Campbell, Robinson, et al. (2006) emphasize the by-product results, we focus on by-class results for both clinical and methodologic reasons. Rationale for by-product results is based on the premise that “...these products differ in their MW [molecular weight], concentration, treatment schedules, and mode of production...” However, with the exception of hylan G-F 20, none of the preceding meta-analyses found outcomes differing by molecular weight. Thus, there is potential for spurious subgroup findings with multiple individual product analyses. Of the more than 850 forest plots presented, only 38 combine results from more than 3 trials. Accordingly, we focus on by-class results.
| | Rest | Weight-Bearing | | | |
|---|---|---|---|---|---|
| Weeks | 1–4 | 1–4 | 5–13 | 14–26 | 45–52 |
| Weighted mean difference VAS (100mm) | -3.5 mm | -7.7 mm | -13.0 mm | -9.0 mm | -2.6 mm |
| 95% CI | -9.2 to 2.1 | -11.3 to -4.1 | -17.8 to -8.2 | -14.8 to -3.2 | -7.4 to 2.2 |
| Heterogeneity (I2) | 80% | 80% | 82% | 77% | 0% |
| Trials included | 9 | 20 | 16 | 8 | 3 |
The magnitude of the pooled effect estimate was greatest at 5 to 13 weeks and lower thereafter, the critical caveat being that trials and outcome measures from different patients were pooled at different periods. The degree of heterogeneity among trials was large at all periods except weeks 45 to 52, where only 3 trials were included.
| WOMAC Pain | |||
|---|---|---|---|
| Weeks | 1–4 | 5–13 | 14–26 |
| Standardized mean difference | -1.2 | -1.0 | -1.0 |
| 95% CI | -1.9 to -0.5 | -1.6 to -0.5 | -1.8 to -0.3 |
| Heterogeneity (I2) | 88% | 88% | 80% |
| Trials included | 6 | 6 | 3 |
Pooled standardized mean differences were -1.0 or lower (more negative) during each period, and magnitudes appeared similar over time. Heterogeneity among trials was large (I2 values 80 to 88 percent).
| WOMAC Physical Function | |||
|---|---|---|---|
| Weeks | 1–4 | 5–13 | 14–26 |
| Standardized mean difference | -1.0 | -0.9 | -0.8 |
| 95% CI | -1.6 to -0.4 | -1.3 to -0.4 | -1.4 to -0.2 |
| Heterogeneity (I2) | 85% | 84% | 70% |
| Trials included | 6 | 6 | 3 |
| Lequesne Index | ||||
|---|---|---|---|---|
| Weeks | 1–4 | 5–13 | 14–26 | 45–52 |
| Weighted Mean Difference | -0.8 | -1.4 | -0.1 | -1.1 |
| 95% CI | -1.5 to -0.2 | -2.0 to -0.7 | -0.8 to 0.9 | -2.7 to 0.5 |
| Heterogeneity (I2) | 44% | 16% | 6% | NA |
| Trials Included | 5 | 4 | 3 | 1 |
There was less heterogeneity than for the WOMAC results. However, estimates at 1 to 4 and 5 to 13 weeks counted results from 40 patients twice, and those patients came from the trial finding the largest benefit (Carrabba, Paresce, Angelini et al., 1995).
| Patient Global Assessment | ||||
|---|---|---|---|---|
| Weeks | 1–4 | 5–13 | 14–26 | 45–52 |
| Relative risk of improvement | 1.1 | 1.1 | 1.0 | 1.0 |
| 95% CI | 0.9 to 1.4 | 0.9 to 1.4 | 0.7 to 1.5 | 0.8 to 1.2 |
| Heterogeneity (I2) | 58% | 60% | 70% | 30% |
| Trials included | 5 | 6 | 4 | 2 |
Although lower than in previous results, heterogeneity was still generally high. There was no evidence that patient-reported global improvement differed with treatment during any time period; all relative risks were indistinguishable from unity.
| Trial | Weeks | NNT |
|---|---|---|
| Number of Patients Improved | | |
| Lohmander et al., 1996 | 1–4 | 100 |
| | 5–13 | Infinity |
| | 14–26 | 7.1 |
| Shichikawa et al., 1983a (5-week trial) | 1–4 | 5 |
| Shichikawa et al., 1983b (5-week trial) | 1–4 | 11 |
| Puhl et al., 1993 | 5–13 | 10 |
| Brandt et al., 2001 | 14–26 | 20 |
| Number of Patient Clinical Failures | | |
| Karlsson et al., 2002 | 14–26 | 11 |
| | 45–52 | 6.7 |
| WOMAC Pain 40% Relative; 5-point Absolute (20-point scale) | | |
| Altman et al., 2004 | 1–4 | 14 |
| | 5–13 | -33* |
| | 14–26 | -33* |
| WOMAC Pain >5-point Improvement (20-point scale) | | |
| Brandt et al., 2001 >5-Point | 14–26 | 5.9 |
| Patient Global Assessment (Number Improved) | | |
| Corrado et al., 1995 | 1–4 | -2.3 |
| Creamer et al., 1994 | 1–4 | 11.1 |
| Sala et al., 1995 | 1–4 | -6.7 |
| Corrado et al., 1995 | 5–13 | -10 |
| Sala et al., 1995 | 5–13 | -2.9 |
| Henderson et al., 1994 | 14–26 | 25 |
| Huskisson et al., 1999 | 14–26 | -3.1 |
* Sign incorrectly reported in Bellamy, Campbell, Robinson, et al. (2006, page 194; 2007, page 194)
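For readers interpreting the NNT column, the number needed to treat is the reciprocal of the absolute difference in response proportions between arms. The sketch below uses that standard definition; the function name and the response rates are hypothetical and are not taken from the trials listed. It shows why an NNT of "Infinity" corresponds to identical response rates and why a negative value indicates that fewer treated than placebo patients improved.

```python
import math


def nnt(rate_treated: float, rate_control: float) -> float:
    """Number needed to treat from two response proportions (each between 0 and 1)."""
    diff = rate_treated - rate_control
    if diff == 0:
        return math.inf  # no difference between arms
    return 1.0 / diff


# Hypothetical response proportions for illustration only.
print(nnt(0.30, 0.30))            # inf
print(round(nnt(0.35, 0.25), 1))  # 10.0
print(round(nnt(0.20, 0.30), 1))  # -10.0
```

When the outcome counted is a failure rather than an improvement, the sign convention reverses, so the direction of each outcome should be checked before comparing rows.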
The systematic review did not directly examine any potential relationship between product molecular weight and efficacy. However, results from studies of hylan G-F 20 were separately analyzed. At 5 to 13 weeks, the pooled weighted mean difference in VAS measured pain from four trials was -22.5 mm (95 percent CI: -35.2 to -9.7; I2 = 82.9%). One trial included in the estimate was not strictly a placebo comparison (Wobig, Dickhut, Maier, et al., 1998).
Potential publication bias was not analyzed although discussed: “In an attempt to address potential publication bias, we have searched abstract books, as well as published manuscripts, corresponded with manufacturers, and contacted investigators in the search for additional information or unpublished studies” (Bellamy, Campbell, Robinson, et al., 2006; page 46). Sensitivity analyses or meta-regressions exploring heterogeneity of pooled estimates were not reported. Mean trial quality on the Jadad scale was 3.7 (range 2 to 5).
The pooled relative risk of local reactions was 1.9 for hylan G-F 20 (95 percent CI: 0.51 to 7.3; 5 trials) and 1.6 for other hyaluronans (95 percent CI: 0.54 to 5.6; 5 trials). Adverse events were otherwise reported primarily as relative risks from individual trials.
Strand, Conaghan, Lohmander, et al., 2006. Strand, Conaghan, Lohmander, et al. (2006) conducted a patient-level meta-analysis for a single outcome—the Lequesne Index. Patient data (N=1,155) were obtained from five double-blind placebo-controlled randomized controlled trials included in a premarketing approval application for Supartz® (18 trials were included in the application). The five trials were conducted in Germany, Sweden, U.K., France, and Australia. Three have been published (Day, Brooks, Conaghan, et al., 2004; Puhl, Bernau, Greiling, et al., 1993; Lohmander, Dalen, Englund, et al., 1996).
Participants received three to five weekly intra-articular hyaluronan or placebo injections and were followed at least 3 months. They were assessed at weeks 5 and 13 in all trials, week 9 in four, and weeks 17, 20, and/or 25 in three trials. Four trials included individuals aged 40 years and older; the other aged 50 years and older (Lohmander, Dalen, Englund, et al., 1996). Lequesne Index score was the primary outcome in three RCTs. Intention-to-treat analyses were used and missing data imputed by carrying the last observation forward. Both fixed- and random-effects models were examined. Trial quality was assessed by Jadad scale.
Analyses included 1,155 participants (619 treated, 536 placebo). Dropout rates were 10.2 and 14.6 percent in treated and placebo arms, respectively. The highest dropout rates occurred in the unpublished U.K. trial: 28.3 and 40.9 percent in the hyaluronan and placebo groups, respectively. No significant baseline differences were noted within the overall sample.
Longitudinal mixed-effects models (random effects) were fitted to the data with some differences between the fixed- and random-effects models. In both, a significant treatment effect was seen; the treatment by time interaction was not significant in the fixed-effects model and reached p=.06 in the random effects one.
In a fixed-effects model, the mean improvement in Lequesne Index was -2.74 in the hyaluronan group and -2.16 in the placebo group (difference of -0.58, 95 percent CI: -0.95 to -0.20); in a random-effects model, it was -2.68 and -2.00, respectively (difference of -0.68, 95 percent CI: -0.79 to -0.56). When analyses were conducted for individual trials, treatment effects were statistically significant in two. Results were sensitive to model specification in two trials. For one (Puhl, Bernau, Greiling, et al., 1993), the fitted mixed-effects model showed no treatment difference (p=.55), while the original publication reported a statistically significant difference in Lequesne Index scores at the end of follow-up (p=.005 at 14 weeks). No participant-level random-effects models were examined.
Adverse events were noted in 1.8 and 3.2 percent of the hyaluronan and placebo groups, respectively.
Altman, Akermark, Beaulieu, et al., 2004. The trial randomized 347 participants in a placebo-controlled double-blind 26-week multicenter trial across 18 sites in the United States, Canada, and Sweden. Treatment and placebo groups were comparable at baseline. Mean participant age was approximately 63 years; 55 percent were female; and 35 percent had prior knee surgery; knees with Kellgren-Lawrence radiographic grades 2 to 4 were enrolled. A single NASHA (60 mg) or saline placebo injection was administered to 172 or 174 participants, respectively. The primary outcome was response defined as a reduction in WOMAC pain score (20-point scale) ≥40 percent with an absolute 5-point improvement. Following the baseline exam, participants were assessed at weeks 2, 6, 13, and 26.
Trial quality was rated “good.” There were no differences in response rates between treatment and placebo arms at any of the time points examined in either intention-to-treat or per protocol analyses. In a post-hoc analysis of the subgroup with only knee OA (62 percent), a significant difference was found at week 6 (42.1 versus 27.5 percent) but at no other time point.
This trial used clearly defined responder criteria (Dougados, Nguyen, Listrat, et al., 2000) and found no evidence for a beneficial effect of NASHA. The post-hoc subgroup finding of a single difference was inconsistent with the overall result.
Neustadt, Caldwell, Burnette, et al., 2005. At 24 sites in the United States and Canada, 372 participants were randomized in a placebo-controlled, double-blind, 22-week trial. Treatment and placebo groups were comparable at baseline. The mean age of participants was 60 years; 52 percent were female; those with Kellgren-Lawrence radiographic grades 2 to 3 were enrolled. The trial had three arms with four weekly intra-articular injections: (1) four hyaluronan injections, (2) three hyaluronan injections followed by arthrocentesis, and (3) four arthrocenteses. The primary outcome was response defined as a 20 percent relative and a 50-mm absolute improvement on WOMAC pain at weeks 8, 12, 16, and 22. Baseline characteristics of the intention-to-treat sample were not reported, only those of the “evaluable population.” This subgroup was defined as participants receiving all four injections, attending at least one follow-up visit, and without protocol deviation (n=336 or 90 percent of those randomized). Intention-to-treat analyses were not reported.
Trial quality was rated “fair.” In the “evaluable population,” there were no statistically significant differences in WOMAC pain at any time point. Greater improvement in patient global assessment was evident at weeks 8 through 16 in the four hyaluronan injection group compared to the other two groups. No difference was evident between the arthrocentesis and three hyaluronan injection arms. The primary responder outcome was not reported for the “evaluable population.”
An “evaluable subgroup” with Kellgren-Lawrence grade 2 or 3 and contralateral knee WOMAC pain <150 mm (500 mm scale) was next analyzed (n=294, 79 percent of those randomized). When response was defined as a 20 percent improvement alone (not the primary specified outcome measure), the four hyaluronan injection group was superior to placebo at week 8 (76 versus 62 percent, p=0.035), but at no other time point. The three hyaluronan injection group was not superior to placebo. Further post-hoc subgroup analyses examined 40 and 50 percent improvement response criteria, finding higher 40 percent response rates with four hyaluronan injections compared to placebo at all time points.
The trial did not demonstrate benefit for the primary efficacy outcome and intention-to-treat analyses were not reported. A single statistically significant responder result was found examining two subgroups. Subgroups were apparently defined post-hoc and not analyzed according to the primary efficacy outcome.*
Primarily unilateral OA of the knee
Outerbridge grades I through III by arthroscopy performed more than 6 months before entry
Pain ≥40 mm with walking, climbing or descending stairs, or weight bearing.
Mean participant age was approximately 54 years; 40 percent were female; 39 percent had prior partial meniscectomies and 7 percent prior knee surgery; 43 percent of knees were classified Ahlback grade 0 and 64 percent grade 0 or 1. The trial included three arms: hylan G-F 20, 25 mg hyaluronan, or placebo (buffered saline), each administered once weekly for three weeks. Baseline characteristics in the three arms were comparable; two participants were non-Caucasian. Following the initial examination, participants were assessed at weeks 6, 12, 18, 26, 38, and 52. The primary efficacy outcome was VAS pain during walking, stair climbing, or weight-bearing with the previous assessment provided to the subject. Response was defined as being symptom free (VAS ≤20 mm) at week 26. Among secondary outcomes were Lequesne Index and patient assessment of overall response. Intention-to-treat analyses were performed without adjustments for multiple comparisons.
This trial enrolled a young predominantly male sample with a goal to “halt the progression of early-stage chondral pathology to end-stage OA disease.” At 26 weeks, response to hylan G-F 20 was significantly better than placebo, but there were few significant results among the many examined and no adjustment for multiple comparisons.
The meta-analyses examining adverse events described small increases in relative risk. Wang, Chen, Huang, et al. (2004) reported a pooled relative risk for minor events of 1.2 (95 percent CI: 1.01 to 1.41) and Arrich, Piribauer, Mad, et al. (2005) 1.08 (95 percent CI: 1.01 to 1.15). Bellamy, Campbell, Robinson, et al. (2006) estimated a pooled relative risk for local reactions of 1.9 accompanying hylan G-F 20 (95 percent CI: 0.51 to 7.3; five RCTs) and 1.6 accompanying other hyaluronans (95 percent CI: 0.54 to 5.6; five RCTs).
Six articles or abstracts were identified addressing adverse event occurrence. Hamburger, Lakhanpal, Mooar, et al. (2003) reviewed hyaluronan product safety profiles from a MEDLINE® search through July 2002 and the FDA Manufacturer and Device Experience Database (MAUDE).† The review noted rare occurrence of serious reactions to both Hyalgan® and hylan G-F 20.
Waddell (2003) described the adverse event rate accompanying hylan G-F 20 from a retrospective review in a single clinical practice. He reported a local adverse event rate of 2.1 percent (82/3,931) per injection: 1 percent (34/3,367) for those receiving a single course and 8.5 percent (48/564) accompanying a second course.
Maheu and Bonvarlet (2003) surveyed French rheumatologists to explore the occurrence of acute pseudoseptic arthritis after hyaluronan injection, a severe hyaluronan-related adverse event reportedly uncommon. A questionnaire was sent to 81 rheumatologists, of whom 26 responded. Sixteen reported 33 cases of pseudoseptic arthritis, possibly more frequently associated with hylan G-F 20. The authors concluded acute pseudoseptic arthritis is “not so rare.” Limitations of the survey included the absence of a denominator to quantify risk and the low survey response rate.‡
Kemper, Gebhardt, Meng, et al. (2005) reported a 5.3 percent adverse event rate accompanying hylan G-F 20 injections in 4,253 patients. Arthropathy was most common occurring in 3.1 percent of patients. The most severe event reported was a large effusion and synovitis in one patient. Those with previous hyaluronan treatments had a two-fold increased risk of adverse events. Lussier, Cividino, McFarlane, et al. (1996) reported adverse events among 336 patients receiving 1,537 injections of hylan G-F 20. Local adverse events occurred at a rate of 2.7 percent per injection and in 1 of 12 patients.
Finally, a search of MAUDE for hyaluronan products (code MOZ) from January 1, 2005 through January 1, 2007 identified 236 records reporting adverse events following knee injection. Nine reports mentioned pseudosepsis or pseudoseptic reaction—four associated with Synvisc® (hylan G-F 20), one with Euflexxa©, and four with Hyalgan®. In 85 adverse events patients were hospitalized.
Generally, severe adverse events associated with hyaluronan-based products have been reported as uncommon in trials. In contrast, local minor adverse events appear common, although the risk appears not substantially different compared to placebo injection. The true risk of pseudoseptic reactions may be small, but one study suggests they could be more common than generally thought.
We performed supplementary analyses to address three key issues:
Heterogeneity—clinical and statistical
Publication bias
Hylan G-F 20.
The majority of these analyses rely upon data abstracted by Bellamy, Campbell, Robinson, et al. (2006), which included the largest number of trials. However, the trial quality ratings we performed and cited throughout this report were used for all analyses.
Clinical and Statistical Heterogeneity/Sensitivity Analyses. All study-level meta-analyses found high heterogeneity and appropriately employed random effects models. Four of the five identified hylan G-F 20 and trial quality issues as factors affecting pooled estimates. Using post-test VAS pain as the outcome at 5–13 weeks (Bellamy, Campbell, Robinson et al. 2006, Comparison 50, 16 pooled studies), we performed sensitivity analyses exploring factors suggested by the meta-analyses and our own review of evidence:
Trial quality (good/fair versus poor)*
Hylan G-F 20 versus other hyaluronans
Sample size (≤100 or >100) or reported power calculations (these attributes were correlated; differences according to sample size were found to explain more heterogeneity)
Industry involvement
Use of rescue analgesia
Primary intention-to-treat analyses.*
| Random-Effects Model* | | | | |
|---|---|---|---|---|
| Study or Sample Characteristic | | WMD VAS (100 mm) | 95% CI | I2 |
| Study Quality | Good/Fair | -8.8 | -12.4 to -5.2 | 61.0% |
| | Poor | -23.2 | -37.2 to -9.3 | 89.7% |
| Hylan | G-F 20 | -20.8 | -31.3 to -10.4 | 83.8% |
| | Others | -9.3 | -13.4 to -5.1 | 68.3% |
| Sample Size | ≤ 100 | -17.0 | -20.8 to -13.2 | 26.3% |
| | > 100 | -7.3 | -14.6 to 0.4 | 89.2% |
| ITT | Yes | -12.8 | -18.8 to -6.8 | 84.6% |
| | No | -13.5 | -22.1 to -4.9 | 80.2% |
| Power Calculation | Yes | -9.1 | -16.5 to -1.8 | 86.5% |
| | No | -16.2 | -22.7 to -9.8 | 78.5% |
| Rescue Analgesia | Yes | -11.4 | -16.3 to -6.6 | 82.5% |
| | No | -24.2 | -34.6 to -13.7 | 38.1% |
| Industry Involvement | Yes | -12.9 | -18.5 to -7.3 | 85.4% |
| | No | -13.7 | -18.4 to -9.0 | 0.0%* |
* A fixed-effects model.
Characteristics found to influence results were next examined in a hierarchical Bayes linear model (DuMouchel, 1994) with a vague prior specified for τ2.† Study quality and hylan G-F 20 were retained in the model based on these findings and conclusions from the meta-analyses. Of the remaining attributes, only sample size was found to be independent and statistically significant.‡ In the model including study quality, use of hylan G-F 20, and sample size, all were statistically significant (respective probabilities of .006, .049, and .01), and between-study variability in the model (τ2) was reduced by 38 percent. In the model, pooled weighted mean differences in VAS pain varied from -3.0 mm (good/fair study quality, non G-F 20 hyaluronan, sample size >100) to -29.6 mm (poor study quality, hylan G-F 20, sample size ≤100).
Although analyses must be considered exploratory, in subgroup analyses and meta-regressions results were sensitive to study characteristics and use of hylan G-F 20. Industry involvement had no effect on pooled estimates. While the use of rescue analgesia in subgroup analyses influenced results, it was not independent of study quality and use of hylan G-F 20 and only three trials did not allow rescue analgesia. Study quality, hylan G-F 20, and sample size were independently associated with the trial effects explaining a sizeable proportion of between-study variability.
Publication Bias. Three findings suggest the presence of publication bias:
Funnel plot asymmetry
Small trial bias
Unpublished trials.
Funnel Plot Asymmetry. Two meta-analyses found funnel plot asymmetry (Lo, LaValley, McAlindon, et al., 2003; Modawal, Ferrer, Choi, et al., 2005); using sample size as the ordinate, Wang, Chen, Huang, et al. (2004) suggested no evidence of asymmetry. Arrich, Piribauer, Mad, et al. (2005) found no evidence of publication bias, while Bellamy, Campbell, Robinson et al. (2006) did not report examining potential publication bias.
Funnel plots constructed with precision as the ordinate using data from Wang, Chen, Huang, et al. (2004) showed asymmetry for SPID% (p=0.038) and peak PID% (p=.015) although not for ASPID% (p=.56) which as an average measure could be anticipated.* In Bellamy, Campbell, Robinson et al. (2006), Egger tests calculated for pooled VAS pain at rest, 1 to 4 weeks, 5 to 13 weeks, and 14 to 26 weeks yielded p-values of .9, <.001, .017, and .086, respectively.† While other factors could explain these test results (Lau, Ioannidis, Terrin, et al., 2006) those reported in the meta-analyses and those we performed are consistent with publication bias.
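The Egger test referred to above is usually carried out as a linear regression of each trial's standardized effect (effect divided by its standard error) on its precision (1 divided by the standard error); an intercept far from zero indicates funnel plot asymmetry. The following is a minimal sketch of that regression, assuming ordinary least squares. The function name and the trial data are hypothetical and are not the data analyzed in this report.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger regression of effect/SE on 1/SE; returns (intercept, t statistic)."""
    n = len(effects)
    if n < 3:
        raise ValueError("at least three trials are needed")
    y = [e / s for e, s in zip(effects, std_errors)]   # standardized effects
    x = [1.0 / s for s in std_errors]                  # precisions
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, intercept / se_intercept


# Hypothetical trials: smaller trials (larger SE) show larger pain reductions.
effects = [-25.0, -20.0, -15.0, -8.0, -6.0, -5.0]   # mm VAS
std_errors = [10.0, 8.0, 6.0, 3.0, 2.5, 2.0]
intercept, t_stat = egger_intercept(effects, std_errors)
print(intercept, t_stat)
```

The t statistic is referred to a t distribution with n − 2 degrees of freedom to obtain the p-value; that step is omitted here to keep the sketch free of external libraries.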
Small Trial Bias. An apparent small trial bias was noted by Wang, Chen, Huang, et al. (2004) and shown in our sensitivity analyses. The average size of trials reporting sample size calculations was 204 compared to 60 for those without. The effect magnitude in clearly adequately powered trials was 44 percent lower than in those not reporting sample size calculations, consistent with positive underpowered studies being published more often than negative ones.
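The 44 percent figure appears to correspond to the pooled estimates in the sensitivity analysis table for trials with and without a reported power calculation (-9.1 mm versus -16.2 mm); that correspondence is an inference from the table rather than something stated in the text. A short check of the arithmetic, with a hypothetical helper name:

```python
def relative_reduction(effect_a: float, effect_b: float) -> float:
    """Fraction by which the magnitude of effect_a is smaller than that of effect_b."""
    return 1.0 - abs(effect_a) / abs(effect_b)


# Pooled WMD (mm VAS): power calculation reported versus not reported.
print(round(100 * relative_reduction(-9.1, -16.2)))  # 44
```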
Hylan G-F 20. The five study-level meta-analyses suggested hylan G-F 20 has greater effects than other hyaluronans. To extend results from the meta-analyses and explore how the potential effect of hylan G-F 20 might differ, we examined pooled trial results further.
Pooling. Eight trials of hylan G-F 20 assessed outcome measures at different time points using different instruments (Cubukcu, Ardic, Karabulut, et al., 2004; Dickson, Hosie, and English, 2001; Karlsson, Sjogren, and Lohmander, 2002; Kotevoglu, Iyibozkurt, Hiz, et al., 2006; Moreland, Arnold, Saway, et al., 1993; Rolf, Engstrom, Ohrvik, et al., 2005; Scale, Wobig, and Wolpert, 1994; Wobig, Dickhut, Maier, et al., 1998). For consistency and to allow comparison with other meta-analyses, we adopted the general approach taken by Bellamy, Campbell, Robinson, et al. (2006) pooling weighted mean differences between treatment and placebo arms at follow-up. Data extracted by Bellamy, Campbell, Robinson, et al. (2006) at 5 to 13 weeks post-injection (near the time of maximum anticipated benefit) were used.
Results from two trials could not be included in the pooled result. Follow-up in the Moreland, Arnold, Saway, et al. (1993) trial was limited to four weeks. Rolf, Engstrom, Ohrvik, et al. (2005) did not report a pain outcome measure amenable to pooling with the other trials. Five of the remaining six RCTs reported pain on a VAS scale (Dickson, Hosie, and English, [2001] as part of WOMAC 100-mm VAS). Cubukcu, Ardic, Karabulut, et al. (2004) assessed WOMAC pain on a 20-point scale (which we rescaled to 100 for pooling). From Karlsson, Sjogren, and Lohmander (2002) only the hylan G-F 20 and placebo arms were included. Random-effects models were fitted in all but one instance due to heterogeneity.
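To illustrate the kind of pooling described, the sketch below rescales a 20-point score to a 100-point scale and combines trial-level mean differences in a random-effects model. It assumes the DerSimonian-Laird estimator of between-trial variance, a common choice; the text does not state which estimator was used. Function names and inputs are hypothetical.

```python
import math


def rescale_to_100(value: float, scale_max: float) -> float:
    """Linearly rescale a score or mean difference to a 0 to 100 scale."""
    return value * 100.0 / scale_max


def pool_random_effects(effects, std_errors):
    """DerSimonian-Laird pooling; returns (estimate, ci_low, ci_high, tau2)."""
    if len(effects) < 2:
        raise ValueError("at least two trials are needed")
    w = [1.0 / s ** 2 for s in std_errors]              # inverse-variance weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                       # between-trial variance
    w_star = [1.0 / (s ** 2 + tau2) for s in std_errors]
    est = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return est, est - 1.96 * se, est + 1.96 * se, tau2


# Hypothetical mean differences (mm VAS) and standard errors.
print(rescale_to_100(-2.0, 20))  # -10.0
print(pool_random_effects([-22.0, -8.0, -30.0], [4.0, 3.0, 6.0]))
```

When the estimated between-trial variance is zero, the weights reduce to inverse-variance weights and the result matches a fixed-effects model.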
These results can be summarized as follows:
The pooled effect magnitude from the available hylan G-F 20 RCTs appears larger than for other hyaluronans.
Due to trial quality, drop-out rates, heterogeneity, considerably larger effects in the Wobig, Dickhut, Maier, et al. (1998) and Scale, Wobig, and Wolpert (1994) trials, and between-trial variability, the pooled effect estimate must be considered accompanied by greater uncertainty than reflected in the confidence interval.
| | Lo et al., 2003 | Wang et al., 2004 | Arrich et al., 2005 | Modawal et al., 2005 | Bellamy et al., 2006 |
|---|---|---|---|---|---|
| Trials pooled at 8–12 weeks | 22 | 20 | 5 | 6 | 16 |
| Sample size: mean (range)* | 134 (24–108) | 117 (12–347) | 250 (49–408) | 181 (80–347) | 131 (24–407) |
| Total patients | 2,927 | 2,345 | 1,251 | 1,086 | 2,090 |
| Pooled pain outcome cited† | Hierarchy‡ | With/without Activities | During or After Exercise | During Activity or Rest | Weight Bearing |
| Comparison/Effect Measure | Difference in Change (standardized) (effect size) | Differences (in pain intensity summed) (0–100%) | Difference (at follow-up) (mm VAS pain) | Difference in Change (unstandardized) (mm VAS pain change) | Difference (at follow-up) (mm VAS pain) |
| Overall pooled effect | -0.32 | 7.90% | -4.3 mm | -18.1 mm change | -13.0 mm |
| 95% CI | (-0.47 to -0.17) | (4.1% to 11.7%) | (-7.6 to -0.9) | (-29.9 to -6.3) | (-18.0 to -7.9) |
| p Value | <.001 | NR | .013 | NR | <.001 |
| Sensitivity Analyses | | | | | |
| Trial quality | | | | | |
| Good (± Fair) | NR | Reported NS in meta-regression§ | -6.2 mm (-15.9 to 3.5)** | -7.1 mm (-11.3 to -3.0) | -8.8 mm (-12.4 to -5.2)†† |
| Poor | NR | | NR | NR | -23.2 mm (-37.2 to -9.3)†† |
| Trial size | | | | | |
| Large | NR | 3.6% (0.9 to 6.3) | NR | NR | -7.3 mm (-14.6 to 0.4)†† |
| Small | NR | 6.0% (2.1 to 10.1) | NR | NR | -17.0 mm (-20.8 to -13.2)†† |
| Molecular weight | | | | | |
| G-F 20 | NR | 23.6% (CI not reported) | Did not include any G-F 20 trials | -33.0 mm (-50.5 to -17.5)‡‡ | -20.8 mm (-31.3 to -10.4)†† |
| Non G-F 20 | -0.19 (-0.27 to -0.10) | 5.4% (2.6 to 19.9) | | -19.2 mm (-30.5 to -7.9) | -9.3 mm (-13.4 to -5.1)†† |
| Heterogeneity | | | | | |
| I2 | NR | NR | 0% | 95% | 83% |
| Other | Cochran Q: p<.001 | Cochran Q: p<.001* | Cochran Q: p<.001 | | |
| Explored/Explained | Yes/Yes† | Yes/No | NA/NA‡ | Yes/Partially§ | No/No |
| Results consistent with publication bias (EPC analysis) | Yes | No** | No | Yes | Yes†† |
* If not reported in the meta-analysis, figures calculated from original trial publications using patients randomized (not knees).
† While Arrich et al. (2005) and Bellamy et al. (2006) pooled a similar effect measure, the other meta-analyses chose different approaches detailed in the Methods chapter.
‡ Pain reported from one of the following instruments in order of decreasing preference: global knee pain score; knee pain on walking; WOMAC index; Lequesne Index; knee pain during activities other than walking.
§ Also reported that elements characterizing studies of lower methodologic quality were associated with higher effect estimates.
** Result from a single high quality trial.
†† From supplementary EPC analyses; not reported in Bellamy et al. (2006).
‡‡ Calculated from meta-regression model also including study quality and pain with activity or at rest, not presented in publication.
CI: confidence interval; NA: not applicable; NR: not reported; NS: not significant (p≥.05); VAS: Visual Analog Scale.
Drawing conclusions requires considering the clinical meaning of pooled results, strengths and limitations of the meta-analysis and trial evidence, heterogeneity in pooled results, potential publication bias, and the uncertainty contributed by each.
Clinical Meaning. Important effects, regardless of statistical considerations, must be accompanied by a minimal clinically important improvement patients can identify. While the amount of improvement required may not be definitively established (Tubach, Ravaud, Baron et al., 2005; Pham, van der Heijde, Altman, et al., 2004), improvements of between 20 and 40 percent have been used in recent hyaluronan trials (Neustadt, Caldwell, Burnette, et al., 2005; Altman, Akermark, Beaulieu, et al., 2004). In this respect, pooled results from the meta-analyses are limited because the primary literature generally does not report the proportions responding or achieving likely minimal clinically important improvements for the various outcome measures. Few trials reported response rates, too few from which to draw conclusions or to pool.
Strengths of the Meta-Analyses. Lo, LaValley, McAlindon, et al. (2003) attempted to acquire intention-to-treat data even if not reported, conducted sensitivity analyses supporting their conclusions, and were able to explain between-trial variability by excluding two outlier results. Wang, Chen, Huang, et al. (2004) reported extensive subgroup results and meta-regressions. Arrich, Piribauer, Mad, et al. (2005) examined effects at different time periods and carefully explored between-trial variability. Bellamy, Campbell, Robinson et al. (2006) examined the greatest breadth of literature. Strand, Conaghan, Lohmander, et al. (2006) were able to examine patient-level data.
Key Limitations of Primary Literature. Trial quality was the fundamental limitation of the primary literature—noted in four of five study-level meta-analyses. The second key limitation was the lack of reported response rates from intention-to-treat samples. This limits applying results to individual patients.
Heterogeneity among trial results was high for pooled outcome measures in all study-level meta-analyses; use of hylan G-F 20 and trial quality were found to influence pooled effect magnitude and heterogeneity. Supplementary analyses suggested that trial size also accounts for some heterogeneity.
Potential publication bias was suggested by Egger test results in three of the meta-analyses (Lo, LaValley, McAlindon, et al., 2003; Modawal, Ferrer, Choi, et al., 2005; Bellamy, Campbell, Robinson et al., 2006) and, depending on the choice of ordinate, in Wang, Chen, Huang, et al. (2004). Lo, LaValley, McAlindon, et al. (2003) also reported larger effect sizes in unpublished trials. Small trial size was associated with larger effects and less often accompanied by sample size calculations; a substantial number of patients were participants in unpublished trials. This evidence supports the presence of publication bias.
Four RCTs examined subgroups specified by our protocol including age, sex, primary/secondary OA of the knee, body mass index (BMI)/weight, and disease severity. None examined ethnicity, disease duration, or prior treatment. In one trial a subgroup comparison was preceded by stratified randomization. No other subgroup comparisons were prespecified; results were obtained in post-hoc analyses.
Lohmander, Dalen, Englund, et al. (1996) noted the subgroup aged 60 to 75 years with Lequesne Index scores over 10 (worse disease severity) experienced greater reduction in VAS pain compared to placebo (-23 mm versus -7 mm respectively at 13 weeks). However, in a confirmatory trial (Karlsson, Sjogren, and Lohmander, 2002) no benefit was found for that subgroup. This was the only subgroup result tested in a confirmatory study.
| | | Mean Reduction Walking VAS Pain (mm) Compared to Placebo (and 95% CI; from figure) |
|---|---|---|
| Age | <65 | -12.0 (-20 to -4) |
| | ≥65 | -5.5 (-16 to 6) |
| Sex | Women | -17.0 (-17 to 0) |
| | Men | -16.0 (-22 to -2) |
| BMI | ≤ 30.5 | -6.0 (-13 to 2) |
| | > 30.5 | -16.0 (-25 to -7) |
| Disease Severity | “Moderate” | -6.0 (-12.5 to 1.5) |
| | “Severe” | -10.5 (-25 to 2.0) |
| | KL2 | -9.0 (-17 to -1) |
| | KL3 | -7.0 (-13 to 1) |
Dahlberg, Lohmander, Ryd, et al. (1994) reported no beneficial effect of hyaluronan in the presence of previous trauma (secondary disease). Henderson, Smith, Pegley, et al. (1994) concluded that “hyaluronan offers no significant benefit over placebo during a five week treatment period...” but also reported effects among those classified as Kellgren-Lawrence grade 2 and grades 3–4, each with separate control groups. At 5 weeks, the VAS pain score in the Kellgren-Lawrence grade 2 hyaluronan arm improved by -15.6 mm, compared to -14.2 mm for the placebo arm; in the Kellgren-Lawrence grade 3–4 hyaluronan arm it improved by -8.7 mm, compared to -18.0 mm for placebo. Finally, Petrella, DiSilvestro, and Hildebrand (2002) reported no significant differences within subgroups defined by age, sex, and BMI, but estimates were not stated.
Comment. There is no evidence of differential effect of intra-articular hyaluronan according to subgroups defined by age, sex, primary/secondary OA of the knee, BMI/weight, or disease severity. However, the subgroup evidence is limited. The single positive subgroup finding subsequently examined in a confirmatory RCT was not substantiated.
The single study comparing the interventions of interest to this Evidence Report was conducted by Forster and Straw (2003), who randomized patients to arthroscopic lavage and debridement or intra-articular Hyalgan®. It is the only study meeting selection criteria for this Evidence Report's Key Question 4, concerning the comparative short-term and long-term outcomes of viscosupplements, glucosamine and chondroitin, or arthroscopic lavage and debridement. The trial is therefore discussed separately, in Results, Part III, Key Question 4.
What are the Clinical Effectiveness and Harms of Intra-Articular Hyaluronic Acid/Hyaluron Preparations Injections in Patients With Primary OA of the Knee?
Results from 42 trials (N=5,843), all but one synthesized in various combinations in six meta-analyses, generally show positive effects of viscosupplementation on pain and function scores compared to placebo. However, the evidence on viscosupplementation is accompanied by considerable uncertainty due to variable trial quality, potential publication bias, and unclear clinical significance of the changes reported.
The pooled effects from poor-quality trials were as much as twice those obtained from higher-quality ones.
There is evidence consistent with potential publication bias. Pooled results from small trials (≤100 patients) showed effects up to twice those of larger trials, which is consistent with selective publication of underpowered positive trials. Among trials of viscosupplementation, those that have not been published in full text comprise approximately 25 percent of the total patient population.
Interpreting the clinical significance of pooled mean effects from the meta-analyses is difficult; mean changes do not quantify proportions responding. Numbers needed to treat cannot be calculated from mean changes.
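The point can be illustrated with a short Python sketch. Two hypothetical treatment groups have the same mean change in VAS pain but very different proportions of responders, and a number needed to treat (NNT) can only be derived once responder proportions are known. The responder threshold, the patient-level changes, the control proportion, and the function names are all illustrative assumptions, not values from the trials reviewed here.

```python
def mean(values):
    return sum(values) / len(values)


def responder_fraction(changes_mm, threshold_mm):
    """Share of patients whose pain change is at or beyond the threshold
    (more negative means greater improvement)."""
    return sum(1 for c in changes_mm if c <= threshold_mm) / len(changes_mm)


def number_needed_to_treat(p_treatment, p_control):
    """NNT is the reciprocal of the absolute difference in responder proportions."""
    difference = p_treatment - p_control
    if difference <= 0:
        raise ValueError("NNT is undefined without a benefit over control")
    return 1.0 / difference


THRESHOLD_MM = -30.0  # hypothetical definition of a responder

# Hypothetical patient-level changes in VAS pain (mm); both groups average -20 mm.
uniform_group = [-20.0] * 10              # everyone improves modestly
split_group = [-40.0] * 5 + [0.0] * 5     # half improve markedly, half not at all

print(mean(uniform_group), responder_fraction(uniform_group, THRESHOLD_MM))
print(mean(split_group), responder_fraction(split_group, THRESHOLD_MM))

# With an assumed control responder proportion of 0.25, only the split group
# yields a finite NNT, even though the mean changes are identical.
print(number_needed_to_treat(responder_fraction(split_group, THRESHOLD_MM), 0.25))
```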
Trials of hylan G-F 20, the highest molecular weight cross-linked product, generally reported better results than other trials.
Minor adverse events accompanying intra-articular injections are common, but the relative risk accompanying hyaluronan injections over placebo appears to be small. Pseudoseptic reactions associated with hyaluronans appear relatively uncommon but can be severe.
What are the Clinical Effectiveness and Harms of the Interventions of Interest in Patients With Secondary OA of the Knee?
We identified no studies enrolling patients with only secondary disease, or that stratified randomization by primary and secondary disease. There is insufficient evidence to draw conclusions about treatment outcomes in patients with secondary disease.
How do the Short-Term and Long-Term Outcomes of the Interventions of Interest Differ by the Following Subpopulations: Age, Race/Ethnicity, Gender, Primary or Secondary OA, Disease Severity and Duration, Weight (Body Mass Index), and Prior Treatments?
Four RCTs were identified examining any of the specified subgroups. None examined race/ethnicity, disease duration, or prior treatment. In one trial, randomization was stratified by disease severity; all other subgroup results were obtained in post-hoc analyses. There was no evidence for differential effects according to subgroups defined by age, sex, primary/secondary disease, BMI/weight, or disease severity. One positive post-hoc subgroup analysis found greater efficacy among older individuals with more severe disease, but was not confirmed in a subsequent trial.
How do the Short-Term and Long-Term Outcomes of the Interventions of Interest Compare for the Treatment of: Primary OA of the Knee; and Secondary OA of the Knee?
No trials were identified comparing intra-articular hyaluronan to glucosamine and/or chondroitin. A single, small, underpowered, poor-quality trial found no difference in outcome measures comparing intra-articular hyaluronan to arthroscopy and debridement over a 1-year followup. There is insufficient evidence to draw conclusions regarding comparative efficacy of the interventions.
We used the results of study-level meta-analyses (MAs) and additional randomized controlled trials (RCTs) that were not included in the MAs to address the Key Questions of this Evidence Report on osteoarthritis (OA) of the knee.
This section of the Evidence Report includes six MAs* and five RCTs not included in the MAs.† In this section, we provide a brief descriptive overview of the MAs and identify the additional RCTs. Our systematic review of the literature did not identify any patient-level MAs on these interventions.
| MA Author, Year | Industry Funding of MA | Key Question(s) Addressed | Included RCT Design | No. of RCTs Included (total pts) | Outcomes Reported | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | DB | SB | PC | AC | C | G | Pain | Func | Struc | AEs | ||
| Bjordal et al., 2006 | NR | X | X | X | X | X | 6 (362) | 7 (401) | X | ||||||
| Towheed et al., 2006 | NR | X | X | X | X | X | NA | 20 (2,596) | X | X | X | ||||
| Poolsup et al., 2005 | NR | X | X | X | NA | 2 (414) | X | X | X | X | |||||
| Richy et al., 2003 | NR | X | X | X | X | 8 (855) | 7 (1,203) | X | X | X | X | ||||
| Leeb et al., 2000 | NR | X | X | X | X | X | 7 (703) | NA | X | X | X | ||||
| McAlindon et al., 2000 | NR | X | X | X | X | X | 9 (799) | 6 (1,118) | X | ||||||
| No. RCTs Pooled (Total in Literature) | 12 | 21 | |||||||||||||
AC: active-controlled; AEs: adverse events; C: chondroitin; DB: double-blind; G: glucosamine; Func: function; NR: not reported; PC: placebo-controlled; pts: patients; SB: single-blind; Struc: structural; RCT: randomized controlled trial;
| Study | No. Pts per Study Arm | Duration (wks) | Outcomes Reported | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| G | C | G/C | Pl | Act | Pain | Func | Struc | AEs | ||
| Herrero-Beaumont et al., 2007 | 106 | 104 | 108 | 24 | X | X | X | |||
| Clegg et al., 2006 | 317 | 318 | 317 | 313 | 318 | 24 | X | X | X | |
| Michel et al., 2005 | 150 | 150 | 104 | X | X | X | X | |||
| Uebelhart et al., 2004 | 54 | 56 | 52 | X | X | X | X | |||
| Das and Hammad, 2000 | 46 | 47 | 24 | X | X | X | X | |||
Act: active; AEs: adverse events; C: chondroitin; G: glucosamine; G/C: glucosamine plus chondroitin; Pl: placebo; Func: function; Struc: structural; wks: weeks
| Evaluation Criteria | Bjordal et al., 2006 | Towheed et al., 2006 | Poolsup et al., 2005 | Richy et al., 2003 | Leeb et al., 2000 | McAlindon et al., 2000 |
|---|---|---|---|---|---|---|
| Were the search methods used to find evidence (primary research) on the primary question(s) stated? | Y - clearly stated | Y - clearly stated | Y - clearly stated | Y - clearly stated | Y - clearly stated | Y - clearly stated |
| Was the search for evidence reasonably comprehensive? | Y - clearly stated, comprehensive, but language restricted to English, German, Scandinavian | Y - clearly stated, comprehensive, no language restrictions | N - did not specify language restrictions, did not seek unpublished data | Y - clearly stated, comprehensive, no language restrictions | P - search strategy not specified, language restrictions unclear, scope unclear | P - electronic search did not include EMBASE but did include Cochrane database |
| Were the criteria used for deciding which studies to include in the overview reported? | Y - clearly stated | Y - clearly stated | Y - clearly stated | Y - clearly stated | Y - clearly stated | Y - clearly stated |
| Was bias in the selection of studies avoided? | Y - comprehensive search, published and unpublished data sought | Y - comprehensive search, published and unpublished data sought | N - Unpublished data not sought or included, language restrictions not specified | Y - comprehensive search, published and unpublished data sought | N - Unpublished data not sought or included, language restrictions not specified | P - electronic search did not include EMBASE |
| Were the criteria used for assessing the validity of the included studies reported? | Y - numerical score provided according to Jadad et al. | Y - quality scores provided according to Gotzsche and Jadad et al. | Y - numerical score provided according to Jadad et al. | Y - numerical score provided according to Jadad et al. | Y - unclear, no method cited | Y - clearly stated |
| Was the validity of all studies referred to in the text assessed using appropriate criteria (either in selecting studies for inclusion or in analyzing the studies that are cited)? | Y - validated methods clearly stated | Y - validated methods clearly stated | Y - validated methods clearly stated | Y - clearly stated in tables | Y - clearly stated in tables | Y - validated methods clearly stated |
| Were the methods used to combine the findings of the relevant studies (used to reach a conclusion) reported? | Y - clearly stated | Y - handling of dichotomous and continuous outcomes clearly stated | Y - clearly stated | Y - handling of dichotomous and continuous outcomes clearly stated | Y - clearly stated | Y - clearly stated |
| Were the findings of the relevant studies combined appropriately relative to the primary question the overview addresses? | Y - clearly stated | Y - clearly stated | Y - only used 2 studies because of very strict inclusion criteria | P - combined data from studies of both compounds based on the absence of efficacy differences, also mixed in some data from hip pts | Y - clearly stated | Y - clearly stated |
| Were the conclusions made by the author(s) supported by the data and/or analysis reported in the overview? | Y - analysis within parameters was adequate, but went further in putting results into a “clinical” context for pain perception | Y - thorough analyses broken down according to outcomes scored and adverse events | Y - but limited number of studies reduces the impact of the MA | P - combined data from studies of both compounds based on the absence of efficacy differences, yet stated they were individually efficacious | Y - authors stated MA only “suggests that CS may be useful in OA”. | P - combined enteral and parenteral administration data, made reference to “safety” even though adverse events weren't compiled or analyzed |
| How would you rate the scientific quality of the overview?* | 7 | 7 | 3 | 5 | 3 | 4 |
Y: Yes; P: Partially or can't tell; N: No
*1&2: extensive flaws; 3&4: major flaws; 5&6: minor flaws; 7: minimal flaws
| Study | Bjordal et al., 2006 | Towheed et al., 2006 | Poolsup et al., 2005 | Richy et al., 2003 | Leeb et al., 2000 | McAlindon et al., 2000 |
|---|---|---|---|---|---|---|
| Heterogeneity | ||||||
| Assessed | Yes | Yes | Yes | Yes | Yes | Yes |
| Test used | Cochran Q | Chi-square | Cochran Q | Cochran Q | 95% CIs of Glass scores | p value reported, but test used not stated |
| Result | Outcome measures during first 4 weeks of treatment were not heterogeneous | For GS or GH vs. placebo: reduction in pain and LI scores were heterogeneous | Disease progression: | Outcome measures including JSN (p=.95), LI (p=.68), WOMAC (p=.83), mobility (p=.73) showed no heterogeneity | NR | Heterogeneity (p<.001) among chondroitin trials but attributable to a single study (Rovetta 1991) |
| GS Q = 1.3 | Pain: | Q=0.35 | VAS pain likely heterogeneous as RE model was used to combine data (p value not provided) | |||
| CS Q = 1.8 | I2 = 88.5% | (p>.1) | ||||
| (p>.05 for either) | LI: | Pain: | ||||
| I2 = 0 for both comparisons | I2 = 89.4% | Q=0.003 | ||||
| (due to critically low Q) | (p>.1) | |||||
| WOMAC function: | ||||||
| Q=0.0009 | ||||||
| (p>.1) | ||||||
| I2 = 0 for all comparisons | ||||||
| (due to critically low Q) | ||||||
| Meta-Regression | ||||||
| Conducted | Yes | Yes | NR | NR | NR | NR |
| Factors explored | Drug types within the same class | Pain and function in studies that used Rotta Research Laboratorium preparation of glucosamine versus those that used non-Rotta preparation(s) | NR | NR | NR | NR |
| Patient selection criteria | ||||||
| Missing data in ITT analyses | ||||||
| Sensitivity analysis** | Yes - planned using same subgroups if Q values indicated heterogeneity was present, not necessary for GH/Gs or CS | Yes - Pain, function, radiologic measures in studies with adequate allocation concealment | NR | Yes | NR | Yes for trial size, quality |
| Funnel plot/publication bias | NR | NR | NR | Funnel Plot (asymmetric) | Yes | Funnel plot (asymmetric, p<.01) |
| Egger Test (p=.08) | Non-central t-distribution revealed a relative error of about 30% | |||||
| Included studies and compounds assessed | CS = 6 single- or double-blind placebo-controlled RCTs | 20 double-blind RCTs, GS/GH | 2 double-blind placebo-controlled RCTs of GS | 15 double-blind placebo-controlled RCTs of GS and CS | 7 double-blind placebo-controlled RCTs of CS | CS=6 double-blind placebo-controlled RCTs |
| GS = 7 single- or double-blind placebo-controlled RCTs | GS/GH = 9 double-blind placebo-controlled RCTs | |||||
| Industry sponsored | 5 of 6 CS trials industry funded | 15/20 connected to Rotta to some degree | NR in meta-analysis, but both studies were funded by Rotta | NR in meta-analysis | NR in meta-analysis | 13/15 trials had some connection with a product manufacturer |
CS: chondroitin sulfate; GS: glucosamine sulfate; GH: glucosamine hydrochloride; ITT: intent to treat; NR: not reported; RCT: randomized controlled trial
**If study subgroups were examined, eliminating those likely to influence or bias results
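The table entries reading "I2 = 0 (due to critically low Q)" follow from the usual definition of I², which is the percentage of Cochran's Q in excess of its degrees of freedom, truncated at zero. The Python sketch below illustrates this under stated assumptions: the function name is hypothetical, and the degrees of freedom are assumed to be one less than the number of pooled trials. The example uses the Q of 1.3 reported for glucosamine sulfate together with an assumed seven trials, which may not match the number actually contributing to that comparison.

```python
def i_squared(q, degrees_of_freedom):
    """I-squared as a percentage: share of Q exceeding its degrees of freedom, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - degrees_of_freedom) / q) * 100.0


# Q = 1.3 as reported in the table; df = 6 is an assumption (seven trials minus one).
print(i_squared(1.3, 6))

# Hypothetical contrast: a Q well above its degrees of freedom gives a high I-squared.
print(i_squared(20.0, 4))
```

Whenever Q is smaller than its degrees of freedom, the numerator is negative and I² is reported as zero.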
| Primary Study | Study Design | Route of Administration | Type of Control Used | ≥80% Knees | Publication Type | Meta-Analysis* (Year) | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| O | P | Pl | Act | Art | Abs | Bjordal (2006) | Towheed (2006) | Poolsup (2005) | Richy (2003) | McAlindon (2000) | |||
| Cibere et al., 2004 | DB | X | X | X | X | X | |||||||
| McAlindon et al., 2004 | DB | X | X | X | X | X | X | ||||||
| Usha and Naidu, 2004 | DB | X | X | X | X | X | X | ||||||
| Hughes and Carr, 2002 | DB | X | X | X | X | X | X | X | |||||
| Pavelka et al., 2002 | DB | X | X | X | X | X | X | X | |||||
| Zenk et al., 2002 | DB | X | X | NR | X | X | |||||||
| Reginster et al., 2001 | DB | X | X | X | X | X | X | X | |||||
| Rindone et al., 2000 | DB | X | X | X | X | X | X | X | |||||
| Houpt et al., 1999 | DB | X | X | X | X | X | X | ||||||
| Houpt et al., 1998 | DB | X | X | X | X | X | |||||||
| Qiu et al., 1998 | DB | X | X | X | X | X | |||||||
| Rovati, 1997 | DB | X | X | X | X | X | X | X | |||||
| Muller-Fassbender et al., 1994 | DB | X | X | X | X | X | |||||||
| Noack et al., 1994 | DB | X | X | X | X | X | X | X | X | ||||
| Reichelt et al., 1994 | DB | X | X | X | X | X | X | ||||||
| (IM) | |||||||||||||
| Lopes Vaz, 1982 | DB | X | X | X | X | X | |||||||
| D'Ambrosio et al., 1981 | DB | X | X | NR | X | X | |||||||
| (IV/IM) | |||||||||||||
| Vajaradul, 1981 | DB | X | X | X | X | X | X | ||||||
| (IA) | |||||||||||||
| Crolle and D'Este, 1980 | DB | X | X | NR | X | X | |||||||
| (IM/IA) | |||||||||||||
| Drovanti et al., 1980 | DB | X | X | NR | X | X | |||||||
| Pujalte et al., 1980 | DB | X | X | X | X | X | X | X | X | ||||
| No. RCTs Pooled (Total 21 in Literature) | 7 | 20 | 2 | 7 | 6 | ||||||||
Abs: abstract; Act: active; Art: article; DB: double-blind; IA: intra-articular; IM: intramuscular; IV: intravenous; NR: not reported; O: oral; P: parenteral; Pl: placebo;
Bold face type and shading indicates study that meets Evidence Report selection criteria (see Methods section)
One MA included primary studies that used a reference control, pooling them with studies that used placebo controls (Towheed, Maxwell, Anastassiades, et al., 2006). Glucosamine was administered orally in 17 RCTs and parenterally in four. Two MAs combined data from studies in which glucosamine was administered parenterally with those in which it was given orally (Towheed, Maxwell, Anastassiades, et al., 2006; McAlindon, LaValley, Gulin, et al., 2000). Seventeen studies reported at least 80 percent of patients had knee OA. Four RCTs did not specify the knee as the primary affected joint (Zenk, Helmer, Kuskowski, et al., 2002; D'Ambrosio et al., 1981; Crolle and D'Este, 1980; Drovanti, Bignamini, and Rovati, 1980).
To assess the MAs as a means to address the Key Questions of this Evidence Report, we applied study selection criteria outlined in the Methods chapter to the primary studies in each MA. Two MAs contained RCTs that do not match the criteria specified in our Evidence Report (Towheed, Maxwell, Anastassiades, et al., 2006; McAlindon, LaValley, Gulin, et al., 2000). Ten of 20 RCTs included in the MA by Towheed and colleagues (2006) are not relevant to the aims of this Report, as will be outlined in the Results section for each MA. However, Towheed, Maxwell, Anastassiades, et al. (2006) includes all 10 trials that we have determined are applicable to our Report, whereas Bjordal, Klovning, Ljunggren, et al. (2006) and Richy, Bruyere, Ethgen, et al. (2003) excluded 3 of the 10.
The MA by Poolsup, Suthisisang, Channark, et al. (2005) examined the effect of glucosamine on structural progression of OA of the knee. Only two RCTs report such data (Pavelka, Gatterova, Olejarova, et al., 2002; Reginster, Deroisy, Rovati, et al., 2001). The earliest MA includes only 3 RCTs that meet our selection criteria, but publication chronology may be the key factor in that situation (McAlindon, LaValley, Gulin, et al., 2000). The primary literature on glucosamine comprising the other three MAs is consistent with our selection criteria (Bjordal, Klovning, Ljunggren, et al., 2006; Poolsup, Suthisisang, Channark, et al., 2005; Richy, Bruyere, Ethgen, et al., 2003).
| Primary Study | Study Design | Route of Administration | Type of Control Used | ≥80% Knees | Publication Type | Meta-Analysis* (Year) | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| O | P | Pl | Act | Art | Abs | Bjordal (2006) | Richy (2003) | Leeb (2000) | McAlindon (2000) | |||
| Mazieres et al., 2001 | DB | X | X | X | X | X | X | |||||
| Bourgeois et al., 1998 | DB | X | X | X | X | X | X | X | X | |||
| Bucsi and Poor, 1998 | DB | X | X | X | X | X | X | X | X | |||
| Conrozier, 1998 | DB | X | X | X | X | X | X | |||||
| Pavelka et al., 1998 | DB | X | X | X | X | X | X | |||||
| Uebelhart et al., 1998 | DB | X | X | X | X | X | X | X | X | |||
| Morreale et al., 1996 | DB | X | X | X | X | X | X | |||||
| Conrozier and Vignon, 1992 | DB | X | X | X | X | |||||||
| L'Hirondel, 1992 | DB | X | X | X | X | X | X | X | ||||
| Mazieres et al., 1992 | DB | X | X | X | X | X | X | X | ||||
| Rovetta, 1991 | DB | X | X | X | X | X | ||||||
| (IM) | ||||||||||||
| Kerzberg et al., 1987 | DB/CO | X | X | X | X | X | ||||||
| (IM) | ||||||||||||
| No. RCTs Pooled (Total 12 in Literature) | 6 | 8 | 7 | 9 | ||||||||
Abs: abstract; Act: active; Art: article; DB: double-blind; CO: crossover; IM: intramuscular; O: oral; P: parenteral; Pl: placebo;
Bold face type and shading indicates study that meets Evidence Report selection criteria (see Methods section)
Three MAs included RCTs that used reference controls (Bjordal, Klovning, Ljunggren, et al., 2006; Leeb, Schweitzer, Montag, et al., 2000; McAlindon, LaValley, Gulin, et al., 2000). Chondroitin was administered orally in ten trials and parenterally in two. One MA pooled data from RCTs that used either route (McAlindon, LaValley, Gulin, et al., 2000). Ten studies included only patients with OA of the knee. Two included patients with OA of the knee and of the hip (Conrozier and Vignon, 1992; Mazieres, Loyau, Menkes, et al., 1992). In one MA, these two RCTs were pooled with data from trials of patients with OA of the knee only (Leeb, Schweitzer, Montag, et al., 2000).
Our study selection criteria excluded primary studies from each of the four MAs. This is particularly evident with one MA of nine primary studies, five of which would be allowed by our criteria (McAlindon, LaValley, Gulin, et al., 2000).
Outcomes Measured in Randomized Trials That Meet Protocol Selection Criteria. A number of health outcomes reported in primary RCTs provide relevant information to address Key Questions 1 and 2. To facilitate this presentation, where appropriate we have included the studies from the MAs with the additional studies in the summary tables.
| Study | VAS Pain | WOMAC | Global Assessment | LI | Walking Time | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Motion | Rest | Overall | Pain | Function | Stiffness | Total | Phys | Pat | |||
| Studies Included in Meta-Analyses | |||||||||||
| McAlindon et al., 2004 | X | X | X | X | |||||||
| Usha and Naidu, 2004 | X | X | |||||||||
| Hughes and Carr, 2002 | X | X | X | X | X | X | |||||
| Pavelka et al., 2002 | X | X | X | X | |||||||
| Reginster et al., 2001 | X | X | X | X | |||||||
| Rindone et al., 2000 | X | X | |||||||||
| Houpt et al., 1999 | X | X | X | X | |||||||
| Rovati, 1997 | X | ||||||||||
| Noack et al., 1994 | X | ||||||||||
| Pujalte et al., 1980 | X | X | ||||||||||
| Additional Studies not Included in Meta-Analyses | |||||||||||
| Herrero-Beaumont et al., 2007 | X | X | X | X | X | X | |||||
| Clegg et al., 2006 | X | X | X | X | X | X | |||||
LI: Lequesne Index; VAS: visual analog scale; WOMAC: Western Ontario and McMaster index;
| Study | VAS Pain | WOMAC | Global Assessment | LI | Walking Time | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Motion | Rest | Overall | Pain | Function | Stiffness | Total | Phys | Pt | |||
| Studies Included in Meta-Analyses | |||||||||||
| Mazieres et al., 2001 | X | X | X | X | X | ||||||
| Bourgeois et al., 1998 | X | X | |||||||||
| Bucsi and Poor, 1998 | X | X | X | ||||||||
| Conrozier, 1998 | X | ||||||||||
| Uebelhart et al., 1998 | X | ||||||||||
| L'Hirondel, 1992 | X | X | |||||||||
| Additional Studies not Included in Meta-Analyses | |||||||||||
| Clegg et al. (2006) | X | X | X | X | X | X | |||||
| Michel et al. (2005) | X | X | X | X | |||||||
| Uebelhart et al. (2004) | X | X | X | ||||||||
LI: Lequesne Index; pt: patient; VAS: visual analog scale; WOMAC: Western Ontario and McMaster index;
| Study | VAS Pain | WOMAC | Global Assessment | LI | Walking Time | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Motion | Rest | Overall | Pain | Function | Stiffness | Total | Phys | Pt | |||
| Clegg et al., 2006 | X | X | X | X | X | X | |||||
| Das and Hammad, 2000 | X | X | X | ||||||||
LI: Lequesne Index; pt: patient; VAS: visual analog scale; WOMAC: Western Ontario and McMaster index;
| Study | Dose (Type) | N Tx/Pl | Mn Age Tx/Pl (yrs) | Female Pts (%) Tx/Pl | BMI (kg/m2) Tx/Pl | OA Diag† | OA Stage (%Tx/%Pl) | Mn Dis Duration Tx/Pl (yrs) | Mn VAS Movement (mm) Tx/Pl | Mn VAS Rest (mm) Tx/Pl | Mn WOMAC Pain Tx/Pl | Mn WOMAC Function Tx/Pl | Mn WOMAC Stiffness Tx/Pl | Mn WOMAC Total Tx/Pl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Studies Included in Meta-Analyses | ||||||||||||||
| McAlindon et al., 2004 | 1,500 mg/day (GS) | 101/104 | Rng <54–95/<54–84 | 57/71 | 31.0 ± 7.6/34.1 ± 9.0 | ? | NR | NR | Likert 8.8/9.1 | Likert 4.2/4.1 | Likert 30.2/31.6 | Likert 43.2/44.8 | ||
| p=.04 | p=.01 | |||||||||||||
| Usha and Naidu, 2004 | 1,500 mg/day (inferred GS) | 30/28 | 52/50 | 60/57 | 26.6/25.4 (calculated) | ? | K-L | 3.2/2.9 | 58/NR | |||||
| 1–3 most | ||||||||||||||
| Hughes and Carr, 2002 | 1,500 mg/day (GS) | 40/40 | All: 62 | All: 68 | NR | ? | K-L 1 (all, 9) | All: 7.6 | All: 60.7 | All: 35.0 | Likert All: 9.2 | Likert All: 32.9 | Likert All: 4.4 | |
| 2 (all 31) | ||||||||||||||
| 3 (all 37) | ||||||||||||||
| 4 (all 23) | ||||||||||||||
| Pavelka et al., 2002 | 1,500 mg/day (GS) | 101/101 | 61/64 | 79/76 | 25.7 ± 2.1/25.7 ± 1.8 | 1° | K-L 2 (54/53) | 10.1/11.0 | Likert 6.6/6.3 | Likert 21.8/22.0 | Likert 2.2/2.2 | Likert 30.7/30.5 | ||
| 3 (46/47) | ||||||||||||||
| Reginster et al., 2001 | 1,500 mg/day (GS) | 106/106 | 66/66 | 75/78 | 27.3 ± 2.6/27.4 ± 2.7 | 1° | K-L 2 (71/70) | 8.0/7.6 | 194.1/172.2 | 740.1/670.8 | 96.0/96.7 | 1030/940 | ||
| 3 (29/30) | ||||||||||||||
| Rindone et al., 2000 | 1,500 mg/day (unclear) | 49/49 | 63/64 | 4/6 | NR | ? | K-L 1 (40/30) | 12/14 | (0–10) 6.4/6.4 | (0–10) 3.9/3.6 | ||||
| K-L 2 (18/19) | ||||||||||||||
| K-L 3 (35/35) | ||||||||||||||
| K-L 4 (7/16) | ||||||||||||||
| Houpt et al., 1999 | 1,500 mg/day (GH) | 58/60 | 64/65 | 64/60 | NR | 1° | NR | 8.3/8.3 | Likert 8.8/8.4 | Likert 33.4/30.1 | Likert 4.1/4.0 | Likert 46.4/42.4 | ||
| Rovati (1997) | 1,500 mg/day (GS) | NR | NR | NR | NR | ? | NR | NR | NR (used LI)‡ | NR | NR | NR | NR | NR |
| Noack et al., 1994 | 1,500 mg/day (GS) | 126/126 | 55/55 | 59/62 | 26.6/26.2 (calculated) | 1° | NR | All: rng <6 mo to >10 yr | ||||||
| Pujalte et al., 1980 | 1,500 mg/day (GS) | 10/10 | 59/65 | 80/90 | NR | ? | NR | NR | ||||||
| Additional Studies not Included in Meta-Analyses | ||||||||||||||
| Herrero-Beaumont et al., 2007 [GUIDE] | 1,500 mg/day (GS) | 106/108/104 | GS: 63.4 ± 6.9 | 91/93/89 | GS: 27.7 ± 2.3 | 1° | K-L 2: 50/56/50 | GS: 7.4 ± 6.0 | GS: 7.8 ± 3.0 | GS: 27.8 ± 11.4 | GS: 38.3 ± 15.2 | |||
| Acet: 63.8 ± 6.9 | Acet: 27.9 ± 2.3 | K-L 3: 41/31/39 | Acet: 6.5 ± 5.3 | Acet: 8.0 ± 2.9 | Acet: 29.4 ± 11.0 | Acet: 40.4 ± 14.8 | ||||||||
| Pl: 64.5 ± 7.2 | Pl: 27.6 ± 2.4 | K-L 2/3: 9/12/11 | Pl: 7.2 ± 5.8 | Pl: 7.9 ± 3.0 | Pl: 27.2 ± 10.9 | Pl: 37.9 ± 14.3 | ||||||||
All values are mean ± SD unless otherwise noted;
†ACR criteria;
‡Outcomes are generally those that are denoted in the paper as being the primary study outcomes;
Acet: acetaminophen; ACR: American College of Rheumatology; BMI: body-mass index; Dis: disease; GS: glucosamine sulfate; GH: glucosamine hydrochloride; K-L: Kellgren-Lawrence criteria; LI: Lequesne Index; mn: mean; NR: not reported; Pl: placebo; rng: range; Tx: treatment; VAS: visual analog scale; WOMAC: Western Ontario and McMaster Universities Osteoarthritis Index;
The mean age of patients ranged between 50 and 66 years, with females comprising 4–90 percent of the study samples. In nine of 11 trials, females made up 60 percent or more of the enrolled patients. Five RCTs of glucosamine reported on patients with primary OA according to ACR criteria. None of the glucosamine studies reported patients specifically with secondary OA. Six reports did not specify whether patients had primary or secondary OA. The mean duration of OA of the knee ranged from 6 months or less to more than 10 years. Most patients in the RCTs had Kellgren-Lawrence grade 2–3 OA of the knee. One study included subjects who had Kellgren-Lawrence grade 4 disease (Hughes and Carr, 2002). No significant differences were reported between the composition of the treatment and placebo groups or their baseline characteristics, with the exception of a slight variation in sex distribution and BMI reported in one study (McAlindon, Formica, LaValley, et al., 2004).
| Study | Dose (Type) | N Tx/Pl | Mn Age Tx/Pl (yrs) | Female Pts (%) Tx/Pl | BMI (kg/m2) Tx/Pl | OA Diag† | OA Stage (%Tx/%Pl) | Mn Dis Duration Tx/Pl (yrs) | Mn VAS Movement (mm) Tx/Pl | Mn VAS Rest (mm) Tx/Pl | Mn WOMAC Pain Tx/Pl | Mn WOMAC Function Tx/Pl | Mn WOMAC Stiffness Tx/Pl | Mn WOMAC Total Tx/Pl | Mn LI Tx/Pl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Studies Included in Meta-Analyses | |||||||||||||||
| Mazieres et al., 2001 | 1,000 mg/day (CS) | 63/67 | 67/67 | 71/78 | 29.2 ± 5.1/28.9 ± 4.8 | 1° | K-L | NR | 54.4/53.0 | 29.9/27.7 | 8.8/8.9 | ||||
| 2 (59/54) | |||||||||||||||
| 3 (41/46) | |||||||||||||||
| Bourgeois et al., 1998 | Daily 1,200 mg/day (CS 4&6) | Daily/3X daily//Pl 40/43/44 | 63/63/64 | 65/79/84 | NR | 1° | ACR | By L,R 6,5/4,5/6,6 | 58/54/56 | 11/10/10 | |||||
| 3X daily | All: 1–3 (100) | ||||||||||||||
| 400 mg/day (CS 4&6) | |||||||||||||||
| Bucsi and Poor, 1998 | 800 mg/day (CS) | 39/46 | 61/59 | 56/63 | 29.2/29.1 (estimated) | 1°/2° | K-L | NR | 56/56 | R,L | |||||
| All: 1–3 (100) | 12.8,12.0/11.8,11.5 | ||||||||||||||
| Conrozier, 1998 | 800 mg/day (CS 4&6) | All: 104 | NR | NR | ? | NR | NR | ~9.0/~9.1 | |||||||
| Uebelhart et al., 1998 | 800 mg/day (CS 4&6) | 23/23 | 60/57 | 48/56 | 25.5/27.2 (estimated) | 1°/2° | K-L | NR | 56/64 | ||||||
| 1 (44/48) | |||||||||||||||
| 2 (48/44) | |||||||||||||||
| 3 (9/9) | |||||||||||||||
| L'Hirondel, 1992 | 1200 mg/day (CS) | 63/62 | All: 63 | 32.6 | NR | ? | NR | NR | (0–5) 4.03/3.90 | 10.73/11.02 | |||||
| Additional Studies not Included in Meta-Analyses | |||||||||||||||
| Michel et al., 2005 | 800 mg/day (CS 4&6) | 150/150 | 62/63 | 51/52 | 27.7 ± 5.2/28.1 ± 5.5 | 1° | K-L | NR | (0–10) 2.5/2.7 | (0–10) 2.1/2.5 | (0–10) 3.0/3.5 | (0–10) 2.3/2.6 | |||
| All: 1–3 (100) | |||||||||||||||
| Uebelhart et al., 2004 | 800 mg/day (CS 4&6) | 54/56 | 63/64 | 80/82 | NR | 1° | K-L | 4.2/4.4 | 58.8/61.1 | 9.0/9.1 | |||||
| 1 (7/6) | |||||||||||||||
| 2 (32/33) | |||||||||||||||
| 3 (15/17) | |||||||||||||||
All values are mean ± SD unless otherwise noted;
†ACR criteria;
ACR: American College of Rheumatology; BMI: body-mass index; CS: chondroitin sulfate; Dis: disease; K-L: Kellgren-Lawrence criteria; LI: Lequesne Index; mn: mean; NR: not reported; Pl: placebo; rng: range; Tx: treatment; VAS: visual analog scale; WOMAC: Western Ontario and McMaster Universities Osteoarthritis Index;
| Study | Dose (Type) | N Tx/Pl | Mn Age Tx/Pl (yrs) | Female Pts (%) Tx/Pl | BMI (kg/m2) Tx/Pl | OA Diag† | OA Stage (%Tx/%Pl) | Mn Dis Duration Tx/Pl (yrs) | Mn VAS Movement (mm) Tx/Pl | Mn VAS Rest (mm) Tx/Pl | Mn WOMAC Pain Tx/Pl | Mn WOMAC Function Tx/Pl | Mn WOMAC Stiffness Tx/Pl | Mn WOMAC Total Tx/Pl | Mn LI Tx/Pl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clegg et al., 2006 [GAIT] | 1,200 mg/day (CS) | 318/313 | 58/58 | 64/64 | 32.0 ± 7.6/31.9 ± 7.3 | 1° | K-L | 9.7/9.5 | (0–500) | (0–1700) | (0–200) | (0–300) | |||
| 2 (59/57) | 235.3/237.1 | 778.9/765.8 | 106.6/106.6 | 146.0/145.8 | |||||||||||
| Das and Hammad, 2000 | 1,600 mg/day (CS) | 46/47 | 64/66 | 72/78 | 30.5 ± 1.0/30.2 ± 0.9 (SEM) | 1°/2° | K-L 2/3 | 5.6/7.4 | (0–2,400) | K-L 2/3: | |||||
| (72/83) | K-L 2/3: | 10.2/10.4 | |||||||||||||
| K-L 4 | 908/944 | K-L 4: | |||||||||||||
| (28/17) | K-L 4: | 11.1/10.7 | |||||||||||||
| 1,187/1,089 | |||||||||||||||
All values are mean ± SD unless otherwise noted;
†ACR criteria;
ACR: American College of Rheumatology; BMI: body-mass index; CS: chondroitin sulfate; Dis: disease; K-L: Kellgren-Lawrence criteria; LI: Lequesne Index; mn: mean; NR: not reported; Pl: placebo; rng: range; SEM: standard error of the mean; Tx: treatment; VAS: visual analog scale; WOMAC: Western Ontario and McMaster Universities Osteoarthritis Index;
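Because Das and Hammad (2000) report BMI as mean ± SEM while the other rows report mean ± SD, the two are not directly comparable. As a rough aid, the Python sketch below converts a standard error of the mean to a standard deviation using SD = SEM × √n. It assumes the SEM was computed on the full arm sizes shown in the table (46 and 47), which the source does not confirm; the function name is hypothetical.

```python
import math


def sd_from_sem(sem, n):
    """Standard deviation implied by a standard error of the mean and a sample size."""
    return sem * math.sqrt(n)


# BMI SEMs and arm sizes as listed for Das and Hammad (2000); using the full
# arm sizes as n is an assumption.
print(round(sd_from_sem(1.0, 46), 2))  # treatment arm
print(round(sd_from_sem(0.9, 47), 2))  # placebo arm
```

Under that assumption the implied SDs are roughly 6 to 7 kg/m2, similar in magnitude to the SDs reported for Clegg et al. (2006) in the same table.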
Quality of Randomized Trials That Meet Protocol Selection Criteria. The study quality of primary RCTs that met our protocol selection criteria was evaluated using a grading tool described in the Methods chapter of this Evidence Report.
| Study | Initial Assembly of Comparable Groups | Low Loss to Followup, Maintenance of Comparable Groups | Measurements Reliable, Valid, Equal | Interventions Comparable/Clearly Defined | Appropriate Analysis of Results | Overall Rating |
|---|---|---|---|---|---|---|
| Studies Included in Meta-Analyses | ||||||
| McAlindon et al., 2004 | N* | Y | Y | Y | Y | Fair |
| Usha and Naidu, 2004 | N† | N | Y | Y | Y | Poor |
| Hughes and Carr, 2002 | Y | Y | Y | Y | Y | Good |
| Pavelka et al., 2002 | Y | N | Y | Y | Y | Fair |
| Reginster et al., 2001 | Y | N | Y | Y | Y | Fair |
| Rindone et al., 2000 | Y | Y | Y | Y | N‡ | Poor |
| Houpt et al., 1999 | Y§ | Y | Y | Y | Y | Good |
| Rovati et al., 1997 | NR** | NR | NR | NR | NR | ? |
| Noack et al., 1994 | ? | Y | N | Y | N†† | Poor |
| Pujalte et al., 1980 | N‡ | N | N | Y | N | Poor |
| Additional Studies not Included in Meta-Analyses | ||||||
| Herrero-Beaumont et al., 2007 | Y | N | Y | Y | Y | Fair |
| Clegg et al., 2006 | Y | Y | Y | Y | Y | Good |
| Das and Hammad, 2000§§ | Y | Y | Y | Y | Y | Good |
* Did not report allocation concealment specifically, but Internet-based protocol should have sufficed; statistically significant (p<.05) differences in sex (71% female in placebo group versus 57% in glucosamine group); NSAID use (87% versus 74% in placebo versus glucosamine group); BMI (34.1 versus 31.0 in placebo versus glucosamine group)
† Group characteristics not reported extensively, in particular OA grade; no mention of allocation concealment, although ITT analysis was specified
‡ No ITT analysis or description of allocation concealment; specifically analyzed data on completers only
§ Patients recruited to study via newspaper advertisement, self-reporting at least “moderate” knee pain, so may not be comparable to typical OA population
** Abstract that does not present sufficient data to determine a quality rating
†† Described as double-blind design, but did not mention allocation concealment, used “responders” rate derived from drop in Lequesne index scores as primary beneficial outcome
§§ Combination glucosamine plus chondroitin study
| Study | Initial Assembly of Comparable Groups | Low Loss to Followup, Maintenance of Comparable Groups | Measurements Reliable, Valid, Equal | Interventions Comparable/Clearly Defined | Appropriate Analysis of Results | Overall Rating |
|---|---|---|---|---|---|---|
| Studies Included in Meta-Analyses | ||||||
| Mazieres et al., 2001 | Y | Y | Y | Y | Y | Good |
| Bourgeois et al., 1998 | ?* | Y | Y | Y | Y | Fair |
| Bucsi and Poor, 1998 | ?† | Y | Y | Y | N† | Poor |
| Conrozier, 1998 | ?c | ? | Y | Y | ?c | Poor |
| Uebelhart et al., 1998 | ?† | Y | Y | Y | N† | Poor |
| L'Hirondel, 1992 | N‡ | ?‡ | Y | Y | N‡ | Poor |
| Additional Studies | ||||||
| Michel et al., 2005 | Y | N§ | Y | Y | Y | Fair |
| Uebelhart et al., 2004 | Y | Y | Y | Y | Y | Good |
* Did not report allocation concealment, reported ITT analysis, but presented data on loss to percent due only to adverse events (8 total across all 3 groups) with no mention of effect on composition of treatment groups
† Did not report allocation concealment or specify ITT analysis
c demographic details shown, statistical measures of dispersion not provided, allocation concealment not specified, ITT analysis unclear
§ Although 27% of pts dropped out, the completers did not differ statistically from the ITT in any parameter
| Compound | No. RCTs | No. Treated Subjects | Mean Study Quality* | I2 (%) | Pooling Metric (model) | Pooled Result† (mm) | 95% CI | p value |
|---|---|---|---|---|---|---|---|---|
| GH/GS | 7 | 401 | 3.6 | 0 | WMD (FE) | -4.7 | -0.3, -9.1 | NR |
| CS | 6 | 362 | 3.5 | 0 | WMD (FE) | -3.7 | -0.3, -7.0 | NR |
* Study quality rated according to 5-point Jadad scale
† 100 mm VAS, negative pooled result indicates improvement
CI: confidence interval; CS: chondroitin sulfate; FE: fixed effects; GH: glucosamine hydrochloride; GS: glucosamine sulfate; NR: not reported; WMD: weighted mean difference;
This MA included two studies that do not fit selection criteria for this Report. In one trial, 38 percent of patients had hip OA (Mazieres, Loyau, Menkes, et al., 1992); in the second, an active NSAID control (diclofenac) was used (Morreale, Manopulo, Galati, et al., 1996). The Mazieres trial yielded a negative WMD, whereas the Morreale trial produced a positive WMD. Thus, we performed a sensitivity analysis, which confirmed that exclusion of both trials would not significantly affect the overall result or direction of this MA.* Bjordal, Klovning, Ljunggren, et al. (2006) excluded five studies that meet our study selection criteria, but the effect is unknown.†
Comment. Bjordal and colleagues (2006) reported the results of separate meta-analyses of glucosamine or chondroitin on pain due to knee OA. Overall, in terms of the treatment parameters, disease, patient characteristics, and outcomes, their focus was compatible with the aims of this Evidence Report.
The Oxman and Guyatt quality rating for this MA (7) suggests it was not biased by design or analytic methods. However, Bjordal did not perform subgroup or sensitivity analyses of individual study quality parameters, such as the adequacy of allocation concealment or use of ITT analysis. Subgroup and sensitivity analyses are necessary in a MA to formally explore the influence of bias secondary to poor study quality, even in the documented absence of significant heterogeneity.
In contrast to the other MAs in which results were unitless SMDs, or effect sizes, Bjordal and colleagues (2006) used a WMD based on a 100-mm VAS for pain. Because a WMD uses the same scale as the original outcome data, the results have direct clinical meaning. The authors further interpreted their MAs in the context of a clinically meaningful benefit, defined as a minimal perceptible improvement threshold of 10 mm and a minimal clinically important improvement threshold of 20 mm. Thus, even though the pooled results were statistically significant, the WMDs and 95 percent CIs were below either clinically meaningful threshold. It may be concluded that treatment with glucosamine or chondroitin does not reach a level of clinical importance in relieving pain associated with mild-to-moderate knee OA over the 4- to 12-week treatment period studied.
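The threshold comparison described above can be sketched in a few lines. This is only an illustration: the function name `classify_wmd` is hypothetical, the pooled values and the 10-mm and 20-mm thresholds are those reported above for Bjordal and colleagues (2006), and requiring even the most favorable confidence limit to reach a threshold is our assumption about how "below threshold" is read.

```python
def classify_wmd(wmd, ci, perceptible=10.0, important=20.0):
    """Compare a pooled improvement on a 100-mm VAS with two thresholds (mm).

    wmd and ci are negative when the treatment group improves more than
    placebo, so magnitudes are used for the comparison.
    """
    magnitude = abs(wmd)
    # Most favorable end of the 95% CI (largest improvement), an assumed rule.
    best_case = max(abs(ci[0]), abs(ci[1]))
    if magnitude >= important:
        return "clinically important"
    if magnitude >= perceptible:
        return "perceptible"
    if best_case >= perceptible:
        return "below threshold, CI reaches perceptible"
    return "below both thresholds"


# Pooled results reported in the table above (mm on a 100-mm VAS).
glucosamine = classify_wmd(-4.7, (-0.3, -9.1))
chondroitin = classify_wmd(-3.7, (-0.3, -7.0))
print(glucosamine, "|", chondroitin)
```

Under this reading, neither pooled estimate nor the favorable end of either confidence interval reaches the 10-mm perceptible-improvement threshold.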
Towheed, Maxwell, Anastassiades, et al. (2006). This is the largest MA available on glucosamine as sole therapy for OA of the knee. A total of 20 double-blinded, placebo- or active-controlled RCTs were included that reported on glucosamine sulfate or glucosamine hydrochloride administered orally or parenterally to patients with primary or secondary OA at any site except temporomandibular joint (TMJ). We rated it a 7 on the Oxman and Guyatt scale, the highest quality level.
| Outcome Measure | No. RCTs | No. Subjects | Mean Study Quality* | I2 (%) | Pooling Metric (model) | Pooled Result† | 95% CI | p value |
|---|---|---|---|---|---|---|---|---|
| Pain‡ | 15 | 1,481 | 3.9 | 88.5 | SMD (RE) | -0.61 | -0.95, -0.28 | .0003 |
| Lequesne index | 4 | 741 | 4.8 | 89.4 | SMD (RE) | -0.51 | -0.96, -0.05 | .03 |
| WOMAC pain | 7 | 955 | 4.4 | 0.0 | SMD (FE) | -0.04 | -0.17, 0.09 | .5 |
| WOMAC stiffness | 5 | 538 | 4.4 | 14.3 | SMD (FE) | -0.07 | -0.21, 0.08 | .4 |
| WOMAC function | 6 | 750 | 4.3 | 0.0 | SMD (FE) | -0.07 | -0.21, 0.08 | .4 |
| WOMAC total | 5 | 672 | 4.4 | 0.0 | SMD (FE) | -0.15 | -0.30, 0.00 | .06 |
| Adverse events (AEs) | 14 | 1,685 | 3.9 | 0.0 | RR (FE) | 0.97 | 0.88, 1.08 | .6 |
| Withdrawals due to AEs | 17 | 1,908 | 4.0 | 0.0 | RR (FE) | 0.82 | 0.56, 1.21 | .3 |
* Study quality rated according to 5-point Jadad scale
† negative pooled result indicates improvement
‡ Composite including WOMAC pain (n=6 trials), scalar pain otherwise not defined (n=6), VAS pain (n=3)
CI: confidence interval; FE: fixed effects; RE: random effects; RR: relative risk; SMD: standardized mean difference;
| Variable | No. RCTs | No. Subjects | Mean Study Quality* | I2 (%) | Pooling Metric (model) | Pooled Result† | 95% CI | p value |
|---|---|---|---|---|---|---|---|---|
| Rotta product | 7 | 730 | 3.8 | 93.3 | SMD (RE) | -1.31 | -1.99, -0.64 | .0001 |
| Non-Rotta product | 8 | 751 | 4.0 | 43.6 | SMD (RE) | -0.15 | -0.35, 0.05 | .1 |
| Adequate allocation concealment | 8 | 1,111 | 4.5 | 83.4 | SMD (RE) | -0.19 | -0.50, 0.11 | .2 |
* Study quality rated according to 5-point Jadad scale
† negative pooled result indicates improvement
CI: confidence interval; FE: fixed effects; SMD: standardized mean difference; RE: random effects;
None of the analyses that used other outcome measures (WOMAC subscales or Lequesne Index) showed statistically significant results in sensitivity analyses.
Comment. The analysis by Towheed, Maxwell, Anastassiades, et al. (2006) consists of 38 separate meta-analyses based on different groupings of 20 RCTs. In the key analysis of pain, the pooled SMD from 15 RCTs was equated with a difference in the change from baseline of 28 percent, suggesting a moderate effect. However, the authors did not test for publication bias, which could skew results. Broader study inclusion and substantial interstudy heterogeneity associated with the SMDs for pain (I2 = 88.5 percent) and Lequesne Index (I2 = 89.4 percent) reflect differences in disease site, route of administration, study duration, and the use of reference and placebo controls.
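For readers unfamiliar with the I2 statistic cited above, the following sketch shows how it is commonly derived from Cochran's Q and its degrees of freedom (number of trials minus one). The function name `i_squared` and the Q values are hypothetical and chosen only for illustration; they are not taken from Towheed and colleagues.

```python
def i_squared(q, df):
    """Percentage of total variation across trials attributed to heterogeneity.

    q  : Cochran's Q statistic for the pooled trials
    df : degrees of freedom (number of trials minus one)
    """
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical inputs, for illustration only.
print(i_squared(40.0, 10))  # Q well above df: substantial heterogeneity
print(i_squared(5.0, 10))   # Q below df: truncated to zero
```

An I2 near 90 percent, as reported for the pain and Lequesne Index analyses, means that almost all of the observed variation in effect sizes is attributed to differences between trials rather than to chance.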
In a subgroup analysis of the potential effect of Rotta glucosamine sulfate, or indirectly Rotta sponsorship, Towheed and colleagues pooled studies that involved parenteral routes of administration and disease sites other than the knee, and that varied widely in size and duration. Substantial heterogeneity (I2 = 93.3 percent) and a lower mean study quality score cause uncertainty in the results of this analysis. The authors explored a few potential sources of heterogeneity, but did not specifically assess the impact of ITT analysis or of industry funding. A second sensitivity analysis showed a nonsignificant effect of glucosamine on pain in studies with adequate allocation concealment, suggesting bias secondary to study quality. However, interpretation of these results also is influenced by substantial interstudy heterogeneity (I2 = 83.4 percent).
The authors conclude that there is a statistically significant effect in favor of glucosamine versus placebo in patients with OA. We believe this conclusion is compromised by interstudy heterogeneity and variability with respect to disease site, route of administration, study duration, and the use of active controls and placebo controls. The pooled results were reported as SMDs, which can be difficult to interpret. Finally, concern exists over the thoroughness of exploration of heterogeneity in this meta-analysis, particularly the influence of ITT analysis and industry funding. While this meta-analysis had some strong methodologic characteristics, concerns noted here call its conclusions into question.
| Outcome Measure | No. RCTs | No. Subjects | Mean Study Quality* | I2 (%) | Pooling Metric (model) | Pooled Result† | 95% CI | p value |
|---|---|---|---|---|---|---|---|---|
| WOMAC pain | 2 | 414 | 4.5 | 0 | SMD (RE) | -0.41 | -0.21, -0.60 | <.0001 |
| WOMAC function | 2 | 414 | 4.5 | 0 | SMD (RE) | -0.46 | -0.27, -0.66 | <.0001 |
| Adverse events (AEs) | 2 | 414 | 4.5 | 0 | RR (RE) | 1.02 | 0.93, 1.11 | NSD |
* Study quality rated according to 5-point Jadad scale
† negative pooled result indicates improvement
CI: confidence interval; NSD: no significant difference; RR: relative risk; RE: random effects; SMD: standardized mean difference;
Comment. Poolsup, Suthisisang, Channark, et al. (2005) focused on long-term structural progression of knee OA, rather than symptomatic outcomes that are the focus of this Evidence Report. They reported statistically significant pooled SMDs for two secondary outcomes, WOMAC pain and function, based on data from two RCTs (Pavelka, Gatterova, Olejarova, et al., 2002; Reginster, Deroisy, Rovati, et al., 2001). Fourteen studies were excluded because they did not report structural outcome data.* While this MA was rated low in quality, the two trials included were of fair quality, with no interstudy heterogeneity reported. Both were sponsored by Rotta.
The conclusion that glucosamine sulfate possesses moderate efficacy in improving symptoms of OA of the knee is limited by the small number of trials and subjects included. Given the structural focus of this MA and narrow inclusion criteria, we conclude that it does not provide relevant information to address the Key Questions of this Evidence Report.
| Outcome Measure | No. RCTs | No. Subjects | Mean Study Quality* | I2 (%) | Pooling Metric (model) | Pooled Result† | 95% CI | p value |
|---|---|---|---|---|---|---|---|---|
| VAS pain | 12 | 1,267 | 3.8 | NR | SMD (RE) | -0.45 | -0.33, -0.57 | <.001 |
| WOMAC pain | 2 | 414 | 4.5 | NR | SMD (FE) | -0.30 | -0.11, -0.49 | <.001 |
| Lequesne index | 10 | 1,582 | 3.8 | NR | SMD (FE) | -0.43 | -0.32, -0.54 | <.001 |
| Mobility (not defined) | 3 | 150 | 4.0 | NR | SMD (FE) | -0.59 | -0.25, -0.92 | <.001 |
| Responder | 9 | 1,159 | 3.9 | NR | RR (FE) | 1.59 | 1.39, 1.83 | <.001 |
| Adverse events | 11 | 1,770 | 4.1 | NR | RR (RE) | 0.80 | 0.59, 1.08 | .15 |
* Study quality rated according to 5-point Jadad scale
† negative pooled result indicates improvement
CI: confidence interval; FE: fixed effects; NR: not reported; RE: random effects; RR: relative risk; SMD: standardized mean difference;
The investigators used the Jadad method to determine mean quality scores of the pooled RCTs, which ranged from 3.8 to 4.5. In the presence of interstudy heterogeneity (I2 not reported), a random effects model was used to pool data. Tests for publication bias with funnel plots and Egger's linear regression test revealed a slight asymmetry to the right side, suggesting that more studies of small sample size were associated with high effect sizes than with small effects.
Comment. Richy, Bruyere, Ethgen, et al. (2003) pooled glucosamine and chondroitin studies. They assert that the robustness of their findings, the conservative approach used to pool data, and the use of unpublished data constitute definitive evidence that glucosamine and chondroitin are beneficial. However, the pooled results from this MA are not useful for our purposes as they do not individually report the efficacy of these agents as sole therapy.
| Outcome Measure | No. RCTs | No. Subjects | Mean Study Quality | I2 (%) | Pooling Metric (model) | Pooled Result* | 95% CI† | p value |
|---|---|---|---|---|---|---|---|---|
| VAS pain | 7 | 699 | NR | NR | SMD (NR) | -0.90 | -0.80, -1.0 | <.05 |
| Lequesne index | 6 | 653 | NR | NR | SMD (NR) | -0.74 | -0.62, -0.80 | <.01 |
* negative pooled result indicates improvement
† estimated from figures in report
CI: confidence interval; NR: not reported; SMD: standardized mean difference;
Based on qualitative review of the RCTs, Leeb and co-workers (2000) asserted that there was little interstudy heterogeneity. Furthermore, the authors did not use a validated method such as the Jadad score to formally assess study quality. One primary RCT reported on patients with OA of the hip (Conrozier and Vignon, 1992), one included patients with OA of the hip and knee (Mazieres, Loyau, Menkes, et al., 1992), and one study used a reference intervention (diclofenac) in the control group (Morreale, Manopulo, Galati, et al., 1996). All three of these RCTs would be excluded by the selection criteria we defined to address the Key Questions of this Report.
Comment. Leeb and colleagues (2000) conclude that their results provide evidence for significant efficacy of chondroitin sulfate on pain and function in treatment of OA compared to placebo in patients followed for 4 months or more. However, these results have little utility for our purposes. Most notably, they did not assess the effect of heterogeneity, study quality, industry funding, or publication bias on the pooled results. The statistical techniques used to pool and analyze extracted data were poorly described. Finally, the selection criteria we defined to address the Key Questions in this Report would exclude three of the seven trials included in their MA. Given the significant methodological shortcomings, we believe this MA does not support a conclusion that chondroitin sulfate is more effective than placebo in therapy of knee OA.
| Compound | No. RCTs | No. Subjects | Mean Study Quality*(range) | Heterogeneity(p value) | Pooling Metric† | Pooled Result‡ | 95% CI | p value |
|---|---|---|---|---|---|---|---|---|
| GH/GS | 6 | 911 | 38 (12–52) | NSD | SMD | -0.44 | -0.24, -0.64 | NR |
| CS | 9 | 799 | 34 (14–55) | <.001 | SMD | -0.96 | -0.63, -1.3 | NR |
* Study quality score based on reported compliance with 14 aspects of clinical trial conduct, ranging from 0 to 68 for negative and from 0 to 65 for positive studies, expressed as a percentage of the maximum possible score for each trial
† All results were pooled using a random-effects model;
‡ negative pooled result indicates improvement
CI: confidence interval; NR: not reported; NSD: no significant difference; SMD: standardized mean difference;
Tests for publication bias (funnel plots) showed statistical evidence of significant bias that reflected an absence of trials with both small numbers of participants and small or null treatment effects. Assessment of primary study quality showed allocation concealment was frequently inadequate and intention-to-treat analysis was rarely performed.
| Variable | No RCTs | No Subjects | Study Quality* | Hetero-geneity (p value) | Pooling Metric† | Pooled Result‡ | 95% CI | p value |
|---|---|---|---|---|---|---|---|---|
| Low-quality GS/GH trials | 3 | 403 | < 40 | NR | SMD | -0.7 | -0.4, -1.0 | NR |
| High-quality GS/GH trials | 3 | 508 | ≥ 40 | NR | SMD | -0.3 | 0.1, -0.5 | NR |
| Low-quality CS trials | 4 | 324 | < 35 | NR | SMD | -1.7 | -0.7, -2.7 | NR |
| High-quality CS trials | 5 | 475 | ≥ 35 | NR | SMD | -0.8 | -0.6, -1.0 | NR |
| Small GS/GH trials | 3 | 175 | 39 | NR | SMD | -0.5 | -0.1, -0.9 | NR |
| Large GS/GH trials | 3 | 736 | 36 | NR | SMD | -0.4 | -0.1, -0.7 | NR |
| Small CS trials | 4 | 183 | 34 | NR | SMD | -1.7 | -0.5, -2.8 | NR |
| Large CS trials | 5 | 616 | 34 | NR | SMD | -0.8 | -0.6, -1.0 | NR |
* Study quality score based on reported compliance with 14 aspects of clinical trial conduct, ranging from 0 to 68 for negative and from 0 to 65 for positive studies, expressed as a percentage of the maximum possible score for each trial
† All results were pooled using a random-effects model;
‡ negative pooled result indicates improvement
CI: confidence interval; NR: not reported; SMD: standardized mean difference;
One of six glucosamine RCTs involved parenteral administration (Vajaradul, 1981). Two chondroitin trials used intramuscular injection (Rovetta, 1991; Kerzberg, Roldan, Castelli, et al., 1987) and one combined patients with OA of the knee or hip (Mazieres, Loyau, Menkes, et al., 1992). None of the primary studies reported receiving independent funding from a governmental or not-for-profit source. Thirteen of 15 RCTs reported some connection with the drug manufacturer. A number of studies relevant to our Report have been subsequently published for glucosamine sulfate/glucosamine hydrochloride (Houpt, McMillan, Wein, et al., 1999; Rindone, Hiller, Collacott, et al., 2000; Reginster, Deroisy, Rovati, et al., 2001; Pavelka, Gatterova, Olejarova, et al., 2002; Hughes and Carr, 2002; Usha and Naidu, 2004; and McAlindon, Formica, LaValley, et al., 2004). For chondroitin sulfate, one study was published later (Mazieres, Combe, Phan Van, et al., 2001).
Comment. The focus of the MA by McAlindon, LaValley, Gulin, et al. (2000) was generally comparable to that of our Evidence Report. However, it is limited for our purposes in several respects. First, the Oxman and Guyatt score (4) reflects major flaws in its design and conduct, primarily ascribed to study selection bias. McAlindon and colleagues included several trials that do not meet our selection criteria with respect to the route of drug administration and disease site. Second, sensitivity analyses suggested that heterogeneity due to differences in the quality and size of the primary studies differentially and substantially influenced the size of pooled SMDs depending on the intervention. Third, the presence of statistical evidence of bias in a funnel plot suggests caution is warranted in interpreting the results of this MA. The genesis of bias in this MA is unclear but could be a function of selective publication of positive trials, post hoc selection of study outcome measures, and premature trial termination once a positive outcome is achieved. Finally, the use of SMDs complicates interpretation and direct clinical application of the results.
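The difficulty of interpreting SMDs can be illustrated with a short sketch. An SMD divides the between-group difference by the pooled standard deviation, so the same SMD corresponds to different absolute changes depending on how variable the outcome is. The function names and all numbers below are hypothetical, chosen for illustration, and do not come from any trial in this Report.

```python
import math


def standardized_mean_difference(mean_tx, sd_tx, n_tx, mean_pl, sd_pl, n_pl):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = ((n_tx - 1) * sd_tx ** 2 + (n_pl - 1) * sd_pl ** 2) / (n_tx + n_pl - 2)
    return (mean_tx - mean_pl) / math.sqrt(pooled_var)


def smd_to_scale_units(smd, sd):
    """Back-convert an SMD to outcome units under an assumed typical SD."""
    return smd * sd


# Hypothetical change scores on a 100-mm VAS: -20 mm (treatment) vs -10 mm (placebo).
smd = standardized_mean_difference(-20.0, 20.0, 50, -10.0, 20.0, 50)
print(smd)
# The same SMD implies different absolute differences for different SDs.
print(smd_to_scale_units(smd, 20.0), smd_to_scale_units(smd, 10.0))
```

Because the back-conversion depends on an assumed SD, an SMD alone cannot be compared directly with a threshold expressed in millimeters on a VAS.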
The MA authors conclude that glucosamine and chondroitin may have efficacy in treating OA symptoms and are safe, although they conceded the necessity for additional high-quality, independent studies to determine the actual clinical effectiveness of these preparations as therapy for symptomatic OA. Given the uncertainties outlined, we conclude that this MA does not provide sufficient evidence to show a clinical benefit for glucosamine or chondroitin treatment of OA.
| Intervention | Primary Outcome: 20% decrease in WOMAC pain score, % (n) | p value | Secondary Outcome: OMERACT-OARSI response, % (n) | p value | Secondary Outcome: 50% decrease in WOMAC pain score, % (n) | p value |
|---|---|---|---|---|---|---|
| Placebo | 60.1% (188/313) | | 56.9% (178/313) | | 42.2% (132/313) | |
| Glucosamine | 64% (203/317) | p=.30 | 60.6% (192/317) | p=.35 | 46.4% (147/317) | p=.29 |
| Chondroitin | 65.4% (208/318) | p=.17 | 63.5% (202/318) | p=.09 | 42.1% (134/318) | p=.99 |
| Glucosamine plus Chondroitin | 66.6% (211/317) | p=.09 | 65.6% (208/317) | p=.02* | 46.4% (147/317) | p=.29 |
| Celecoxib | 70.1% (223/318) | p=.008† | 67.3% (214/318) | p=.007† | 50% (159/318) | p=.05* |
* p<.05 for the comparison with placebo
† p<.017 for the comparison with placebo
OMERACT-OARSI: Outcomes Measures in Rheumatology Clinical Trials-Osteoarthritis Research Society; WOMAC: Western Ontario and McMaster Universities;
| Intervention | Primary Outcome: 20% decrease in WOMAC pain score, % (n) | p value | Secondary Outcome: OMERACT-OARSI response, % (n) | p value | Secondary Outcome: 50% decrease in WOMAC pain score, % (n) | p value |
|---|---|---|---|---|---|---|
| Placebo | 61.7% (150/243) | | 59.3% (144/243) | | 44.9% (109/243) | |
| Glucosamine | 63.6% (157/247) | p=.67 | 59.1% (146/247) | p=.97 | 47.8% (118/247) | p=.52 |
| Chondroitin | 66.5% (165/248) | p=.27 | 64.9% (161/248) | p=.20 | 44.5% (109/248) | p=.84 |
| Glucosamine plus Chondroitin | 62.9% (154/245) | p=.80 | 62.9% (154/245) | p=.42 | 44.5% (109/245) | p=.94 |
| Celecoxib | 70.3% (173/246) | p=.04* | 67.5% (166/246) | p=.06 | 51.2% (126/246) | p=.16 |
* p<.05 for the comparison with placebo
OMERACT-OARSI: Outcomes Measures in Rheumatology Clinical Trials-Osteoarthritis Research Society; WOMAC: Western Ontario and McMaster Universities;
Comment. This is the largest (n=1,583) independently funded RCT of glucosamine and chondroitin that has been reported. It is a good-quality study, with a well-defined, clinically relevant subject sample. The 24-week treatment period is adequate to assess long-term benefit from the supplements. The lack of a significant response to either supplement alone, or the combination, in the context of the significant effect in the celecoxib-treated group, provides compelling evidence that neither glucosamine nor chondroitin provides clinically meaningful pain relief compared to placebo in patients with OA of the knee. A similar pattern of response to glucosamine plus chondroitin was observed for secondary outcomes, in particular the OMERACT-OARSI response rate and the 50 percent decrease in WOMAC pain among all randomized patients. None of the interventions had a significant effect among patients with mild pain.
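The contrast between the supplement arms and the celecoxib arm can be checked approximately from the responder counts tabulated above for all randomized patients. The sketch below applies an unadjusted two-proportion z-test; this choice of test and the function name `two_proportion_z_test` are our assumptions for illustration, and the trial's own analysis may have used a different method.

```python
import math


def two_proportion_z_test(x_tx, n_tx, x_pl, n_pl):
    """Unadjusted two-sided z-test for a difference in response rates."""
    p_tx, p_pl = x_tx / n_tx, x_pl / n_pl
    pooled = (x_tx + x_pl) / (n_tx + n_pl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_tx + 1 / n_pl))
    z = (p_tx - p_pl) / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_tx - p_pl, z, p_value


# Primary outcome responders (20% decrease in WOMAC pain), from the table above.
for arm, responders, n in [("Glucosamine", 203, 317), ("Celecoxib", 223, 318)]:
    diff, z, p = two_proportion_z_test(responders, n, 188, 313)
    print(f"{arm}: difference vs placebo = {diff:.3f}, p = {p:.3f}")
```

Under these assumptions the results should be close to the reported p values: a difference of about 10 percentage points for celecoxib is statistically significant, whereas a difference of about 4 percentage points for glucosamine is not.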
It has been suggested that failure to demonstrate a statistically significant improvement in the main outcome in GAIT is related to use of glucosamine hydrochloride rather than glucosamine sulfate manufactured by Rotta Research Laboratorium (Hochberg, 2006). It also has been speculated that the positive result with combined therapy in GAIT could be related to co-delivery of sulfate from chondroitin sulfate and glucosamine, but it is unclear if the doses used would be clinically meaningful (Altman, Abramson, Bruyere, et al., 2006). GAIT provides no evidence to address either of those hypotheses.
| Outcome | Placebo Group: Change from Baseline (%)* | Treatment Group: Change from Baseline (%)* |
|---|---|---|
| WOMAC pain | -6.2 | -11.0 |
| WOMAC stiffness | -4.6 | -7.8 |
| WOMAC function | 5.9 | -0.8 |
| WOMAC total | 2.1 | -3.9 |
| Adverse events | 67 total, none serious | 58 total, none serious |
* No significant differences between groups for any score
Comment. This RCT showed no significant difference in WOMAC pain, stiffness, function, or total scores with chondroitin therapy for 24 months versus placebo. It was of adequate design and execution to address the clinical efficacy of the intervention. Patients were generally representative of a typical OAK population. However, the relatively low mean pain score of patients at entry may have limited the ability to detect meaningful improvements.
A total of 120 patients age 40 or over with clinically symptomatic, idiopathic OA of the knee according to ACR criteria were enrolled. Patients with Kellgren-Lawrence grade 1–3 disease and a minimum 25 percent remaining medial femoro-tibial joint space at entry were eligible. Treatment was administered for two periods, the first from entry to month 3 and the second between months 6 and 9; no treatment of any kind was given between months 3–6 and 9–12.
A total of 110 patients (54 chondroitin, 56 placebo) were included in the ITT analysis. Ten patients who did not take any dose of drug or report any data were lost to followup and excluded from the ITT analysis. A total of 43 in the chondroitin and 41 in the placebo group completed the study.
| Outcome, Mean (± SD) | Baseline, Pl | Baseline, CS | 3 mos, Pl | 3 mos, CS | 6 mos, Pl | 6 mos, CS | 9 mos, Pl | 9 mos, CS | 12 mos, Pl | 12 mos, CS |
|---|---|---|---|---|---|---|---|---|---|---|
| Lequesne Index | 9.1 ± 3.2 | 9.0 ± 2.8 | 7.4 ± 4.2 | 6.8 ± 3.6 | 7.5 ± 4.0 | 6.7 ± 3.5 | 7.0 ± 3.9 | 6.0 ± 3.8* | 7.0 ± 3.9 | 5.8 ± 3.6** |
| VAS (mm) | 61.1 ± 19.0 | 58.8 ± 15.5 | 49.1 ± 24.5 | 42.9 ± 23.2 | 47.6 ± 26.9 | 40.5 ± 23.9 | 46.1 ± 27.2 | 34.0 ± 26.4* | 45.8 ± 27.6 | 34.3 ± 27.4* |
* p<.05 vs. placebo;
** p<.01 (ANOVA between groups)
Comment. These results suggest 9 to 12 months of therapy with chondroitin may reduce pain and improve function in symptomatic OA of the knee. Chondroitin treatment was associated with few minor adverse events and an overall tolerable global assessment. The results are suggestive, but the small size of this trial limits its conclusions and generalizability.
Ninety-three patients (46 G/C, 47 placebo) age 45 to 75 years were enrolled. All had primary OA of the knee with a minimal Lequesne Index score of 7, Kellgren-Lawrence radiographic grade 2 or more, and symptoms of more than 6 months duration. Randomization was stratified by disease severity according to the Kellgren-Lawrence grade. Analysis was planned a priori to be stratified by the Kellgren-Lawrence radiographic grade of OA, with the mild/moderate (2–3) group as the primary study population. Thus, of the 46 patients randomized to the intervention, 33 had Kellgren-Lawrence grade 2–3 OA and 13 had Kellgren-Lawrence grade 4 OA. The placebo group had 39 patients with Kellgren-Lawrence grade 2–3 OA and 8 with Kellgren-Lawrence grade 4 OA. The primary outcome measure was defined as a 25 percent improvement in the Lequesne Index, with the total WOMAC score as a secondary outcome. The patient's global assessment of improvement also was recorded.
| Outcome | Time (mos) | Mild/moderate cases, Pl (n=39), Mn (± SEM) | Mild/moderate cases, GH/CS (n=33), Mn (± SEM) | Severe cases, Pl (n=8), Mn (± SEM) | Severe cases, GH/CS (n=13), Mn (± SEM) |
|---|---|---|---|---|---|
| Lequesne Index | Baseline | 10.4 (0.4) | 10.2 (0.4) | 10.7 (1.2) | 11.1 (0.8) |
| | 2 | 9.6 (0.5) | 8.9 (0.5) | 10.1 (1.4) | 10.2 (0.8) |
| | 4 | 9.2 (0.6) | 7.2 (0.6)* | 9.6 (1.5) | 9.4 (0.9) |
| | 6 | 9.0 (0.6) | 7.4 (0.6)† | 9.9 (1.6) | 9.6 (1.0) |
| | ≥ 25% improvement | 11 (28%) | 15 (52%)† | 2 (25%) | 3 (23%) |
| WOMAC total | Baseline | 944 (55) | 908 (71) | 1089 (158) | 1187 (119) |
| | 2 | 831 (64) | 768 (71) | 984 (166) | 1134 (121) |
| | 4 | 774 (79) | 655 (72) | 900 (174) | 1041 (126) |
| | 6 | 724 (87) | 626 (77) | 882 (183) | 1033 (126) |
| | ≥ 25% improvement | 16 (41%) | 19 (58%) | 2 (25%) | 4 (31%) |
* p=.003;
† p=.04 vs. placebo
| Outcome | Placebo (n=104), Baseline | Placebo (n=104), 6 mos | Acetaminophen (n=108), Baseline | Acetaminophen (n=108), 6 mos | GS (n=106), Baseline | GS (n=106), 6 mos |
|---|---|---|---|---|---|---|
| Lequesne Index (points)* | 10.8 (2.6) | -1.9 (-2.6, -1.2) | 11.1 (2.7) | -2.7 (-3.3, -2.1) | 11.0 (3.1) | -3.1 (-3.8, -2.3)† |
| WOMAC (points)* | 37.9 (14.3) | -8.2 (-11.3, -5.1) | 40.4 (14.8) | -12.3 (-14.9, -9.7) | 38.3 (15.2) | -12.9 (-15.6, -10.1)‡ |
| OARSI-A responders (%) | | 21.2 | | 33.3§ | | 39.6** |
* Mean absolute (SD) at baseline and change (95% CI) at 6 mos
† p=.032 vs. placebo [difference = -1.2 (-2.3, -0.8)];
‡ p=.039 vs. placebo [difference = -4.7 (-9.1, -0.2)];
§ p=.047 vs. placebo;
** p=.007 vs. placebo
Seventy percent of treatment recipients with mild-to-moderate OA of the knee reported more than 25 percent improvement in their global assessment compared with 46 percent of those given placebo (p=.04). In those with severe OA of the knee, the intervention had no impact on the global assessment response rate compared to placebo (31 percent versus 38 percent). There was a 17 percent incidence of adverse events in treatment recipients, primarily attributed to the GI tract, compared with 19 percent in the placebo group (NSD). Four patients dropped out, but all who had a baseline visit and received their medications were included in the ITT analysis.
Comment. This study was generally well designed and well conducted. However, its conclusions are limited by the small number of patients. The study sample may be self-selected due to recruitment through newspaper advertisements, and perhaps not typical of the general population with OA of the knee. The number of patients with severe knee OA is too small to conclude that glucosamine and chondroitin treatment has a differential response in mild-to-moderate versus severe disease.
Herrero-Beaumont, Roman, Trabado, et al., 2007 (GUIDE). The “Glucosamine Unum in Die Efficacy” (GUIDE) trial is a multicenter, placebo-controlled RCT performed in Europe using Rotta glucosamine sulfate. A total of 318 patients (88 percent female) with OA of the knee (ACR criteria) were randomly allocated to glucosamine 1,500 mg daily, acetaminophen 1,000 mg three times daily, or a placebo using a double-dummy design. Rescue medication consisted of ibuprofen as needed. The primary efficacy measure was the 6-month change in the Lequesne Index in the ITT population, using the “last observation carried forward” approach for patients who did not complete the study (34 on placebo, 28 each in the glucosamine sulfate and acetaminophen groups). Secondary measures included the total WOMAC score and OARSI-A responder criteria.
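The “last observation carried forward” approach mentioned above can be sketched as follows. This is a minimal illustration rather than the trial's actual procedure: the function names and the patient record are hypothetical, and missing visits are represented by `None`.

```python
def locf(visits):
    """Fill missing visits (None) with the most recent observed value."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def change_from_baseline(visits):
    """Final-visit change after carrying the last observation forward."""
    filled = locf(visits)
    return filled[-1] - filled[0]


# Hypothetical Lequesne Index scores for a patient who withdrew after visit 2.
scores = [11, 9, None, None]
print(locf(scores))
print(change_from_baseline(scores))
```

Carrying the last value forward keeps withdrawn patients in the ITT analysis but assumes their score would not have changed after withdrawal, which matters here because roughly a quarter to a third of each group did not complete the study.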
Comment. This RCT suggests glucosamine is efficacious in relieving mild-to-moderate pain of knee OA. However, it is not directly comparable to GAIT for several reasons. First, it uses a more sensitive, less rigorous primary outcome measure (OARSI-A) than the 20 percent reduction in WOMAC pain used in GAIT. Second, the active comparator was acetaminophen rather than an NSAID such as the celecoxib used in GAIT; NSAIDs are considered modestly superior to acetaminophen for general or rest pain. For pain on motion and overall assessment of clinical response, NSAIDs also appear modestly superior, though differences are not always statistically significant. Only comparisons to placebo are reported, with no comparisons between the active arm and glucosamine. Finally, the use of a glucosamine sulfate preparation available only in Europe and sponsorship by the manufacturer (Rotta) limit generalizability. Thus, while GUIDE provides evidence for glucosamine efficacy, its results are insufficient to establish this or to override the results of GAIT. It does provide a rationale for further independent study of glucosamine sulfate.
| Study | N Tx/Pl | Duration (wks) | Outcome | Baseline Tx/Pl** (rng or 95% CI) | End Tx/Pl** (rng or 95%CI) | Δ Mean (95% CI, p value) | % Responders Tx/Pl (p value) | USPSTF Quality | Comment |
|---|---|---|---|---|---|---|---|---|---|
| Herrero-Beaumont et al., 2007 (GUIDE) | 106/104 | 24 | Lequesne Index | 11.0 ± 3.1 / 10.8 ± 2.6 | 7.9 (calc) / 8.9 (calc) | -1.2 (calc) (.032) | 39.6 vs. 21.2 (.007), OARSI-A | Fair | Used acetaminophen as active control, NSD between active and GS group |
| | | | WOMAC index | 38.3 ± 15.2 / 37.9 ± 14.3 | 25.4 (calc) / 29.7 (calc) | -4.7 (-9.1, -0.2) (0.39) | | | |
| Pavelka et al., 2002 | 101/101 | 156 | Lequesne Index | 8.9 ± 2.3 / 8.9 ± 2.3 | 7.2 (NR) / 8.1 (NR) | -0.91 (-0.34, 1.5) (.002) | NR | Good | Primarily examined structural changes in mild-to-moderate OAK; WOMAC pain change -10.6% |
| | | | WOMAC pain | 6.6 ± 3.4 / 6.3 ± 3.1 | NR | -0.7 (-0.06, -1.3) (.03) | NR | | |
| Reginster et al., 2001 | 106/106 | 156 | WOMAC pain | 194 ± 102 / 172 ± 104 | 156 (NR) / 164 (NR) | -30 (estimated) | NR | Good | Structural changes in mild-to-moderate OAK. WOMAC pain change -19.5% in GS pts, -5% in placebo (net -15%) |
| Rovati et al., 1997 | 329 total | 12 | Lequesne Index | 10.5 (estimated) / 10.1 | 5.6 (estimated) / 8.8 (estimated) | -3.6 (estimated) | NR | Unrated (abstract) | Patients with mild-to-moderate OAK showed -35% change in LI |
| Noack et al., 1994 | 126/126 | 4 | Lequesne Index | 10.6 ± 0.4 (4–22) / 10.6 ± 0.4 (4–20) | 7.4 ± 0.5 (0–21) / 8.4 ± 0.4 (0–24) | -1.0 (.05) | 52/37 (.016) | Fair | Moderate-to-severe OAK*; net difference about -9% with treatment |
| Pujalte et al., 1980 | 10/10 | 8 | Composite measure of pain, tenderness, swelling, stiffness on 1–4 point scale in order of increasing severity | 2.3 ± 0.15 / 2.6 ± 0.31 | 1.2 ± 0.08 / 2.3 ± 0.25 | -0.81 (pain) (.0004) | 80 vs. 20 | Poor | Patients with mild-to-moderate OA of the knee; used unvalidated composite measure of efficacy |
* ITT analysis, based on minimum 3-pt drop in Lequesne Index in the presence of an overall judgment of efficacy by the investigator rated “good” or “moderate”
** Mn ± SD or SEM
NSD: non-significant difference; USPSTF: U.S. Preventive Services Task Force
| Study | Summary Tx/Pl (p-value) | CV No. Tx/Pl | Local Skin No. Tx/Pl | Headache No. Tx/Pl | MS No. Tx/Pl | GI Tract No. Tx/Pl | Nervous System No. Tx/Pl | Respiratory Tract No. Tx/Pl | Urinary Tract No. Tx/Pl | General Body No. Tx/Pl | Misc No. Tx/Pl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Herrero-Beaumont et al., 2007 | Number of adverse events in each group were similar: 89 with Pl, 96 with acetaminophen, 95 with GS, most of minor clinical significance | 0/1 | NR | 2/4 | 10/5 | 11/16 | 3/5 | 9/9 | NR | NR | 4/2 |
| (gastroenteritis) | |||||||||||
| Clegg et al., 2006 | 77 total in 66 pts none serious, not separated by agent, described as generally mild (NSD) | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR |
| McAlindon et al., 2004 | 18/14 (NSD) | NR | NR | NR | 7/2 | 4/6 | 2/2 | NR | NR | 1/1 | 4/3 |
| Usha and Naidu, 2004 | Totals NR, none serious enough to discontinue therapy, described as well tolerated (NSD) | NR | NR | NR | NR | > 5% pts reported diarrhea, grp not specified | NR | NR | NR | NR | NR |
| Hughes and Carr, 2004 | No serious events reported (NSD) | NR | 0/1 | 4/6 | 9/9p | 4/4 | 1/0 | NR | 1/0 | NR | 4/8 (cold/flu) |
| Pavelka et al., 2002 | 138/123 total in 202 pts, 8/10 withdrew (NSD) | 23/20 | 10/15 | NR | 30/22 | 25/28 | NR | 17/7 | 12/11 | 7/6 | 14/14 |
| Reginster et al., 2001 | 83/101 total in 212 pts, 21/18 withdrew (NSD) | 21/30 | 4/7 | 6/4 | NR | 27/37 | 11/20 | NR | NR | 10/7 | NR |
| Das and Hammad, 2000 | 9/8 pts reported at least one adverse event, none judged serious (NSD) | NR | NR | NR | 0/1 | 7/10 | NR | NR | NR | 1/0 | 3/4 |
| Rindone et al., 2000 | No serious adverse events reported, 17/11 pts reported at least one event, 2/4 pts withdrew (NSD) | X (no. NR) | X (no. NR) | X (no. NR) | NR | X (no. NR) | X (no. NR) | NR | NR | X (no. NR) | NR |
| Houpt et al., 1999 | 12% of pts in both grps reported mild adverse events (NSD) | NR | NR | NR | NR | X (no. NR) | NR | NR | NR | NR | NR |
| Rovati, 1997 | 14.8%/23.7% of pts reported an adverse event (NSD) | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR |
| Noack et al., 1994 | No serious adverse events reported, 8/13 pts reported at least one event, 10/16 pts withdrew (NSD) | 0/2 | 1/3 | 2/2 | NR | 5/6 | NR | NR | NR | NR | NR |
| Pujalte et al., 1980 | No serious adverse events reported, none withdrew, described as well tolerated | NR | NR | NR | NR | NR | 0/1 | NR | NR | NR | NR |
CV = cardiovascular; MS = musculoskeletal; NSD = no significant difference; NR = not reported; Pl = placebo; Tx = treatment;
| Study | Summary Tx/Pl (p value) | CV No. Tx/Pl | Local Skin No. Tx/Pl | Headache No. Tx/Pl | MS No. Tx/Pl | GI Tract No. Tx/Pl | Nervous System No. Tx/Pl | Respiratory Tract No. Tx/Pl | Urinary Tract No. Tx/Pl | General Body No. Tx/Pl | Misc No. Tx/Pl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Clegg et al., 2006 | 77 total in 66 pts none serious, not separated by agent, described as generally mild (NSD) | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR |
| Michel et al., 2005 | 87/101 pts reported an adverse event, 9/9 withdrew, but only 2 events judged related to Tx (NSD) | 9/8 | 9/9 | 11/14 | NR | 6/17 | NR | 44/46 | 8/7 | NR | NR |
| Uebelhart et al., 2004 | Minor adverse events only, 1/1 withdrew (NSD) | NR | NR | NR | NR | 4/6 | NR | NR | NR | NR | NR |
| Mazieres et al., 2001 | 28/21 pts reported at least one adverse event, 4/3 withdrew, none were judged related to Tx (NSD) | NR | NR | NR | NR | Tx > Pl (p=.04) | NR | Tx > Pl (p=.05) | NR | NR | NR |
| Das and Hammad, 2000 | 9/8 pts reported at least one adverse event, none judged serious (NSD) | NR | NR | NR | 0/1 | 7/10 | NR | NR | NR | 1/0 | 3/4 |
| Bourgeois et al., 1998 | 16/12 adverse events reported, none serious, 3/3 withdrew, Tx described as well tolerated (NSD) | 1/0 | 2/2 | NR | NR | 11/10 | NR | NR | NR | NR | 2/0 |
| Bucsi and Poor, 1998 | No serious adverse events reported, tolerance of Tx reported as excellent (NSD) | NR | NR | NR | NR | 0/1 | NR | NR | NR | NR | NR |
| Conrozier, 1998 | Tolerance reported as excellent in 90% of Tx pts, 2 (not specified) withdrew (NSD) | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR |
| Uebelhart et al., 1998 | Tolerance reported as good in both grps (NSD) | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR |
| L'Hirondel, 1992 | No serious adverse events reported (NSD) | NR | NR | NR | NR | 7/13 | NR | NR | NR | NR | NR |
CV = cardiovascular; MS = musculoskeletal; NSD = no significant difference; NR = not reported; Pl = placebo; Tx = treatment;
Glucose Metabolism. There has been speculation that because glucosamine is taken up by cells and metabolized through the same pathways as glucose, it could have an effect on glycemic control in humans (Hathcock and Shao, 2006; Matheson and Perry, 2003). Data from 11 in vitro studies showed that increasing concentrations of glucosamine altered glucose transport, glycogen synthesis, and insulin response to glucose (Institute of Medicine and National Research Council, 2004; Anderson, Nicolosi, Borzelleca, et al., 2005). However, the clinical relevance of these findings is unclear because they were obtained in isolated and cultured cell models using glucosamine concentrations 200 to 500 times the serum concentration expected with normal oral doses in humans.
Glucosamine increases flux through the hexosamine pathway, which leads to deterioration of pancreatic beta cell function, thus possibly enhancing the risk of diabetes (Kaneto, Xu, Song, et al., 2001; Yoshikawa, Tajiri, Sako, et al., 2002). However, in two acute metabolic ward studies, large amounts of glucosamine (7.2 g or 9.7 g of free base) were infused over 5 hours with no change in insulin activity or glucose metabolism (Monauni, Zenti, Cretti, et al., 2000; Pouwels, Jacobs, Span, et al., 2001).
Specific effects of glucosamine on glycemic control have been studied. One double-blind, randomized, placebo-controlled trial compared the effect of oral glucosamine sulfate 1,500 mg daily with placebo (dextrose) for 12 weeks on serum insulin levels and glucose tolerance in healthy adults (Tannis, Barban, and Conquer, 2004). No baseline differences were observed in fasted levels of serum insulin or blood glucose in glucosamine sulfate recipients compared with those given placebo. Three-hour oral glucose tolerance tests showed glucosamine did not alter those parameters, with no significant differences within or between treatments, ages, or gender. Negative results in this study were limited by the small number of subjects (n=19), short duration, and large variability in the data. Moreover, blood levels of insulin and glucose represent surrogate markers for insulin sensitivity, not a gold standard for measuring it.
A second randomized, double-blind, placebo-controlled trial (n=38) examined the effect of daily administration of glucosamine 1,500 mg plus chondroitin sulfate 1,200 mg for 90 days on glycemic control in patients with well-controlled type 2 diabetes mellitus (Scroggie, Albright, Harris, et al., 2003). As reflected by hemoglobin A1c concentrations, glycemic control was equivalent in the intervention and placebo arms, with no difference from baseline in either group. These results suggest glucosamine has no effect on glycemic control in patients with well-controlled type 2 diabetes. Because the trial lasted only 90 days, it is not possible to extrapolate its results beyond that time or to less well-controlled patients.
A third double-blind, placebo-controlled trial examined the effect of oral glucosamine 500 mg thrice daily on insulin sensitivity or endothelial dysfunction in lean (n=20) and obese (n=20) subjects aged 22 to 65 years (Muniyappa, Karne, Hall, et al., 2006). Glucosamine or placebo treatment for 6 weeks was followed by a 1-week washout and crossover to the other study arm. The subjects in this study had expected clinical and biochemical characteristics. The lean subjects had normal metabolic and hemodynamic parameters while obese subjects exhibited typical insulin resistance and impaired insulin-stimulated brachial artery blood flow. Neither glucosamine nor placebo caused insulin resistance in healthy lean subjects or worsened this parameter in obese subjects. No significant changes were observed in either lean or obese subjects in any other measured parameters related to insulin sensitivity including lipid profiles, blood pressure, or hemoglobin A1c levels. Neither glucosamine nor placebo had an effect on endothelial dysfunction in either subject group. Thus, 6 weeks of oral glucosamine treatment at usual dose appears to have no deleterious effect on glucose metabolism or vascular function.
Two long-term placebo-controlled RCTs of glucosamine sulfate 1,500 mg daily for 3 years in OA of the knee reported findings on glucose metabolism. During one trial (total n=202) in which diabetic patients were excluded, four patients developed diabetes mellitus, three in the placebo group and one in the glucosamine group (Pavelka, Gatterova, Olejarova, et al., 2002). Although no quantitative data were provided, the authors reported that routine safety laboratory test results did not show significant differences between groups. The second RCT (n=212) excluded individuals with substantial abnormalities in hematological, hepatic, renal, or metabolic functions, which could include diabetes (Reginster, Deroisy, Rovati, et al., 2001). No change was reported in glycemic homeostasis, with fasting plasma glucose concentrations slightly lower in the glucosamine group compared to placebo. Taken together, these results show long-term ingestion of glucosamine sulfate at a dose commonly used in OA of the knee has no impact on glucose metabolism in healthy patients. They do not, however, provide information relevant to diabetic patients.
A systematic review of 16 clinical studies, including 854 patients treated with glucosamine for a weighted average of 37 weeks (range 3–156 weeks), found no evidence that glucosamine ingestion is associated with significant changes in blood glucose levels (Anderson, Nicolosi, Borzelleca, et al., 2005). A second systematic review including virtually the same studies came to the same conclusion (Stumpf and Lin, 2006). The authors of that review suggest that because data on glucosamine use in patients with diabetes mellitus are limited, such patients should be closely monitored for possible changes in glucose control.
In sum, available laboratory studies are short-term, whereas longer (3 years) OA efficacy trials excluded patients with metabolic disorders. Many OA RCTs presented incomplete information about adverse events, and most did not evaluate blood chemistries systematically. Therefore, no conclusions concerning metabolic effects of chronic glucosamine use in the general population can be drawn.
Our systematic review identified two RCTs that stratified patients according to OA severity (Clegg, Reda, Harris, et al., 2006; Das and Hammad, 2000). Given the small number of cases (n=8 treatment, 13 placebo) in the severe disease category presented by Das and Hammad, we do not consider their results further. We did not identify any studies that performed subgroup analyses by age, sex, race, weight, OA diagnosis, or symptom duration.
| Intervention | Primary outcome: 20% decrease in WOMAC pain score, % (n) | p value | Secondary outcome: OMERACT-OARSI response, % (n) | p value | Secondary outcome: 50% decrease in WOMAC pain score, % (n) | p value |
|---|---|---|---|---|---|---|
| Placebo | 54.3% (38/70) | | 48.6% (34/70) | | 32.9% (23/70) | |
| Glucosamine | 65.7% (46/70) | p=.17 | 65.7% (46/70) | p=.04 | 41.4% (29/70) | p=.29 |
| Chondroitin | 61.4% (43/70) | p=.39 | 58.6% (41/70) | p=.24 | 35.7% (25/70) | p=.72 |
| Glucosamine plus Chondroitin | 79.2% (57/72) | p=.002** | 75% (54/72) | p=.001† | 52.8% (38/72) | p=.02* |
| Celecoxib | 69.4% (50/72) | p=.06 | 66.7% (48/72) | p=.03* | 45.8% (33/72) | p=.11 |
p<.05 for the comparison with placebo
p<.017 for the comparison with placebo
WOMAC = Western Ontario and McMaster Universities Osteoarthritis Index; OMERACT-OARSI = Outcome Measures in Rheumatology Clinical Trials-Osteoarthritis Research Society International
A clinically meaningful, statistically significant effect was observed in the primary outcome and one secondary measure (OMERACT-OARSI response rate) in patients who received glucosamine plus chondroitin compared to placebo. In the celecoxib arm, the response rate for the primary outcome was not statistically different from that in the placebo arm, but it did show a clinically meaningful treatment effect, defined by the investigators as an absolute increase in the response rate of 15 percent. A similar pattern occurred using the OMERACT-OARSI outcome criteria. No statistically significant differences were seen when outcomes were assessed as a 50 percent decrease in WOMAC pain.
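The 15 percent criterion can be checked against the responder counts for the primary outcome. The following is a minimal sketch that recomputes the absolute difference in response rate between each arm and placebo from the counts tabulated above; the function and variable names are assumptions made for illustration, and the threshold is applied here as 15 percentage points.

```python
from typing import Dict, Tuple

# (responders, patients) for the primary outcome, a 20% decrease in
# WOMAC pain score, copied from the subgroup table above.
PRIMARY: Dict[str, Tuple[int, int]] = {
    "Placebo": (38, 70),
    "Glucosamine": (46, 70),
    "Chondroitin": (43, 70),
    "Glucosamine plus Chondroitin": (57, 72),
    "Celecoxib": (50, 72),
}

# Investigator-defined clinically meaningful absolute increase,
# interpreted here as percentage points.
THRESHOLD = 15.0


def response_rate(responders: int, patients: int) -> float:
    """Response rate as a percentage."""
    return 100.0 * responders / patients


def absolute_differences(arms: Dict[str, Tuple[int, int]],
                         control: str = "Placebo") -> Dict[str, float]:
    """Percentage-point difference in response rate, each arm minus control."""
    control_rate = response_rate(*arms[control])
    return {
        name: response_rate(*counts) - control_rate
        for name, counts in arms.items()
        if name != control
    }


differences = absolute_differences(PRIMARY)
meaningful = sorted(name for name, diff in differences.items()
                    if diff >= THRESHOLD)

for name, diff in differences.items():
    print(f"{name}: {diff:+.1f} percentage points vs. placebo")
```

Under these assumptions, only the glucosamine plus chondroitin arm and the celecoxib arm reach the threshold, and the celecoxib arm does so by a narrow margin.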
Comment. Assessing the benefit of combined treatment in patients with moderate-to-severe OA of the knee requires reconciling effect magnitudes and their consistency with statistical results in the glucosamine plus chondroitin and celecoxib arms. Results reported for combined therapy were consistent in direction, and of sufficient magnitude to reach statistical significance, based on the primary outcome (20 percent decrease in WOMAC pain score) or the secondary outcome (OMERACT-OARSI response rate). The direction and magnitude of effect in the celecoxib controls are consistent with clinical benefit, whether scored according to the primary outcome or the OMERACT-OARSI response criteria. The failure of the primary outcome to reach statistical significance in this arm may be explained by insufficient study power due to the relatively small numbers of patients. Overall, the GAIT subgroup data suggest, but do not prove, that combination glucosamine plus chondroitin therapy provides clinically meaningful improvement in patients with moderate-to-severe pain of OA of the knee.
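The effect of small subgroup size on statistical significance can be explored with a short sketch. This section does not state which test produced the tabulated p values, so the Pearson chi-square test without continuity correction used below is an assumption, and the function name is hypothetical. The counts are those tabulated above for the primary outcome.

```python
import math
from typing import Tuple


def pearson_chi2(resp_tx: int, n_tx: int,
                 resp_pl: int, n_pl: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    Returns (chi-square statistic, two-sided p value).
    """
    a, b = resp_tx, n_tx - resp_tx   # treatment: responders, nonresponders
    c, d = resp_pl, n_pl - resp_pl   # placebo: responders, nonresponders
    n = n_tx + n_pl
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Primary outcome responders in the moderate-to-severe pain subgroup.
_, p_combination = pearson_chi2(57, 72, 38, 70)  # glucosamine plus chondroitin
_, p_celecoxib = pearson_chi2(50, 72, 38, 70)    # celecoxib

print(f"combination vs. placebo: p = {p_combination:.3f}")
print(f"celecoxib vs. placebo:   p = {p_celecoxib:.3f}")
```

With about 70 patients per arm, a difference of roughly 15 percentage points falls just short of conventional significance under this test, which is in line with the interpretation that the celecoxib comparison was underpowered in this subgroup.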
In summary, we sought prospective subgroup analyses from RCTs. No analyses, other than described above, were found.
In our systematic review, we did not find any direct comparative studies in which glucosamine, chondroitin, or glucosamine plus chondroitin were compared with arthroscopy or viscosupplementation to treat OA of the knee. Therefore, no conclusions can be drawn concerning comparative efficacy.
What are the Clinical Effectiveness and Harms of Enteral Glucosamine and Chondroitin Given Alone or in Combination, in Patients With Primary OA of the Knee?
The best available evidence found that glucosamine hydrochloride, chondroitin sulfate, or their combination provides no clinical benefit in patients with primary OA of the knee.
The best evidence comes from the Glucosamine/Chondroitin Arthritis Intervention Trial (GAIT; Clegg, Reda, Harris, et al., 2006), a large (n=1,583), good quality, NIH-funded, multicenter RCT. GAIT compared glucosamine hydrochloride, chondroitin sulfate, or the combination of these agents, with placebo or celecoxib in patients with primary osteoarthritis of the knee. After 24 weeks of treatment, ITT analysis showed no significant difference in symptomatic relief for glucosamine hydrochloride, chondroitin sulfate, or glucosamine hydrochloride plus chondroitin sulfate compared with placebo. This result is substantiated by the finding that celecoxib, the active control, was effective.
Five of six MAs concluded that glucosamine or chondroitin was superior to placebo. However, the MA results do not outweigh the GAIT results because of the lower quality of the primary literature and the small differences reported.
Six study-level MAs assessed glucosamine or chondroitin in OA of the knee. All but one of the MAs reported statistically significant differences between treatment and placebo. However, these MAs had limitations in the quality of the primary studies that were pooled. Limitations of the primary literature included small study size, inclusion of studies that assessed joints other than knee, and failure to report intent to treat analysis. In general, the MAs did not perform adequate quality appraisal of the primary studies.
Glucosamine sulfate has been reported to be more effective than glucosamine hydrochloride, but the evidence is insufficient to draw conclusions.
A subgroup analysis in the largest MA (Towheed, Maxwell, Anastassiades, et al., 2006) showed a statistically significant pooled effect from 7 RCTs favoring glucosamine sulfate in studies that involved Rotta Research Laboratorium, in contrast to no effect for 8 non-Rotta RCTs. Because the pooled estimate for the Rotta studies was accompanied by substantial heterogeneity secondary to elements of study design and analysis, patient samples, and routes of administration, there is a considerable degree of uncertainty in that result. The results of GUIDE (Herrero-Beaumont, Roman, Trabado, et al., 2007), a European placebo-controlled RCT (n=318), also sponsored by Rotta, seemingly support the effectiveness of glucosamine sulfate. To date, no independent studies of the Rotta glucosamine sulfate formulation have been conducted. While the overall results of GAIT show no benefit, in the subgroup of knee OA patients with moderate-to-severe pain at baseline, the combination of glucosamine hydrochloride and chondroitin sulfate significantly improved pain. Together, this evidence suggests an independent trial of glucosamine sulfate would be useful to definitively establish whether there is benefit.
In general, adverse events with glucosamine or chondroitin treatment were no more frequent than with placebo. No conclusions concerning metabolic effects of chronic glucosamine use in the general population can be drawn.
Adverse events reported in the literature included nausea, diarrhea, headache, musculoskeletal complaints, and others. There were no significant differences between placebo and treatment. There has been some concern from in vitro and preclinical studies that glucosamine supplementation could have a deleterious effect on glucose metabolism and glycemic control. However, available clinical studies are short-term or, if longer (3 years), excluded patients with metabolic disorders.
What are the Clinical Effectiveness and Harms of the Interventions of Interest in Patients With Secondary OA of the Knee?
We identified no studies that enrolled patients with only secondary OA of the knee, or that reported separately on secondary OA of the knee. Therefore, no conclusions can be drawn about treatment outcomes in patients with secondary OA of the knee.
How do the Short-Term and Long-Term Outcomes of the Interventions of Interest Differ by the Following Subpopulations: Age, Race/Ethnicity, Gender, Primary or Secondary OA, Disease Severity and Duration, Weight (Body Mass Index), and Prior Treatments?
GAIT found that glucosamine plus chondroitin produced a statistically and clinically significant improvement of pain in patients with moderate-to-severe pain from OA of the knee at baseline. Although the effect of celecoxib treatment in a similar group of patients was not statistically significant, the magnitude and direction of the response were consistent with clinical benefit. The nonsignificant statistical result in the celecoxib arm may be a function of insufficient power due to the small number of patients. Although this subgroup analysis was not explicitly prespecified in the GAIT protocol, the stratified randomization by disease severity yields statistically valid comparisons. A trial of glucosamine sulfate would be useful to definitively establish whether there is benefit.
How do the Short-Term and Long-Term Outcomes of the Interventions of Interest Compare for the Treatment of Primary OA of the Knee; and Secondary OA of the Knee?
We did not find any direct comparative studies in which glucosamine, chondroitin, or glucosamine plus chondroitin were compared with arthroscopy or viscosupplementation to treat OA of the knee. Therefore, no conclusions can be drawn concerning comparative efficacy.
The effectiveness of arthroscopic lavage and debridement can be evaluated using several study designs. Placebo-controlled randomized, controlled trials (RCTs) could address whether arthroscopic lavage and debridement achieve results surpassing placebo. Placebo-controlled RCTs for surgical procedures can be especially difficult to execute because investigators may have ethical concerns about sham procedures and patients may be reluctant to participate. RCTs comparing an intervention with an active control treatment may receive greater acceptance by clinicians and patients. The key strength of RCTs generally concerns control for confounding and several sources of bias. Well-conducted subgroup analyses from RCTs can reveal whether the effects of an intervention differ according to particular patient characteristics. Quasi-experimental designs are controlled studies that do not assign patients randomly and are more susceptible to confounding.
Uncontrolled studies, such as administrative database analyses and case series, provide weaker evidence. Administrative databases can give a broader view of outcomes of interventions in everyday practice, compared to the tightly controlled conditions of an RCT. However, administrative database analyses can be flawed by poor data quality and unmeasured variables. Case series are a weak design for evaluating effectiveness due to lack of comparison groups and failure to control for placebo effects. Despite weaknesses, evidence from uncontrolled studies can support inferences about effectiveness, particularly when studies use high quality methods and the effects are large enough to exceed potential biases and nonspecific effects. Studies of different designs were sought to examine whether outcomes differed by subgroups, particularly primary versus secondary osteoarthritis (OA) of the knee and those with mechanical versus loading symptoms. This review of arthroscopic lavage and debridement will address evidence from different study designs in turn.
| Study | Inclusion | Exclusion | n, Enrolled | n, Withdrawn | n, Outcome Evaluated |
|---|---|---|---|---|---|
| Moseley et al., 2002 | 10/95 – 9/98; pts recruited from Houston VAMC; ≤ 75 yo; OA of knee by ACR definition; at least moderate pain (VAS ≥ 4) despite maximal medical treatment for ≥ 6 mo; no arthroscopy in previous 2 yrs; study knee was that with greatest pain-induced limitation of function; randomization to 1 of 3 groups (debridement-D, lavage-L, placebo-P) stratified by 3 levels of severity of OA; used sealed, sequentially numbered envelopes handed to surgeon in operating suite, treatment assignment not revealed to patient; randomization stratified within 3 OA severity grades (1–3, 4–6, 7–8). Hypothesis: pts in the L and D groups would have same amount of knee pain at 2 yrs as P pts. Trial designed to have 90% power to detect 0.55 effect size between P and L+D on SF-36-P at 2 yrs, n=180 and ≤ 16 pts lost to F/U | Severity grade ≥9/12; severe deformity; serious medical problems | Of 324 consecutive pts who met inclusion criteria, 144 (44%) declined to participate (participants were significantly younger, more likely to be white and had more severe OA). n=180; L: 61; D: 59; P: 60 | 2 yrs: L: 6; D: 6; P: 5 | 2 yrs: L: 55; D: 53; P: 55 |
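The power statement for this trial can be approximated with a short calculation. The sketch below rests on several assumptions that the table does not confirm: that the 0.55 effect size is a standardized mean difference, that the comparison is a two-sided test at alpha 0.05 of placebo against the pooled lavage and debridement groups, and that a normal approximation is adequate. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(effect_size: float, n_placebo: int, n_active: int,
                 alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    noncentrality = effect_size * math.sqrt(
        n_placebo * n_active / (n_placebo + n_active)
    )
    return NormalDist().cdf(noncentrality - z_crit)


# Enrolled: 60 placebo vs. 61 lavage + 59 debridement.
power_enrolled = approx_power(0.55, 60, 61 + 59)

# Evaluated at 2 years: 55 placebo vs. 55 lavage + 53 debridement.
power_evaluated = approx_power(0.55, 55, 55 + 53)

print(f"enrolled:  {power_enrolled:.2f}")
print(f"evaluated: {power_evaluated:.2f}")
```

Under these assumptions, power stays above 90 percent with the number of patients evaluated at 2 years, which is compatible with the design target reported for the trial.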
| Study | Intervention | Prior Treatments | Concurrent Treatments |
|---|---|---|---|
| Moseley et al., 2002 | One surgeon performed all procedures; D and L pts received general anesthesia; P pts received IV tranquilizer and opioid and spontaneously breathed oxygen-enriched air; L pts were irrigated with 10 L of fluid, anything that could be flushed through cannulas was removed, debridement among L pts only performed to resect portion of mechanically important unstable tears of the meniscus; D pts received lavage, rough articular cartilage was shaved, loose debris removed, all torn or degenerated meniscal fragments trimmed, remaining meniscus smoothed to a firm and stable rim, no abrasion arthroplasty or microfracture, bone spurs typically not removed except spurs from tibial spine area; P pts received 3 1-cm incisions in the skin, surgeon asked for all instruments and manipulated the knee as if arthroscopy was being performed; saline was splashed to simulate sound of lavage, no instruments entered portals, P pts kept in operating room for amount of time required for debridement, P pts spent night in hospital cared for by nurses unaware of group assignment | Maximal medical treatment for ≥6 mo | Postop all pts received the same walking aids, graduated exercise program, and analgesics |
| Study | Age | Percent Female | Race (%) | Preoperative Disease Severity (%) | Pain | Function | Other Characteristics |
|---|---|---|---|---|---|---|---|
| Moseley et al., 2002 | L: mn 51.2, sd 10.5; D: mn 53.6, sd 12.2; P: mn 52.0, sd 11.1 | L: 12; D: 3; P: 7 | W/B/O: L: 59/31/10; D: 61/22/17; P: 60/32/8 | Mild/mod/sev: L: 28/46/26; D: 31/46/24; P: 28/47/25 | Mn KSPS pain: L: 50.2; D: 51.4; P: 49.4 | Mn KSPS function: L: 62.4; D: 57.6; P: 62.2 | Analgesic use (OTC/Rx): L: 67/21; D: 64/15; P: 70/22 |
KSPS: Knee-Specific Pain Scale; mn: mean; OTC: over the counter; sd: standard deviation
| Study | Outcomes Assessed | Response Criteria | Observer | F/U |
|---|---|---|---|---|
| Moseley et al., 2002 | Primary: 24 mo Knee-Specific Pain Scale (KSPS) created for the study (0–100); Secondary: pain subscale of Arthritis Impact Measurement Scales (AIMS2-P); pain subscale of SF-36 (SF-36-P); walking-bending subscale of AIMS2 (AIMS2-WB); physical function subscale of SF-36 (SF-36-PF); investigator-devised Physical Functioning Scale (PFS, time to walk 30 m and climb up and down flight of stairs as quickly as possible); all measures transformed to 0–100 scale; patient's guess as to which procedure was performed | Results viewed with respect to minimal important difference (MID) using stratified central tendency approach against change rating external criterion level described as somewhat better (or worse) and much better (or worse), and standard error of measurement-based method. | Study personnel unaware of group assignment, operating surgeon did not participate in any way | 2 wk, 6 wk, 3 mo, 6 mo, 12 mo, 24 mo |
| Study | Initial Assembly of Comparable Groups | Low Loss to Followup, Maintenance of Comparable Groups | Measurements Reliable, Valid, Equal* | Interventions Comparable/Clearly Defined | Appropriate Analysis of Results | Overall Rating |
|---|---|---|---|---|---|---|
| Moseley et al., 2002 | Y | Y | Y | Y | Y | Good |
| Study | Outcome | F/U | Group | n | mn (sd) | p value (vs. placebo) | Outcome | F/U | Group | n | mn (sd) | p value (vs. placebo) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Moseley et al., 2002 | KSPS-Pain | 6 mo | L | 59 | 53.2 (22.6) | 0.17 | PFS | 6 mo | L | 52 | 49.4 (20.4) | 0.47 |
| | | | D | 56 | 50.0 (21.0) | 0.55 | | | D | 54 | 49.8 (17.4) | 0.34 |
| | | | P | 57 | 47.6 (20.7) | | | | P | 54 | 47.0 (13.0) | |
| | | 1 yr | L | 57 | 54.8 (19.8) | 0.14 | | 1 yr | L | 54 | 50.4 (17.6) | 0.09 |
| | | | D | 50 | 51.7 (22.4) | 0.51 | | | D | 47 | 52.5 (20.3) | 0.04* |
| | | | P | 53 | 48.9 (21.9) | | | | P | 49 | 45.6 (10.2) | |
| | | 18 mo | L | 56 | 51.1 (22.7) | 0.78 | | 18 mo | L | 49 | 51.2 (18.8) | 0.41 |
| | | | D | 51 | 50.7 (25.3) | 0.73 | | | D | 44 | 52.8 (20.9) | 0.23 |
| | | | P | 52 | 52.4 (22.4) | | | | P | 46 | 48.5 (12.4) | |
| | | 2 yr | L | 55 | 53.7 (23.7) | 0.64 | | 2 yr | L | 50 | 53.2 (21.6) | 0.13 |
| | | | D | 53 | 51.4 (23.2) | 0.96 | | | D | 44 | 52.6 (16.4) | 0.11 |
| | | | P | 55 | 51.6 (23.7) | | | | P | 44 | 47.7 (12.0) | |
| | AIMS2-WB | 6 mo | L | 59 | 48.7 (31.6) | 0.94 | SF-36-P | 6 mo | L | 59 | 46.0 (22.0) | 0.95 |
| | | | D | 55 | 52.5 (28.7) | 0.51 | | | D | 55 | 45.1 (20.6) | 0.80 |
| | | | P | 57 | 49.1 (25.8) | | | | P | 57 | 46.3 (26.4) | |
| | | 1 yr | L | 57 | 49.6 (29.1) | 0.98 | | 1 yr | L | 57 | 42.8 (21.2) | 0.86 |
| | | | D | 51 | 56.4 (28.4) | 0.19 | | | D | 51 | 44.5 (24.3) | 0.84 |
| | | | P | 54 | 49.4 (25.5) | | | | P | 54 | 43.6 (24.8) | |
| | | 18 mo | L | 57 | 50.5 (28.5) | 0.34 | | 18 mo | L | 57 | 44.4 (24.9) | 0.45 |
| | | | D | 51 | 53.1 (29.3) | 0.66 | | | D | 51 | 46.8 (22.8) | 0.20 |
| | | | P | 52 | 55.6 (26.6) | | | | P | 52 | 40.8 (24.9) | |
| | | 2 yr | L | 56 | 51.1 (28.3) | 0.61 | | 2 yr | L | 57 | 44.4 (22.4) | 0.63 |
| | | | D | 53 | 56.4 (29.4) | 0.64 | | | D | 52 | 45.0 (23.0) | 0.56 |
| | | | P | 55 | 53.8 (27.5) | | | | P | 55 | 42.3 (24.2) | |
| | AIMS2-P | 6 mo | L | 59 | 54.8 (21.6) | 0.23 | SF-36-PF | 6 mo | L | 59 | 53.4 (27.6) | 0.32 |
| | | | D | 55 | 52.2 (20.8) | 0.60 | | | D | 55 | 51.0 (25.9) | 0.60 |
| | | | P | 57 | 50.0 (20.7) | | | | P | 57 | 48.4 (25.9) | |
| | | 1 yr | L | 57 | 57.8 (23.5) | 0.34 | | 1 yr | L | 57 | 50.0 (28.0) | 0.90 |
| | | | D | 51 | 53.3 (25.4) | 0.95 | | | D | 50 | 47.3 (27.1) | 0.69 |
| | | | P | 54 | 53.6 (22.1) | | | | P | 54 | 49.3 (24.5) | |
| | | 18 mo | L | 57 | 55.4 (24.6) | 0.95 | | 18 mo | L | 57 | 47.0 (28.8) | 0.68 |
| | | | D | 51 | 50.7 (24.4) | 0.30 | | | D | 51 | 50.9 (26.1) | 0.73 |
| | | | P | 52 | 55.6 (23.6) | | | | P | 52 | 49.1 (25.0) | |
| | | 2 yr | L | 56 | 56.7 (24.1) | 0.37 | | 2 yr | L | 57 | 50.9 (27.3) | 0.71 |
| | | | D | 53 | 54.0 (23.3) | 0.75 | | | D | 52 | 47.9 (26.6) | 0.83 |
| | | | P | 55 | 52.5 (25.1) | | | | P | 54 | 49.0 (27.2) | |
AIMS2-P: pain subscale of the Arthritis Impact Measurement Scales; AIMS2-WB: walking-bending subscale of AIMS2 scale; KSPS: Knee-Specific Pain Scale; PFS: Physical Functioning Scale (time to walk 30 m and ascend and descend a flight of stairs); SF-36-P: pain subscale of the SF-36 health-related quality of life scale; SF-36-PF: physical function subscale of the SF-36 health-related quality of life scale
The report provides no information on the proportions of primary versus secondary OA in this sample. Blinding of patients to treatment was effective (similar percentages in placebo and intervention groups guessed they received placebo). Outcome was assessed by study personnel unaware of group assignment; the operating surgeon did not participate in any way.
The primary statistical analyses were based on followup scores although change scores were also analyzed and the results did not differ. Two-sided p values were used, which were not adjusted for multiple comparisons. If evidence of superiority of interventions over placebo was lacking, equivalence analyses were to be performed using the minimal important difference, calculated by both the standard error of measurement and the mean change score among patients rated as somewhat or much better or worse on an external criterion global change scale.
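The two routes to a minimal important difference mentioned above can be sketched as follows. This is an illustration only: the classical test theory formula SEM = SD × sqrt(1 − reliability) is standard, but the function names, the input values, and the use of absolute change scores in the anchor-based estimate are assumptions made for this sketch and are not taken from the trial report.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Distribution-based estimate: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def anchor_based_estimate(change_scores: list[float]) -> float:
    """Anchor-based estimate: mean absolute change among patients who rated
    themselves somewhat or much better or worse on a global change scale.
    Taking absolute values is an assumption of this sketch."""
    if not change_scores:
        raise ValueError("need at least one change score")
    return sum(abs(c) for c in change_scores) / len(change_scores)


if __name__ == "__main__":
    # Hypothetical inputs, not values from the Moseley trial.
    sem = standard_error_of_measurement(sd=20.0, reliability=0.84)
    anchor = anchor_based_estimate([12.0, -9.0, 15.0, -8.0])
    print(f"SEM-based estimate: {sem:.1f}")
    print(f"Anchor-based estimate: {anchor:.1f}")
```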
Moseley and colleagues (2002) presented limited adverse events data, stating that there were only two minor complications: incisional erythema in one patient and, in another, calf swelling with venography negative for thrombosis.
The authors of this RCT concluded it “provides strong evidence that arthroscopic lavage with or without debridement is not better than and appears to be equivalent to a placebo procedure in improving knee pain and self-reported function.”
Comment. The RCT by Moseley, O'Malley, Petersen, et al. (2002) provides the most important evidence on the outcomes of arthroscopic lavage and debridement for OA of the knee. The trial was rated as being good in quality, but was limited by uncertainty about generalizability due to inclusion of a single surgeon and a single clinical center. However, placebo-controlled, well-designed and well-conducted RCTs of surgical procedures are rarities that offer valuable information. These authors found no differences between placebo and arthroscopic interventions past 2 weeks of followup. Absent other placebo-controlled RCTs, evidence is lacking to show that arthroscopic lavage with or without debridement has effects above those of placebo.
Numerous critiques of the Moseley trial have been published (Laskin and Ohnsorge, 2005; Blacher, 2002; Chambers and Schulzer, 2002; Chambers, Schulzer, Sobolev et al., 2002; Ewing and Ewing, 2002; Felson and Buckwalter, 2002; Johnson, 2002; Lubowitz, 2002; Poehling, 2002). The trial authors responded to some of these comments (Wray, Moseley, O'Malley, 2002). Critical comments fall into three main areas: insufficient description of the patient sample; a patient sample that is unrepresentative of the population with OA of the knee; and problems with outcome assessment and data analysis.
Several authors noted that the RCT patient sample was not well characterized. Information was lacking on the following variables: proportions of primary and secondary OA; knee range of motion; body weight; effusion; disability and worker's compensation status; presence of mechanical symptoms; classification of preoperative radiographs and arthroscopic OA stage and pathologic details. Chambers, Schulzer, and Sobolev (2002) stated that inclusion and exclusion criteria were not well defined.
Regarding the representativeness of the patient sample, the subjects in the RCT were all veterans, were fairly young, and included a higher proportion of males than the general population with OA of the knee. The low participation rate (56 percent) led Lubowitz (2002) to speculate that Moseley's patients may have had a different prognosis than the general population with OA of the knee and may have been more susceptible to the placebo effect. Ewing and Ewing (2002) mentioned that patient selection should have been based on plain-film radiography during posterior-anterior flexion in a position of weight bearing. Johnson (2002) noted that the Moseley RCT included patients for whom arthroscopy was contraindicated, including patients presenting only because of pain, as well as those with nonreactive joint, multiple compartment involvement, angulatory deformity, and noncompliance with non-weight-bearing for at least 1 month.
Several comments focused on outcome assessment and data analysis. It was noted that the primary outcome, the Knee-Specific Pain Scale, had not been validated. However, a subsequently published study demonstrated that it has good psychometric qualities (O'Malley, Suarez-Almazor, Aniol et al., 2003). Estimation of sample size was based on the SF-36 pain subscale at 90 percent power to detect a moderate effect size, but that was not the primary outcome, so the trial does not have the stated level of power for the primary outcome. Chambers, Schulzer, and Sobolev (2002) observed that the trial was designed to test the superiority of interventions over placebo but was converted to an equivalence trial, and that equivalence trials tend to require larger samples to achieve comparable power. They calculated power levels across outcomes and comparisons, finding that power ranged from 14 percent to 70 percent. They also argued that the minimal important difference should have been determined a priori and not based on trial data.
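The relationship between sample size, effect size and power discussed above can be illustrated with a normal-approximation calculation for a two-group comparison of means. This is a sketch under stated assumptions, not a reconstruction of the trial's own sample size calculation: the value 0.5 for a "moderate" standardized effect follows Cohen's convention and is not stated in the report, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means
    (normal approximation, equal group sizes and variances)."""
    z_crit = _Z.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * math.sqrt(n_per_group / 2.0)
    # Probability of rejecting in either tail.
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


def n_per_group_for_power(effect_size: float, power: float = 0.90, alpha: float = 0.05) -> int:
    """Approximate per-group sample size needed to reach the requested power."""
    z_sum = _Z.inv_cdf(1.0 - alpha / 2.0) + _Z.inv_cdf(power)
    return math.ceil(2.0 * (z_sum / effect_size) ** 2)


if __name__ == "__main__":
    d = 0.5  # assumed "moderate" standardized effect (Cohen's convention)
    print("n per group for 90% power:", n_per_group_for_power(d))
    for n in (30, 60, 90):
        print(f"n={n} per group -> power {two_sample_power(d, n):.2f}")
```

Because power depends on the standardized effect for each outcome, a sample sized for one scale does not guarantee the same power on another, which is the point raised by the critics.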
The trialists responded to critics by clarifying that 172 of 180 patients had one or more mechanical symptoms and that alignment was assessed preoperatively with plain-film radiography during posterior-anterior flexion in a position of weight bearing. The authors performed subgroup analyses on OA stage, alignment and mechanical symptoms, finding no differences in results by subgroup. Regarding the preponderance of men in the sample, the trialists cited the comment by Felson and Buckwalter (2002) that there is no basis in data to suspect that the effect of intervention depends on sex. The trialists argued that the selected patients were highly representative of those receiving arthroscopy. In response to speculation that subgroups may benefit from arthroscopic intervention, they challenged investigators to collect evidence from placebo-controlled trials among specific subpopulations.
With regard to equivalence comparisons, Moseley and colleagues found that the minimal important difference was excluded from confidence intervals in nearly all instances, suggesting equivalence between arthroscopy and placebo in this trial. In response to whether they provided an unbiased estimate of the minimal important difference, the trialists noted the lack of sufficient previously published studies quantifying it, and that the quantity used in equivalence analyses was the midpoint of literature-based and trial data-based estimates. Complaints about low power to find equivalence are misplaced because the Moseley trial found equivalence in the vast majority of comparisons. Moreover, findings of equivalence have more than statistical relevance: they suggest that arthroscopic lavage and debridement are no better than a placebo intervention involving merely incisions. Evidence of superiority over placebo should be the standard to judge arthroscopy.
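The logic of an equivalence comparison, in which a confidence interval for the between-group difference is judged against the minimal important difference (MID), can be sketched as below. All numbers are hypothetical, the normal-approximation interval and the three-way classification are simplifying assumptions of this sketch, and none of it reproduces the trialists' actual analysis.

```python
from statistics import NormalDist


def diff_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Normal-approximation confidence interval for mean_a - mean_b."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    diff = mean_a - mean_b
    se = (sd_a ** 2 / n_a + sd_b ** 2 / n_b) ** 0.5
    return diff - z * se, diff + z * se


def classify(ci, mid):
    """Compare a confidence interval with the equivalence margin +/- mid."""
    low, high = ci
    if -mid < low and high < mid:
        return "equivalent"      # whole interval lies inside the margin
    if low > 0 or high < 0:
        return "different"       # interval excludes zero and reaches the margin
    return "inconclusive"        # interval spans zero and crosses the margin


if __name__ == "__main__":
    # Hypothetical group summaries on a 0-100 scale.
    ci = diff_ci(50.0, 20.0, 60, 49.0, 20.0, 60)
    for mid in (10.0, 5.0):
        print(f"MID={mid}: CI=({ci[0]:.1f}, {ci[1]:.1f}) -> {classify(ci, mid)}")
```

The same interval can support equivalence under one margin and be inconclusive under a smaller one, which is why critics focused on how the MID was chosen.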
| Study | Inclusion | Exclusion | n, Enrolled | n, Withdrawn | n, Outcome Evaluated |
|---|---|---|---|---|---|
| Merchan and Galindo, 1993 AD+PT vs. Conservative treatment (Cons): NSAID+↓ADLs+PT | Sedentary patients >50 yrs of age with painful limited degenerative OA of the femorotibial (FT) joint, as assessed by preoperative radiographs showing minimal joint space narrowing | Duration of pain >6 mos, weight >85 kg in men and >70 kg in women, history of previous knee surgery, appreciable joint instability or angular deformity (varus/valgus) >15 degrees, femoropatellar joint involvement | AD+PT:40 | AD+PT: 5 (died) | AD+PT:35 |
| Cons: 40 | Cons: 2 (died) | Cons: 38 | |||
| Chang et al., 1993 ALD vs. needle lavage (NL) | Persistent knee pain >3 mo, despite conservative medical/rehabilitation management, unacceptable restrictions in work/athletic/self-care activities; Kellgren-Lawrence grade 1–3; age >20 yrs; willing to attend 3 mo/12 mo followup | Knee surgery <6 mo; total knee replacement; concurrent illness that would influence functional assessment of knee/preclude arthroscopic surgery; Kellgren-Lawrence grade 4 | ALD: 19 | ALD: 1 | ALD: 18 |
| NL: 15 | NL: 1 | NL: 14 | |||
| (both inter-current medical problems) | |||||
| Study | Age | % Female | OA Duration (months) | Preoperative OA Severity | Pain | Function |
|---|---|---|---|---|---|---|
| Merchan and Galindo, 1993 | AD+PT: mn 57.1 | AD+PT: 80 | HSS Knee Rating Score | |||
| AD+PT vs.Cons | Cons: mn 56.9 | Cons: 66 | AD+PT: mn 26.85 | |||
| Cons: mn 29.86 | ||||||
| Chang et al., 1993 | ALD: mn 61, sd 11 | ALD: 72 | ALD: mn 51, sd 51 | Kellgren-Lawrence %I/II/III | AIMS (0–10) | AIMS Physical Function (0–10) |
| ALD vs. NL | NL: mn 65, sd 13 | NL: 71 | NL: mn 53, sd 57 | ALD: 22/28/50 | ALD: mn 6.5, sd 2.0 | ALD: mn 2.3, sd 1.6 |
| NL: 14/36/50 | NL: mn 6.1, sd 2.1 | NL: mn 1.7, sd 1.0 | ||||
| Study | Interventions | Prior Treatments | Concurrent Treatments |
|---|---|---|---|
| Merchan and Galindo, 1993 | AD+PT: debridement of synovial tissue, partial meniscectomy, osteophytectomy, removal of loose bodies, limited chondroplasty, no abrasion; physical therapy (PT) 4 wks postop | AD+PT: compression bandage, early exercises, motion, weight bearing as tolerated | |
| AD+PT vs. Cons | Cons: conservative (nonoperative) treatment with NSAIDs, ↓ in ADLs, PT as in AD+PT group | ||
| Chang et al., 1993 | ALD: general anesthesia, continuous saline lavage, debridement of torn meniscus, removal of meniscal, anterior cruciate ligament fragments, removal of proliferative synovium, excision of loose articular cartilage fragments, no drilling | Conservative medical and rehabilitation management | Non-narcotic analgesics, physical therapy |
| ALD vs. NL | NL: closed needle tidal lavage, 1 liter saline, local anesthesia | ||
| Study | Initial Assembly of Comparable Groups | Low Loss to Followup, Maintenance of Comparable Groups | Measurements Reliable, Valid, Equal* | Interventions Comparable/Clearly Defined | Appropriate Analysis of Results | Overall Rating |
|---|---|---|---|---|---|---|
| Merchan and Galindo, 1993 | ? | Y | N | Y | Y | Poor |
| AD+PT vs. Cons | ||||||
| Chang et al., 1993 | ? | Y | Y | Y | Y | Fair |
| ALD vs. NL | ||||||
Merchan and Galindo (1993) randomized 40 patients each to arthroscopic debridement plus physical therapy and to nonoperative conservative therapy. Seven patients died and were excluded from data analysis, five in the arthroscopy group and two in the conservative treatment group. Arthroscopic debridement included excision of synovial tissue, partial meniscectomy, osteophytectomy, removal of loose bodies, limited chondroplasty and no abrasion. Patients over 50 years of age were included for painful limited OA and minimal joint space narrowing on preoperative radiography. Groups were comparable at baseline on age, percent female and Hospital for Special Surgery (HSS) knee score; however, information is lacking on duration of disease and body weight. Mean followup was 25 months in the arthroscopy group and 23 months in the conservative treatment group. Outcome measures were the followup HSS score, change in HSS and patient global change assessment. This trial was rated as poor in quality due to incomplete information about comparability of groups at baseline, use of an outcome of uncertain validity and lack of a blinded outcome assessor.
Chang, Falconer, Stulberg et al. (1993) randomized 34 patients to either arthroscopic lavage and debridement or closed needle lavage. One patient in each group dropped out for intercurrent medical problems so the analysis was based on 32 patients. Arthroscopic procedures entailed removal of loose tissue fragments, partial meniscectomy, synovectomy, excision of loose articular cartilage and no drilling. Closed-needle lavage employed one liter of saline injected into the knee and aspirated. Patients were selected for persistent knee pain of more than three months despite conservative medical and rehabilitation management. All patients had Kellgren-Lawrence grade 1–3 osteoarthritis. Groups were well-balanced at baseline on age, percent female, duration of knee pain, osteoarthritis grade and several pain and function scales. Outcome scales measured at 3 months and 12 months included the AIMS subscales, 50-foot walk time, patient global assessment and physician percent improvement. Patients were not blinded to group assignment but outcome assessors were. The quality of the trial was rated as fair because of uncertainty about whether allocation to groups at randomization was concealed.
| Study | Outcomes | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Merchan and Galindo, 1993 | F/U (mo) | ||||||||
| AD+PT vs. Cons | Group | n | mn (rng) | Outcome | mn | p value | |||
| AD+PT | 35 | 25 (12–36) | F/U HSS | 37.00 | 0.022 | ||||
| Cons | 38 | 23 (12–36) | (higher=better) | 32.76 | |||||
| AD+PT | Δ HSS | 10.14 | 0.001 | ||||||
| Cons | (higher=better) | 2.89 | |||||||
| F/U | % Improved | % Unchanged | % Worse | p value | |||||
| AD+PT | last | 75 | 14 | 11 | <0.001 | ||||
| Cons | 16 | 13 | 53 | ||||||
| Chang et al., 1993 | 3 mo | 12 mo | |||||||
| ALD vs. NL | ALD | NL | ALD | NL | |||||
| Outcome | mn | mn | Difference (95% CI) | mn | mn | Difference (95% CI) | |||
| AIMS Pain Scale | 5.0 | 5.4 | -0.4 (-1.6, 0.9) | 5.3 | 5.0 | 0.3 (-1.1, 1.8) | |||
| AIMS Physical Activity | 5.0 | 6.3 | -1.3 (-3.0, 0.4) | 4.8 | 6.2 | -1.4 (-3.3, 0.4) | |||
| AIMS Physical Function | 1.5 | 2.0 | -0.5 (-1.2, 0.3) | 1.7 | 2.0 | -0.3 (-1.1, 0.5) | |||
| AIMS Social Activity | 4.3 | 4.7 | -0.4 (-1.4, 0.7) | 4.6 | 4.3 | 0.3 (-1.1, 1.5) | |||
| AIMS Depression | 2.7 | 2.5 | 0.2 (-0.8, 1.1) | 1.8 | 2.6 | -0.8 (-1.6, 0.1) | |||
| AIMS Anxiety | 3.8 | 3.9 | -0.1 (-1.3, 1.0) | 3.2 | 3.5 | -0.3 (-1.3, 0.6) | |||
| 50-ft walk time, secs | 14.2 | 15.0 | -0.8 (-2.8, 1.2) | 13.9 | 14.1 | -0.2 (-2.8, 2.3) | |||
| Patient global assessment | 3.4 | 3.6 | -0.2 (-10.6, 13.8) | 4.1 | 3.3 | 0.8 (-5.3, 21.2) | |||
| Physician global % improved | 47 | 46 | 1 (-34, 36) | 41 | 23 | 18 (-15, 51) | |||
Comment. The small, poor-quality, unblinded RCT by Merchan and Galindo (1993) does not provide strong evidence of an advantage favoring arthroscopy over nonoperative therapy. These authors found significantly better results for arthroscopic debridement plus physical therapy relative to conservative treatment consisting of NSAIDs with a decrease in ADLs plus physical therapy. However, Merchan and Galindo did not report whether groups were comparable at baseline on duration of osteoarthritis or body weight, the outcome scale is of uncertain validity and a blinded outcome assessor was not used. The small trial by Chang, Falconer, Stulberg, et al. (1993) found no differences between arthroscopic lavage and debridement and closed needle lavage on pain, function and global assessment scales. This trial does not offer support for improved outcomes when arthroscopic debridement is added to lavage of the knee.
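The statement that Chang and colleagues found no between-group differences can be checked against the tabulated 95 percent confidence intervals. The sketch below transcribes the differences and intervals from the outcomes table for that trial and counts how many intervals exclude zero; the variable and function names are illustrative only.

```python
# (outcome, followup, difference, ci_low, ci_high), transcribed from the
# Chang et al. (1993) rows of the outcomes table (ALD minus NL).
CHANG_1993 = [
    ("AIMS Pain Scale", "3 mo", -0.4, -1.6, 0.9),
    ("AIMS Physical Activity", "3 mo", -1.3, -3.0, 0.4),
    ("AIMS Physical Function", "3 mo", -0.5, -1.2, 0.3),
    ("AIMS Social Activity", "3 mo", -0.4, -1.4, 0.7),
    ("AIMS Depression", "3 mo", 0.2, -0.8, 1.1),
    ("AIMS Anxiety", "3 mo", -0.1, -1.3, 1.0),
    ("50-ft walk time, secs", "3 mo", -0.8, -2.8, 1.2),
    ("Patient global assessment", "3 mo", -0.2, -10.6, 13.8),
    ("Physician global % improved", "3 mo", 1.0, -34.0, 36.0),
    ("AIMS Pain Scale", "12 mo", 0.3, -1.1, 1.8),
    ("AIMS Physical Activity", "12 mo", -1.4, -3.3, 0.4),
    ("AIMS Physical Function", "12 mo", -0.3, -1.1, 0.5),
    ("AIMS Social Activity", "12 mo", 0.3, -1.1, 1.5),
    ("AIMS Depression", "12 mo", -0.8, -1.6, 0.1),
    ("AIMS Anxiety", "12 mo", -0.3, -1.3, 0.6),
    ("50-ft walk time, secs", "12 mo", -0.2, -2.8, 2.3),
    ("Patient global assessment", "12 mo", 0.8, -5.3, 21.2),
    ("Physician global % improved", "12 mo", 18.0, -15.0, 51.0),
]


def ci_includes_zero(low: float, high: float) -> bool:
    """True when the interval is compatible with no between-group difference."""
    return low <= 0.0 <= high


excluding_zero = [
    (outcome, followup)
    for outcome, followup, _diff, low, high in CHANG_1993
    if not ci_includes_zero(low, high)
]

if __name__ == "__main__":
    print(f"{len(CHANG_1993)} comparisons; "
          f"{len(excluding_zero)} with a 95% CI excluding zero")
```

Several of these intervals are wide, so the absence of significant differences in a trial of 32 patients should not be read as evidence that the two procedures are equivalent.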
The results of the good-quality, placebo-controlled trial by Moseley, O'Malley, Petersen, et al. (2002) create uncertainty about whether arthroscopic lavage and debridement achieve results surpassing placebo. The results from Merchan and Galindo are insufficient to establish the superiority of arthroscopic debridement over an active nonsurgical control therapy. The trial by Chang, Falconer, Stulberg et al. (1993) does not resolve uncertainty over the effects of arthroscopic intervention relative to placebo or active controls. Overall, the RCT evidence does not definitively show arthroscopy to be ineffective, nor does it establish effectiveness.
| Study | Inclusion | Exclusion | n, Enrolled | n, Withdrawn | n, Outcome Evaluated |
|---|---|---|---|---|---|
| Livesley et al., 1991; | OA of knee and pain with no obvious mechanical derangement of joint | Hematologic abnormalities; urate crystals in the joint aspirate; atypical radiologic signs; treatable lesions seen on arthroscopy | AL+PT: 41 | AL+PT: 4 (2 lost, 2 meniscectomy) | AL+PT: 37 |
| AL+PT vs. PT | PT: 28 | PT: 4 (lost) | PT: 24 | ||
| pts allocated to groups according to which of 2 surgeons they were initially referred |
| Study | Age | % Female | Preoperative OA Severity | Other Characteristics |
|---|---|---|---|---|
| Livesley et al., 1991; | AL+PT: mn 61, sd 7.8 | AL+PT: 32 | Thomas radiography score | Stress pain and morning stiffness worse in PT group; swelling and effusions more common in AL+PT group |
| AL+PT vs. PT | PT: mn 60.7, sd 7.9 | PT: 46 | AL+PT: mn 5.3, sd 2.6 | |
| PT: mn 5.29, sd 2.7 |
| Study | Interventions | Prior Treatments | Concurrent Treatments |
|---|---|---|---|
| Livesley et al., 1991; | AL: 2 standard portals; tourniquet; Key Med Olympus arthroscope and a hook; lavage with 2 L normal saline at room temperature; | ||
| AL+PT vs. PT | PT: same regimen for both groups, no details on PT provided |
| Study | Initial Assembly of Comparable Groups | Low Loss to Followup, Maintenance of Comparable Groups | Measurements Reliable, Valid, Equal* | Interventions Comparable/Clearly Defined | Appropriate Analysis of Results | Overall Rating |
|---|---|---|---|---|---|---|
| Livesley et al., 1991; | N | N | N | N | N | Poor |
| AL+PT vs. PT |
| Study | Outcomes | ||
|---|---|---|---|
| Livesley et al., 1991 | Investigator-devised outcome measures, 16 dimensions; -1 to +1, 3 point scale (patient global change assessment); 0–4 point scale (pain at rest, pain on activity, pain at night, joint tenderness, periarticular tenderness); 0–3 point scale (effusions); scale in minutes (duration of stiffness after rest, in the morning); scale in degrees (knee range of motion); dichotomous scale, present/absent (warmth, stress pain, wasting, crepitus, sleep deprivation, swelling) | ||
| AL+PT vs. PT | F/U at 3, 6, 12 mo; 48 possible between-group comparisons of improvement in outcome (data provided for 32 comparisons) | ||
| N=61 (37 AL+PT, 24 PT) | |||
| Significant differences in degree of improvement, AL+PT vs. PT | |||
| Outcome | F/U | p value | |
| pain on activity | 3 mo | 0.003 | |
| 6 mo | 0.05 | ||
| pain at night | 3 mo | 0.01 | |
| joint tenderness | 6 mo | 0.02 | |
| swelling | 3 mo | 0.03 | |
| Subgroup analyses provided on pain at rest and pain on activity for 3 preoperative radiographic OA classes (slight, moderate, severe): significant between-group difference favoring AL+PT at 3 mo for moderate subgroup. | |||
Patients were assessed on a large number of knee measures at baseline and followup. Pain was of primary interest and it was rated at rest, on activity and at night. The authors assessed nine signs of inflammation, including joint tenderness, peri-articular tenderness, duration of stiffness at rest and in the morning, effusions, warmth, stress pain, sleep disturbance and swelling. Other measures included knee range of motion, the presence of wasting and crepitus and patient global change assessment at followup. Patients were comparable at baseline on age, percent female and preoperative radiographic OA severity. Information was lacking on baseline duration of osteoarthritis and body weight. There were differences between groups in baseline stress pain, morning stiffness, swelling and effusions. Using the U.S. Preventive Services Task Force rating system, the Livesley, Doherty, Needoff et al. (1991) trial was rated unfavorably on all 6 dimensions.
Results. Followup was conducted at 3, 6 and 12 months. Of the 48 possible between-group comparisons, the article provides data for 32. Five comparisons revealed statistically significant results favoring arthroscopic lavage plus physical therapy: pain on activity at 3 and 6 months, pain at night at 3 months, joint tenderness at 6 months, and swelling at 3 months. Subgroup analyses were provided on pain at rest and pain on activity for three classes of preoperative radiographic OA severity (slight, moderate, and severe). The article reports a significant advantage at 3 months among moderate class patients in the lavage plus physical therapy group. In addition, presence or absence of effusion was not found to be correlated with results.
Comment. Livesley, Doherty, Needoff et al. (1991) conclude that their results confirm the effectiveness of arthroscopic lavage as a treatment for symptomatic OA of the knee. However, critical review of this study contradicts this view. This small study reported no significant advantage for lavage in 43 of 48 comparisons. Furthermore, it was flawed by lack of blinding, lack of data on some baseline characteristics, imbalances on baseline characteristics without corresponding adjustment in the analysis, and absence of details about physical therapy. In addition, the study does not address the possible contribution of placebo effects to the observed results. This poor-quality quasi-experimental study does not support conclusions about the relative effectiveness of arthroscopic lavage plus physical therapy and physical therapy alone.
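The concern about the number of comparisons can be made concrete with a simple multiplicity calculation. The sketch below uses the five significant p values and the count of 48 possible comparisons given above; the Bonferroni correction is applied purely as an illustration and was not part of the original study or of this review's rating, and it treats the comparisons as independent, which is a simplifying assumption.

```python
ALPHA = 0.05
N_POSSIBLE_COMPARISONS = 48  # 16 outcome dimensions x 3 followup times

# Significant between-group results as tabulated for Livesley et al. (1991).
REPORTED_P_VALUES = {
    ("pain on activity", "3 mo"): 0.003,
    ("pain on activity", "6 mo"): 0.05,
    ("pain at night", "3 mo"): 0.01,
    ("joint tenderness", "6 mo"): 0.02,
    ("swelling", "3 mo"): 0.03,
}

# Number of nominally significant results expected by chance alone if no
# true differences existed (assuming independent comparisons).
expected_false_positives = ALPHA * N_POSSIBLE_COMPARISONS

# Illustrative Bonferroni-adjusted significance threshold.
bonferroni_threshold = ALPHA / N_POSSIBLE_COMPARISONS

surviving = [
    key for key, p in REPORTED_P_VALUES.items() if p < bonferroni_threshold
]

if __name__ == "__main__":
    print(f"Expected chance findings: {expected_false_positives:.1f}")
    print(f"Bonferroni threshold: {bonferroni_threshold:.5f}")
    print(f"Results below the adjusted threshold: {len(surviving)}")
```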
Administrative Database Evidence. Study Characteristics. The largest single source of evidence came from an administrative database, with 14,391 patients (Wai, Kreder, and Williams, 2002). This analysis was conducted within the Ontario Health Insurance Plan physician claims system between 1992 and 1996. The focus of the study was to evaluate outcomes (further surgery, adverse events) and patterns of utilization across 16 intraprovincial geographic units. Claims were linked with discharge abstracts to collect outcome data. The maximum followup was 3 years. An algorithm was created to capture patients with a primary diagnosis of OA of the knee. Patients were excluded if they had a primary diagnosis of rheumatoid arthritis or underwent bilateral knee procedures on the same day. Data were analyzed with a Cox proportional hazards regression model. The Charlson-Deyo comorbidity index was used for adjustment purposes. Minimum age for inclusion was 50 years, the mean age was 62.4 years and the oldest patient was 92. The proportion of females was 49.9 percent. No other patient baseline characteristics were mentioned. Details were unavailable about the arthroscopic debridement procedure. With the exception of the lack of more details describing the patients, the intervention and whether data quality was audited, this study was generally well-reported and well-conducted. No funds were received to support the study and the authors received no benefits from commercial parties.
| Study | Group | n | F/U | % Repeat Arthroscopy | % Total Arthroplasty | % High Tibial Osteotomy |
|---|---|---|---|---|---|---|
| Wai et al., 2002; AD | All pts | 14391 | ≤ 1 yr | 2.8 | 9.2 | 1.2 |
| 6212 | ≤ 3 yr | 7.7 | 18.4 | 2.9 | ||
| 50–59 yo | 6487 | ≤ 1 yr | 3.3 | 4.0 | 1.6 | |
| 2918 | ≤ 3 yr | 8.9 | 9.7 | 4.2 | ||
| 60–69 yo | 5435 | ≤ 1 yr | 2.4 | 11.1 | 1.0 | |
| 2354 | ≤ 3 yr | 6.8 | 23.7 | 2.0 | ||
| 70–79 yo | 2223 | ≤ 1 yr | 2.2 | 19.0 | 0.4 | |
| 854 | ≤ 3 yr | 6.2 | 32.7 | 0.8 | ||
| ≥ 80 yo | 246 | ≤ 1 yr | 1.6 | 17.5 | 0.0 | |
| 86 | ≤ 3 yr | 8.1 | 31.4 | 0.0 | ||
| Rate of total knee arthroplasties increased with age at 1 yr and 3 yrs (p=.0001); Cox's proportional hazards model adjusted analysis - age still associated (p=.02). No other significant relationships in unadjusted or adjusted analyses. | ||||||
| Study | % All/Any Adverse Events | % Surgical Complications | % Stroke/Myocardial Infarction | % Infections | % Deep Vein Thrombosis | % Death <3 mo |
|---|---|---|---|---|---|---|
| Wai et al., 2002; | 1.9 | 0.5 | 0.3 | 0.5 | 0.6 | 0.1 |
| AD (n=14,391) |
Regarding utilization, on average there were 1.4 arthroscopic debridements per 1,000 individuals in Ontario between 1992 and 1996. Across this time period, there were significant increases in the age- and sex-adjusted population rates, at an average rate of 10.1 percent per year. Across intraprovincial geographic units, population rates ranged from 0.7 to 2.3 persons per 1,000. Geographic units with higher rates of arthroscopic debridement were associated with higher rates of total knee arthroplasty within 1 year for patients aged 60 or older.
Comment. The study by Wai, Kreder, and Williams (2002) provides estimates of the probabilities of further surgery and adverse events for the most populous Canadian province from 1992 to 1996. These data may be representative of outcomes in everyday practice, but administrative databases are also susceptible to biases of underreporting and problems in the quality of available data. Thus, it is unclear how accurately this study reflects the frequency of adverse events after arthroscopic surgery. Furthermore, this study did not report on pain or function outcomes. The report only presented significant differences in further surgery with increasing age. It included no comparison with placebo or other interventions. This administrative database analysis offers evidence of limited value to this evidence report. While it shows different rates of further surgery across age subgroups, it leaves unanswered the question of whether there are different effects in terms of other outcomes of arthroscopy versus placebo or other treatments.
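As a consistency check on the further-surgery table, the age-specific rates can be pooled, weighting by the number of patients in each age group, and compared with the rates reported for all patients. The sketch below uses only the tabulated values; because the published percentages are rounded to one decimal place, agreement is expected only to within about 0.1 percentage points, and that tolerance is an assumption of this sketch.

```python
# Per age group: (n, % repeat arthroscopy, % total arthroplasty, % high tibial osteotomy),
# in the order 50-59, 60-69, 70-79 and >=80 years, from the Wai et al. (2002) table.
AGE_STRATA = {
    "<=1 yr": [
        (6487, 3.3, 4.0, 1.6),
        (5435, 2.4, 11.1, 1.0),
        (2223, 2.2, 19.0, 0.4),
        (246, 1.6, 17.5, 0.0),
    ],
    "<=3 yr": [
        (2918, 8.9, 9.7, 4.2),
        (2354, 6.8, 23.7, 2.0),
        (854, 6.2, 32.7, 0.8),
        (86, 8.1, 31.4, 0.0),
    ],
}

# Rates reported for all patients, same column order.
REPORTED_OVERALL = {
    "<=1 yr": (14391, 2.8, 9.2, 1.2),
    "<=3 yr": (6212, 7.7, 18.4, 2.9),
}


def pooled(strata):
    """Return (total n, tuple of n-weighted mean percentages)."""
    total = sum(row[0] for row in strata)
    rates = tuple(
        sum(row[0] * row[i] for row in strata) / total for i in (1, 2, 3)
    )
    return total, rates


if __name__ == "__main__":
    for followup, strata in AGE_STRATA.items():
        total, rates = pooled(strata)
        reported = REPORTED_OVERALL[followup]
        print(followup, "n:", total, "reported n:", reported[0])
        for label, calc, rep in zip(
            ("repeat arthroscopy", "total arthroplasty", "high tibial osteotomy"),
            rates,
            reported[1:],
        ):
            print(f"  {label}: pooled {calc:.2f}%, reported {rep}%")
```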
| Study | Inclusion | Exclusion | n, Knees | n, Patients |
|---|---|---|---|---|
| Aaron et al., 2006, ALD | Consecutive pts; met ACR OA of tibiofemoral joint; failed oral anti-inflammatory treatment; age 18–70 yo; Kellgren-Lawrence grade ≥2 | Previous infection; OA of patello-femoral joint; other/confounding diagnoses; | 110 | 110 |
| Bernard et al., 2004; ALD | 01/91 – 12/93; consecutive pts; knee OA (Outerbridge 3 or 4); pain uncontrolled by non-operative treatment; radiographic OA changes | 100 | 99 | |
| Krystallis et al., 2004; ALD | 02/97 – 06/01; OA of the knee; standard conservative non-operative treatment had failed; local (L), general (G) or peridural anesthesia (P) | 201 | 197 | |
| Dervin et al., 2003; AD | 03/95 – 11/97; OA of knee; 40–75 yo; remained symptomatic despite supervised PT and comprehensive medical management | Inflammatory/traumatic forms of OA; | 126 | |
| Jackson and Dieterichs, 2003; ALD | 01/95 – 06/97; ACR criteria diagnosis of OA of knee; Jackson and Dieterichs stage III/IV; consecutive series | Stage I and IV; marrow stimulation techniques, laser or radio-frequency chondroplasty | 121 | |
| Bohnsack et al., 2002; AD | 05/89 – 11/96; history of knee pain, swelling, radiological signs of severe OA (grade I–IV) | 104 | ||
| Shannon et al., 2001; ALD | Retrospective consecutive series; mild-moderate OA over 4-yr period; symptoms not severe enough for joint replacement; conservative treatment alone had failed or non-specific mechanical symptoms out of proportion to clinical and radiologic findings | Preop clinical/radiologic diagnosis of meniscal tear or loose body | 55 | 54 |
| Harwin, 1999; ALD | 1980 - 1993; areas of fibrillated articular cartilage with exposed bone; unresponsive to all modalities of nonoperative treatment | 204 | 190 | |
| McGinley et al., 1999; AD | 1981 - 87; pts > 55 yo; OA symptoms including pain limiting function and Ahlback radiographic JSN grade 2–3; > 10 yr F/U | 91 | 77 | |
| Linschoten and Johnson, 1997; ALD | 07/85 – 01/88; age ≥ 40 yo; arthroscopically confirmed degenerative changes in ≥ 2 of 3 compartments or single compartment Outerbridge III/IV | Arthroscopies for diagnosis or treatment of acute injuries, preliminary diagnosis of degenerative joint disease not confirmed intraoperatively | 56 | 55 |
| Yang and Nisonson, 1995; ALD | 07/89 – 07/93; did not respond to conservative nonoperative treatment; persistent evidence of internal derangement of knee; did not show severe signs and symptoms to merit total knee arthroplasty | History of rheumatoid arthritis; gout; ochronosis; ankylosing spondylitis; hemophilia; osteonecrosis; posttraumatic or postinfectious osteoarthritis | 105 | 103 |
| Aichroth et al., 1991; ALD | 1977 - 1988; degenerative knee joint | 276 | 254 | |
| McLaren et al., 1991; ALD | 07/82 – 07/86; OA confirmed at arthroscopy; nonoperative treatments either did not control symptoms sufficiently to allow normal daily activities or control rest pain | Inflammatory joint disease, malunited fractures and ligamentous instability | 170 | |
| Ogilvie-Harris and Fitsialos, 1991; ALD | 1979 - 1987; degenerative arthritis of the knee; persistent symptoms despite adequate medical management | 441 | ||
| Timoney et al., 1990; ALD | 07/81 – 02/86; age > 40 yo; intraoperative diagnosis of OA | Rheumatoid arthritis, acute infection arthritis, acute injury | 111 | 108 |
| Bert and Maschka, 1989; AD | 09/81 – 12/82; conservative methods of treatment had failed; available for 5 yr followup | 126 | ||
| Sprague, 1981; ALD | 08/78 – 11/79; pre- and postop moderate to extreme degenerative arthritis of 2–3 compartments; initial conservative treatment | 69 | 63 | |
| Study | Age | % Female | Obesity (%) | Disease Category (%) | Disease Duration | Preoperative Disease Severity (%) | Arthroscopic Disease Severity (%) | Mechanical Symptoms (%) |
|---|---|---|---|---|---|---|---|---|
| Aaron et al., 2006; ALD | Mn 61.7 | 67 | Mn BMI: 31.8 | Kellgren-Lawrence (2/3/4) | Noyes-Stabler mn total 21.6 | Locking or buckling: 56 | ||
| 53/29/18 | ||||||||
| Bernard et al., 2004; ALD | Mn 55, sd 13 | 39 | ||||||
| Krystallis et al., 2004; ALD | L: mn 60.8, rng 31–71 | 49 | 1°: 94 | Fairbank | Outerbridge | Mechanical: 33 | ||
| G: mn 59.9, rng 30–67 | 2°: 6 | (0/I/II/III) | (I–II/III/IV) | |||||
| P: mn 62.2, rng 35–75 | 12/36/40/12 | 12/28/60 | ||||||
| Dervin et al., 2003; AD | Mn 61.7, sd 8.6 | 53 | BMI > 27: 67 | Dougados | Giving way: 39; Locking: 22 | |||
| BMI > 33: 25 | Medial III/IV: 62 | |||||||
| Lateral III/IV: 13 | ||||||||
| Jackson and Dieterichs, 2003; ALD | I: mn 35.5, rng 22–60 | Jackson and Dieterichs | ||||||
| (I/II/III/IV) | ||||||||
| II: mn 54, rng 26–85 | 7/26/32/35 | |||||||
| III: mn 56, rng 24–78 | ||||||||
| IV: mn 64, rng 41–83 | ||||||||
| Bohnsack et al., 2002; AD | Mn 60, rng 50–83 | 52 | Jaeger and Wirth III/IV | Outerbridge III/IV: 50–80% | ||||
| Shannon et al., 2001; ALD | Mn 60.9, rng 48–83 | 56 | Mn wt: 76.6 kg, rng 54–100 | # mo: % | ||||
| < 3: 20 | ||||||||
| 3–12: 43 | ||||||||
| > 12: 39 | ||||||||
| Harwin, 1999; ALD | Mn 62.1, rng 32–88 | 57 | ||||||
| McGinley et al., 1999; AD | Mn 62.6, rng 55–82 | Outerbridge: IV: 100 | ||||||
| Linschoten and Johnson, 1997; ALD | Mn 62.5, rng 41–79 | 51 | ||||||
| Yang and Nisonson, 1995; ALD | Mn 64.2, sd 4.3 | 19 | # mo: % | Fairbank | ||||
| < 1: 17 | (0/I/II/III) | |||||||
| 1–12: 62 | 15/50/24/7 | |||||||
| > 12: 15 | ||||||||
| Aichroth et al., 1991; ALD | Mn 49, rng 28–82 | 28 | Instability: 54, locking: 36 | |||||
| McLaren et al., 1991; ALD | Mn 54, rng 23–82 | 30 | 1°: 81 | |||||
| 2°: 19 | ||||||||
| Ogilvie-Harris and Fitsialos, 1991; ALD | Mn 58, rng 28–92 | ≥ 2 yrs in most pts | Outerbridge | |||||
| (I–II/III/IV) | ||||||||
| 32/36/32 | ||||||||
| Timoney et al., 1990; ALD | Mn 58.1, rng 40–81 | 31 | mn 48.9 mo, rng 2–144 | 0–III scale | ||||
| Bert and Maschka, 1989; AD | DA mn 66, rng 46–84 | DA 46 | % obese: | Ahlback | Outerbridge | |||
| D mn 61, rng 39–82 | D 42 | DA 26 | II: 100 | IV: 100 | ||||
| D 22 | ||||||||
| Sprague, 1981; ALD | Mn 56, rng 24–78 | 38 | ||||||
| Study | Lavage + Debridement | Lavage | Debridement | Chondroplasty | Partial/Total Meniscectomy | Partial Synovectomy | Osteophytectomy | Abrasion | Drilling |
|---|---|---|---|---|---|---|---|---|---|
| Aaron et al., 2006 | X | X | X | X | X | ||||
| Bernard et al., 2004 | X | X | X | ||||||
| Krystallis et al., 2004 | X | X | X | ||||||
| Dervin et al., 2003 AD | X | X | X | X | |||||
| Jackson and Dieterichs 2003 | X | X | X | ||||||
| Bohnsack et al., 2002 | X | X | X | X | |||||
| Shannon et al., 2001 | X | X | |||||||
| Harwin, 1999 | X | X | X | X | |||||
| McGinley et al., 1999 | X | X | X | X | |||||
| Linschoten and Johnson, 1997 | X | X | X | X | |||||
| Yang and Nisonson, 1995 | X | X | X | X | X | ||||
| Aichroth et al., 1991 | X | X | X | X | X | ||||
| McLaren et al., 1991 | X | X | X | X | X | ||||
| Ogilvie-Harris and Fitsialos, 1991 | X | X | X | X | |||||
| Timoney et al., 1990 | X | X | X | X | X | ||||
| Bert and Maschka, 1989 | X | X | X | X | X | X | |||
| Sprague, 1981 ALD | X | X | X | X | X | ||||
| Study | Clearly Defined Question | Well-Described Study Population | Well-Described Intervention | Use of Validated Outcome Measures (Independently Assessed) | Appropriate Statistical Analysis | Well-Described Results | Discussion/Conclusions Supported by Data | Funding/Sponsorship Source Acknowledged |
|---|---|---|---|---|---|---|---|---|
| Aaron et al., 2006 | + | - | + | + (+) | + | - | + | + |
| Bernard et al., 2004 ALD | + | - | - | + (?) | + | - | + | ? |
| Krystallis et al., 2004 ALD | - | - | + | ? (?) | + | - | - | ? |
| Dervin et al., 2003 AD | + | - | - | + (?) | + | - | + | + |
| Jackson and Dieterichs, 2003 ALD | + | - | - | - (?) | - | - | + | ? |
| Bohnsack et al., 2002 AD | - | - | - | + (?) | + | - | - | ? |
| Shannon et al., 2001 ALD | + | - | + | + (?) | - | - | + | ? |
| Harwin, 1999 ALD | + | - | + | + (?) | - | + | - | ? |
| McGinley et al., 1999 AD | - | - | - | - (?) | - | - | - | ? |
| Linschoten and Johnson, 1997 ALD | - | - | + | - (?) | - | - | - | ? |
| Yang and Nisonson, 1995 ALD | + | - | + | - (?) | - | - | - | ? |
| Aichroth et al., 1991 ALD | - | - | + | - (?) | - | - | - | + |
| McLaren et al., 1991 ALD | + | - | + | + (?) | - | - | - | ? |
| Ogilvie-Harris and Fitsialos, 1991 ALD | - | - | - | - (?) | - | - | - | ? |
| Timoney et al., 1990 ALD | + | - | - | ? (?) | + | - | - | + |
| Bert and Maschka, 1989 AD | - | - | + | ? (?) | - | - | - | ? |
| Sprague, 1981 ALD | - | - | + | - (?) | - | - | - | ? |
| Study | Outcomes | |||
|---|---|---|---|---|
| Timoney et al., 1990 ALD | N=108; mn F/U 50.6 mo | |||
| Pre | F/U | p | ||
| Mn HSS score (sd) | 24.7 (9.2) | 36.1 (16.3) | <0.001 | |
| Study | Group | n | Mean F/U | % Better/Improved | % Same/Unchanged | % Worse |
|---|---|---|---|---|---|---|
| Shannon et al., 2001 ALD | All pts | 54 | 29.6 mo | 67 | 33 | 0 |
| Mn duration of symptom relief 25.5 mo, rng 1–51 | | | | | | |
| No influence on results of sex, age, weight, preop Duke score, duration of symptoms | | | | | | |
| Harwin, 1999 ALD | All pts | 190 | 7.4 yr | 63 | 21 | 16 |
| | Normal alignment | 57 | | 84 | 12 | 4 |
| | Mod malalignment | 102 | | 68 | 24 | 9 |
| | Sev malalignment | 45 | | 27 | 27 | 47 |
| McLaren et al., 1991 ALD | All pts | 170 | 25 mo | 65 | 28 | 7 |
| Study | Group | n | Mean F/U | % Excel | % Excel/Good | % Good | % Fair | % Poor |
|---|---|---|---|---|---|---|---|---|
| Krystallis et al., 2004 ALD | All pts | 201 | 32 mo | 43 | ||||
| Mechanical sx | 67 | 66 | ||||||
| Loading sx | 134 | 31 | ||||||
| No difference between local, general and peridural anesthesia groups (ANOVA, p=0.71) | ||||||||
| Jackson and Dieterichs, 2003 ALD | All pts | 121 | ≥4 yr | 50 | 27 | 22 | ||
| Stage I | 8 | 100 | 0 | 0 | ||||
| Stage II | 32 | 91 | 0 | 9 | ||||
| Stage III | 39 | 49 | 28 | 23 | ||||
| Stage IV | 42 | 12 | 52 | 36 | ||||
| Linschoten and Johnson, 1997 ALD | All pts | 55 | 49 mo | 68 | 32 | |||
| 6 mo | 82 | 18 | ||||||
| 12 mo | 77 | 23 | ||||||
| 24 mo | 70 | 30 | ||||||
| 36 mo | 68 | 32 | ||||||
| 48 mo | 68 | 32 | ||||||
| Significantly poorer results for Outerbridge class IV on arthroscopy in both medial and lateral compartments | ||||||||
| Yang and Nisonson, 1995 ALD | All pts | 103 | 11.7 mo | 20 | 45 | 32 | 3 | |
| Sx < 1 mo | 78 | |||||||
| Sx > 12 mo | 52 | |||||||
| Mechanical sx | 96 | |||||||
| No mechanical | 42 | |||||||
| Fairbank 0/I | 69 | |||||||
| Fairbank II/III | 36 | |||||||
| Mild degeneration | 74 | |||||||
| Severe degeneration | 39 | |||||||
| Outcome significantly better for mechanical symptoms, mild degeneration. Outcome not correlated with age, sex, side or duration of followup | ||||||||
| Aichroth et al., 1991 ALD | All pts | 254 | 44 mo | 18 | 57 | 15 | 10 | |
| All pts | 75 | |||||||
| < 60 yo | 78 | |||||||
| > 60 yo | 55 | |||||||
| Satisfactory result correlated with age (p<0.008), Ahlback preop radiographic severity (p<0.001) and with Outerbridge operative severity (p<0.001); no correlation with type or location of meniscal tear or performance of previous surgery | ||||||||
| Ogilvie-Harris and Fitsialos, 1991 ALD | All pts | 441 | ≥2 yr | 68 | ||||
| 1 compartment | 103 | 82 | ||||||
| 2 compartments | 135 | 58 | ||||||
| Abrasion | 32 | 56 | ||||||
| Meniscectomy | 149 | 68 | ||||||
| Lavage only | 4 | 25 | ||||||
| Timoney et al., 1990 ALD | All pts | 108 | 50.6 mo | 5049 | 20 | 41 | ||
| Subjective results deteriorated over time. | ||||||||
| Subjective results significantly worse for those with symptoms > 48 mo, those with severe chondromalacia; not correlated with meniscal pathology, condition of ACL, those undergoing limited lavage and debridement | ||||||||
| Bert and Maschka, 1989 AD | Debridement | |||||||
| Abrasion | 59 | 5 yr | 51 | 16 | 33 | |||
| Debridement | 67 | 66 | 13 | 21 | ||||
| Sprague, 1981 ALD | All pts | 63 | 13.6 mo | 74 | 10 | 16 | ||
| Study | Outcomes | ||
|---|---|---|---|
| McLaren et al., 1991 ALD | n=170; mean followup 25 mo | ||
| Disability (%) | Pre | Post | |
| No restriction | 10 | 32 | |
| Limited recreation & sports | 48 | 45 | |
| Unable to work | 25 | 12 | |
| Restricted daily activities | 17 | 11 | |
| Ogilvie-Harris and Fitsialos, 1991 ALD | n=441; mean followup ~4 yr | ||
| Domain | % | ||
| Pain, no/occasional | 53 | ||
| Pain improved | 86 | ||
| Activity limitation, no/occasional | 59 | ||
| Activity improved | 83 | ||
| Analgesic, no/occasional | 79 | ||
| Analgesic, improved | 32 | ||
| Satisfaction | 90 | ||
| Results related to disease severity | |||
| Study | Group | n | F/U | % Any | % Major | % Repeat Arthroscopy | % Unicondylar Arthroplasty | % Total Arthroplasty | % High Tibial Osteotomy |
|---|---|---|---|---|---|---|---|---|---|
| Aaron et al., 2006; ALD | All pts | 110 | 34 mo | 15 | |||||
| Total knee arthroplasty was related to baseline Kellgren-Lawrence grade. | |||||||||
| Bernard et al., 2004; ALD | All pts | 100 | 18 | 3 | 11 | 4 | |||
| 5-yr major surgery-free survival: all: ~85%; < 60 yo: 89%; ≥ 60 yo: 68% (χ², p=0.02); prior meniscectomy did not affect outcome | |||||||||
| Jackson and Dieterichs, 2003; ALD | All pts | 121 | ≥4 yr | 10 | 12 | ||||
| Stage I | 8 | 0 | 0 | ||||||
| Stage II | 32 | 9 | 0 | ||||||
| Stage III | 39 | 15 | 8 | ||||||
| Stage IV | 42 | 7 | 29 | ||||||
| Bohnsack et al., 2002; AD | All pts | 104 | 33.1 mo | 20 | 4 | 4 | 8 | 2 | |
| unspecified procedure (4%) | |||||||||
| Shannon et al., 2001; ALD | All pts | 54 | 29.6 mo | 7 | 19 | ||||
| Harwin, 1999; ALD | All pts | 190 | 7.4 yr | 15 | 13 | ||||
| McGinley et al., 1999; AD | All pts | 77 | 13.2 yr | 33 | |||||
| Linschoten and Johnson, 1997; ALD | All pts | 55 | 13 | ||||||
| Further surgery was significantly associated with presence of Outerbridge class IV on arthroscopy and presence of chondromalacia in lateral compartment. | |||||||||
| Yang and Nisonson, 1995; ALD | All pts | 103 | 11.7 mo | 3 | 2 | ||||
| Aichroth et al., 1991; ALD | All pts | 254 | 46 mo | 14 | |||||
| McLaren et al., 1991; ALD | All pts | 170 | 25 mo | 5 | 4 | 4 | |||
| Timoney et al., 1990; ALD | All pts | 108 | 50.6 mo | 6 | 21 | ||||
| Bert and Maschka, 1989; AD | All pts | 126 | 5 yr | 20 | |||||
| Sprague, 1981; ALD | All pts | 63 | 13.6 mo | 3 | 2 | ||||
| Study | Group | n | Mean F/U | % All/Any | % Prolonged Drainage | % Hemarthrosis | % Effusion | % Infections | % DVTs | % Other |
|---|---|---|---|---|---|---|---|---|---|---|
| Krystallis et al., 2004; ALD | All pts | 197 | 32 mo | 24.9 | minor intraop complications: 6.1 | |||||
| Shannon et al., 2001; ALD | All pts | 54 | 29.6 mo | 0 | ||||||
| Harwin, 1999; ALD | All pts | 190 | 7.4 yr | 2 | 0.5 | |||||
| Linschoten and Johnson, 1997; ALD | All pts | 55 | 49 mo | 13 | 1.9 | spinal headache: 1.9 | ||||
| postop nausea: 1.95 | ||||||||||
| Yang and Nisonson, 1995; ALD | All pts | 103 | 11.7 mo | 1 | superficial cellulitis: 2 | |||||
| McLaren et al., 1991; ALD | All pts | 170 | 25 mo | 1.2 | 0.6 | |||||
| Timoney et al., 1990; ALD | All pts | 108 | 50.6 mo | 6.5 | 0 | 0.9 | ||||
| Study | Inclusion | Exclusion | n, Enrolled | n, Withdrawn | n, Outcome Evaluated |
|---|---|---|---|---|---|
| Forster and Straw, 2003; ALD vs. IA Hyalgan | On waiting list for arthroscopic washout; symptomatic knee OA; radiographic evidence of some remaining joint space on weight bearing films; fit for regional or general anesthesia | Mechanical symptoms; IA injection < 6 mo; hypersensitivity to avian proteins | ALD: 19; Hyalgan: 19 | ALD: 4 (2 lost, 2 refused); Hyalgan: 2 (lost) | ALD: 15; Hyalgan: 17 |
Only four studies gave data on baseline body weight (Aaron, Skolnick, Reinert et al., 2006; Dervin, Stiell, Rody, et al., 2003; Shannon, Devitt, Poynton, et al., 2001; Bert and Maschka, 1989). Two studies specified whether patients had primary versus secondary OA, with both studies selecting more than 80 percent primary OA (Krystallis, Kirkos, Papavasiliou, et al., 2004; McLaren, Blokker, Fowler, et al., 1991). Four articles provided information about disease duration (Shannon, Devitt, Poynton, et al., 2001; Yang and Nisonson, 1995; Ogilvie-Harris and Fitsialos, 1991; Timoney, Kneisl, Barrack, et al., 1990). Three studies mentioned preoperative disease severity classification (Jackson and Dieterichs, 2003; Yang and Nisonson, 1995; Timoney, Kneisl, Barrack, et al., 1990), three studies described only arthroscopic disease severity ratings (Dervin, Stiell, Rody, et al., 2003; McGinley, Cushner, and Scott, 1999; Ogilvie-Harris and Fitsialos, 1991), and four studies provided both pre- and intra-operative information (Aaron, Skolnick, Reinert et al., 2006; Krystallis, Kirkos, Papavasiliou, et al., 2004; Bohnsack, Lipka, Ruhmann, et al., 2002; Bert and Maschka, 1989). Four articles stated that some patients had mechanical symptoms (Aaron, Skolnick, Reinert et al., 2006; Krystallis, Kirkos, Papavasiliou, et al., 2004; Dervin, Stiell, Rody, et al., 2003; Aichroth, Patel, and Moyes, 1991).
Clearly Defined Question: Of the 17 studies, nine put forward a clearly defined question. The remainder either did not state a clear question or stated one that was beyond the reach of the case series as a study design.
Well-Described Study Population: None of the case series were satisfactory on this element. None clearly stated the preoperative case definition criteria for OA of the knee, although Aaron, Skolnick, Reinert et al. (2006) and Jackson and Dieterichs (2003) cited the ACR diagnostic criteria. Only two studies (Yang and Nisonson, 1995; Timoney, Kneisl, Barrack, et al., 1990) reported on all items of the minimal set of baseline patient characteristics: age, sex, preoperative disease severity and duration of disease. This element primarily influences external validity in that it is easier to generalize from a well-described study population than a poorly described population. It also reflects on internal validity to the extent that investigators provide complete accounting of participants included, excluded and lost to followup. Only six of 17 studies provided a full accounting of participant flow.
Well-Described Intervention: Ten studies gave sufficient descriptions of interventions. Other reports either failed to note cointerventions or did not mention whether lavage accompanied debridement.
Use of Validated Outcome Measures (Independently Assessed): Only one study mentioned using an independent outcome assessor (Aaron, Skolnick, Reinert et al., 2006). Thus, outcome measures could be influenced by bias due to participants and investigators. Only seven studies used validated outcome measures, including the Knee Society pain domain scale (Aaron, Skolnick, Reinert et al., 2006), the Lysholm and Gillquist rating scale (Bohnsack, Lipka, Ruhmann, et al., 2002), and the WOMAC and SF-36 scales (Dervin, Stiell, Rody, et al., 2003). Bernard, Lemon, and Patterson (2004) assessed Kaplan-Meier time to further major surgery. Three studies measured global patient change assessment, for which no external criterion validation is necessary (Shannon, Devitt, Poynton, et al., 2001; Harwin, 1999; McLaren, Blokker, Fowler, et al., 1991). It is unclear whether several scales have been validated, including the Duke Arthroscopy score (Shannon, Devitt, Poynton, et al., 2001), the Baumgaertner scale (Krystallis, Kirkos, Papavasiliou, et al., 2004) and the Hospital for Special Surgery rating score (Timoney, Kneisl, Barrack, et al., 1990). All other rating instruments appear to be scales devised by the study investigators, with uncertain psychometric properties. Average followup ranged from about 1 year to 13.2 years.
Appropriate Statistical Analysis: Six studies used appropriate statistical analyses, for example, performing prepost tests on paired data. The remaining 11 studies either reported no statistical test results or inappropriate ones. Absent statistical tests or inappropriate analyses could give a biased view of study outcomes.
Well-Described Results: Only one of the 17 studies (Harwin, 1999) gave well-described results, consisting of validated measures, with adequate accounting of followup; and inclusion of both potentially beneficial outcomes and adverse events. Incomplete reporting of results could lead to a biased representation of a study's findings.
Discussion/Conclusions Supported by Data: Five articles stated conclusions that were supported by data. The other 12 articles either failed to note limitations of the data or stated conclusions that went beyond the data and design of the study.
Funding/Sponsorship Source Acknowledged: Only four articles mentioned whether the study was funded or if the authors had financial relationships with manufacturers.
Overall, this body of case series evidence is of poor quality. The best-rated studies (Aaron, Skolnick, Reinert et al., 2006; Dervin, Stiell, Rody, et al., 2003) were favorable on 6 of 8 items. Only three studies (Bernard, Lemon, and Patterson, 2004; Shannon, Devitt, Poynton, et al., 2001; Harwin, 1999) were rated favorably on four out of the eight items in the Carey and Boden scale. Two studies (Yang and Nisonson, 1995; McLaren, Blokker, Fowler, et al., 1991) rated well on three of eight items. Ten other case series were rated favorably on two or fewer items. Bias is a particular concern in that only six studies gave a full accounting of participant flow, only one study used an independent outcome assessor, and only one study presented well-described results. Lack of an independent assessor in all but one study is perhaps the most important factor, given that the outcomes generally assessed (pain, function and global result) are subjective and susceptible to bias and placebo effects.
| Study | Outcomes | |||
|---|---|---|---|---|
| Aaron et al., 2006 ALD | N=110, 12 lost to F/U; mn F/U 34 mo (24–74 mo) | |||
| Knee Society pain | Pre | F/U | p | |
| Mn | 11.9 | 30.8 | <0.001 | |
| Success=Knee Society pain ≥ 30 in 72 (65%), failure in 38 (35%) | ||||
| Significant predictors of percent success: Kellgren-Lawrence grade, abnormal limb alignment, medial/lateral joint space width; intraoperative lesion severity; mechanical symptoms did not predict success. | ||||
| Bohnsack et al., 2002 AD | N=104; mn F/U 5.4 yr | |||
| Lysholm & Gillquist | Pre | F/U | p | |
| Mn | 40 | 69 | <0.01 | |
| Higher gain in Lysholm & Gillquist score in pts < 60 yo, monolateral OA; no influence of meniscectomy. | ||||
| Dervin et al., 2003 AD | N=126; mn F/U 2 yr | |||
| MCII WOMAC pain: 44% | ||||
| MCII predicted by tenderness at medial joint line, positive Steinman, unstable meniscal tear (logistic regression) | ||||
A validated pain scale, the Knee Society pain domain, was assessed in the study by Aaron, Skolnick, Reinert et al. (2006). Mean scores improved from 11.9 to 30.8 at an average of 34 months' followup (p<0.001). The authors defined a successful outcome as a Knee Society pain score of 30 points or more; 65 percent of patients met this definition, while 35 percent were failures.
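The success and failure percentages above can be checked against the counts given in the outcomes table (72 successes and 38 failures among 110 patients). The following is a minimal sketch of that arithmetic. The Wilson 95 percent confidence interval is an illustrative addition to show the precision of the estimate; it is not a figure reported by the study, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion (illustrative helper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the outcomes table for Aaron et al., 2006.
successes, failures = 72, 38
n = successes + failures

success_pct = round(100 * successes / n)
failure_pct = round(100 * failures / n)
low, high = wilson_interval(successes, n)

print(f"success: {success_pct}%, failure: {failure_pct}%")
print(f"95% interval for success proportion: {low:.2f} to {high:.2f}")
```

Because a case series has no comparison group, an interval of this kind describes only sampling variability in the observed proportion. It says nothing about how much of the improvement is attributable to the procedure rather than to placebo or nonspecific effects.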
Jackson and Dieterichs (2003, n=121) had at least 4 years of followup, reporting excellent or good results in 50 percent. In the series by Bert and Maschka (1989, 5 year followup), excellent or good results were achieved in 51 percent of 59 patients who underwent debridement plus abrasion and in 66 percent of 67 patients receiving debridement alone.
Ogilvie-Harris and Fitsialos (1991, n=441, minimum 2 year followup) reported good results in 68 percent and Sprague (1981, n=63, mean followup 13.6 months) found good results in 74 percent. Linschoten and Johnson (1997, n=55, mean followup 49 months) found good results in 68 percent. Timoney, Kneisl, Barrack, et al. (1990, n=108, mean followup 50.6 months) found good results in 50 percent and significantly worse results for those with symptoms over 48 months and those with severe chondromalacia on arthroscopy.
Comment. Authors of case series commonly conclude from their results that arthroscopic lavage and debridement are effective, paying inadequate attention to their studies' limitations. The case series is a weak design that can demonstrate effectiveness only under certain circumstances. The methodologic quality of case series must be high, with use of validated outcome scales assessed independently, full accounting of selected and excluded patients and appropriate analysis of both beneficial outcomes and adverse events. In addition, the observed effect in case series must be large enough to exceed potential biases and nonspecific effects. This set of studies is of particularly low quality. Only one study clearly used an independent outcome assessor and most used outcome scales that are unvalidated or of uncertain validity. Patient samples were poorly described, appropriate statistical analyses were rare and only one of these articles gave well-described results. This low-quality body of case series evidence contrasts with the high-quality placebo-controlled RCT evidence from Moseley, O'Malley, Petersen, et al. (2002), which did not find that arthroscopic lavage and debridement are superior to placebo. Thus, the case series evidence reviewed here is inadequate to resolve uncertainty raised by the Moseley trial.
On the question of whether arthroscopy outcomes differ across subgroups, it is fundamental to first establish whether the effects of arthroscopy exceed those of placebo. If a placebo-controlled RCT shows that treatment effects of arthroscopy are significantly greater in certain subgroups, this would be strong evidence to support use of arthroscopy in particular patient subsets. However, lacking this type of evidence, subgroup analyses from other types of studies would be of very limited value.
Placebo-Controlled RCT Evidence. The publication by Moseley, O'Malley, Petersen, et al. (2002) describing the only placebo-controlled RCT did not present any subgroup analyses. In response to letters to the editor about subgroups, the authors replied (Wray, Moseley, O'Malley, 2002) that they performed subgroup analyses on OA stage, alignment and mechanical symptoms, finding no differences in results by subgroup. Thus, it has not been established that arthroscopic lavage and debridement produce better results than placebo for any specific group of patients.
Quasi-Experimental Evidence. Livesley, Doherty, Needoff, et al. (1991, n=61, followup ≤12 months) compared arthroscopic debridement plus physical therapy with physical therapy alone. Subgroup analyses were provided on pain at rest and pain on activity for 3 classes of preoperative radiographic OA severity (slight, moderate and severe). The article reports a significant advantage at 3 months among moderate class patients in the lavage plus physical therapy group. In addition, presence or absence of effusion was not found to be correlated with results. This poor quality study was flawed by lack of blinding, imbalances on baseline characteristics without corresponding adjustment in the analysis, and absence of details about physical therapy. The suggestion of better outcomes in the moderate OA subgroup should not be interpreted as evidence that arthroscopic debridement achieves better results than placebo for this subgroup.
To summarize case series evidence, three patient factors were represented by at least two studies showing different outcomes for patient subgroups. Three studies found better outcomes among patients younger than 60 years of age (Bernard, Lemon, and Patterson, 2004; Bohnsack, Lipka, Ruhmann, et al., 2002; Yang and Nisonson, 1995). Two studies found that patients with mechanical symptoms had better results than those without them (Krystallis, Kirkos, Papavasiliou, et al., 2004; Yang and Nisonson, 1995) and one study found no relationship (Aaron, Skolnick, Reinert et al., 2006). Six studies found that increased OA severity was correlated with worse results (Aaron, Skolnick, Reinert et al., 2006; Jackson and Dieterichs, 2003; Linschoten and Johnson, 1997; Yang and Nisonson, 1995; Aichroth, Patel, and Moyes, 1991; Timoney, Kneisl, Barrack, et al., 1990). Among these, OA severity was rated only with arthroscopy in three studies, with arthroscopy combined with preoperative information in one; and with radiography and arthroscopy separately in two. A useful function of case series is to suggest patient populations that may be worthwhile to include in controlled trials. While the Moseley trial found no differences in treatment effect by patient characteristics, case series evidence of different outcomes by age, presence of mechanical symptoms and OA severity should be noted by investigators analyzing future RCTs, but it cannot be viewed as showing that arthroscopy is particularly effective in particular subgroups.
| Study | Age | % Female | Pain | Function |
|---|---|---|---|---|
| Forster and Straw, 2003; ALD vs. IA Hyalgan | ALD: mn 63; Hyalgan: mn 60 | | VAS: ALD: mn 7.5; Hyalgan: mn 7.6 | Knee Society: ALD: mn 45; Hyalgan: mn 65 (p<0.05). LI: ALD: mn 13; Hyalgan: mn 10.5 |
| Study | Interventions | Prior Treatments | Concurrent Treatments |
|---|---|---|---|
| Forster and Straw, 2003; ALD vs. IA Hyalgan | ALD: general or spinal anesthesia; saline lavage; debridement of articular surface or menisci as considered necessary at surgeon's discretion; large chondral or meniscal flaps excised but stable, degenerative menisci left intact. IA Hyalgan: any effusion aspirated; 5 injections of 20 mg Hyalgan in affected knee at 1-wk intervals | | |
| Study | Initial Assembly of Comparable Groups | Low Loss to Followup, Maintenance of Comparable Groups | Measurements Reliable, Valid, Equal* | Interventions Comparable/Clearly Defined | Appropriate Analysis of Results | Overall Rating |
|---|---|---|---|---|---|---|
| Forster and Straw, 2003; ALD vs. IA Hyalgan | ? | Y | N | Y | N | Poor |
| Study | Group | n | Outcome | 6 wk mn | 3 mo mn | 6 mo mn | 1 yr mn | p values |
|---|---|---|---|---|---|---|---|---|
| Forster and Straw, 2003; ALD vs. IA Hyalgan | ALD | 15 | VAS (higher=worse) | 5.4 | 6.0 | 6.2 | 5.7 | all NS |
| | Hyalgan | 17 | VAS (higher=worse) | 6.6 | 6.0 | 5.4 | 5.7 | |
| | ALD | 15 | Knee Society (higher=better) | 55 | 45 | 45 | 55 | all NS |
| | Hyalgan | 17 | Knee Society (higher=better) | 70 | 65 | 80 | 90 | |
| | ALD | 15 | LI (higher=worse) | 10 | 13 | 12 | 10.5 | all NS |
| | Hyalgan | 17 | LI (higher=worse) | 11 | 11 | 9 | 8 | |
| Further surgery: arthroscopy (ALD 29%, Hyalgan® 0%); total knee arthroplasty (ALD 12%, Hyalgan® 7%); total knee arthroplasty waiting list (ALD 18%, Hyalgan® 13%) | | | | | | | | |
The Forster and Straw trial found no differences between Hyalgan® and arthroscopic lavage and debridement over a 1-year followup. However, the trial was clearly underpowered and had significant baseline differences between arms, with no adjustment for these differences in the data analysis. The Forster and Straw trial is the only study making direct comparisons between viscosupplements and arthroscopic treatment; no studies compared glucosamine or chondroitin with either treatment. This trial provides an inadequate evidence base to form conclusions about the comparative effects of viscosupplements and arthroscopy.
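To give a rough sense of what "underpowered" means with 15 and 17 evaluated patients per arm, the sketch below approximates the power of a two-sided, two-sample comparison of means using a normal approximation. The standardized effect sizes of 0.5 and 0.8 are hypothetical values chosen for illustration (conventional "medium" and "large" effects); they are not estimates from the trial, and the function name is an assumption. A normal approximation slightly overstates power relative to a t-test at these sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n1, n2, effect_size, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the true difference in means divided by the common
    standard deviation (a hypothetical input, not a trial result).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n1 * n2 / (n1 + n2))
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


# Group sizes evaluated in the Forster and Straw trial: ALD 15, Hyalgan 17.
power_medium = approx_power(15, 17, effect_size=0.5)
power_large = approx_power(15, 17, effect_size=0.8)

print(f"power for effect size 0.5: {power_medium:.2f}")
print(f"power for effect size 0.8: {power_large:.2f}")
```

Under these assumptions, both values fall well short of the 80 percent power conventionally sought in trial design, which is consistent with the view that nonsignificant results from a trial of this size cannot establish equivalence.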
What are the Clinical Effectiveness and Harms of Arthroscopic Lavage and Debridement in Patients With Primary OA of the Knee?
The best available evidence, a single placebo-controlled RCT, found arthroscopic lavage with or without debridement was not superior to placebo. The evidence base does not definitively show that arthroscopy is no more effective than placebo, but additional RCTs of high quality and with favorable results would be necessary to refute the existing trial, which suggests equivalence between placebo and arthroscopy.
Neither the placebo-controlled RCT, published by Moseley, O'Malley, Petersen, et al., in 2002, nor other studies distinguished between primary and secondary OA. However, due to the age of patients, it is likely most patients had primary OA.
No other study besides Moseley, O'Malley, Petersen, et al. (2002) addressed the potential contribution of placebo effects to apparent improvement in outcome after arthroscopy.
The primary limitations of the Moseley, O'Malley, Petersen, et al. (2002) trial are lack of details describing the patient sample, the use of a single surgeon and enrollment of patients at a single Veterans Affairs Medical Center. These concerns call into question the generalizability of this trial's findings.
Since OA of the knee affects a large population, uncertainty about arthroscopy's effectiveness should be resolved with further well-conducted and well-reported RCTs.
Major methodologic shortcomings in non-placebo RCTs, an administrative database analysis and case series preclude resolution of uncertainties raised by the trial of Moseley, O'Malley, Petersen, et al. (2002).
Evidence on the harms after arthroscopic lavage and debridement comes primarily from an administrative database analysis and case series reports. Potential harms include infection, prolonged drainage from arthroscopic portals, effusion, hemarthrosis, and deep vein thrombosis. To determine whether the risk of such harms is acceptable, it is important to establish whether the effectiveness of arthroscopic lavage and debridement surpasses placebo.
What are the Clinical Effectiveness and Harms of Arthroscopic Lavage and Debridement in Patients With Secondary OA of the Knee?
We identified no studies that enrolled patients with only secondary OA of the knee, or that reported separately on secondary OA of the knee. Therefore, no conclusions can be drawn about treatment outcomes in patients with secondary OA of the knee.
How do the Short-Term and Long-Term Outcomes of Arthroscopic Lavage and Debridement Differ by the Following Subpopulations: Age, Race/Ethnicity, Sex, Primary or Secondary OA, Disease Severity and Duration, Weight (Body Mass Index), and Prior Treatments?
Subgroup analyses for mechanical symptoms, alignment and OA stage were performed in the placebo-controlled RCT by Moseley and colleagues. No differences in results were observed within subgroups. Thus, it cannot be concluded that arthroscopic lavage with or without debridement has effects greater than placebo for specific subgroups.
Subgroup analyses were also performed in a quasi-experimental study, an administrative database and several case series. In these studies, different outcomes were observed according to age, presence of mechanical symptoms and severity of OA. However, these studies had substantial methodologic flaws, so it cannot be concluded that arthroscopy has greater effectiveness in specific patient subgroups.
How do the Short-Term and Long-Term Outcomes of Arthroscopic Lavage and Debridement, Viscosupplements and Glucosamine/Chondroitin Compare for the Treatment of: Primary OA of the Knee; and Secondary OA of the Knee?
A single RCT compared use of arthroscopic lavage and debridement with intra-articular Hyalgan®. This poor quality study analyzed data from only 32 patients, finding no significant differences between groups on 3 scales concerned with pain and function.
This trial provides an inadequate evidence base to form conclusions about the comparative effects of viscosupplements and arthroscopy.
No other comparative study, randomized or nonrandomized, addressed the relative effects of arthroscopic lavage and debridement, viscosupplements, and glucosamine/chondroitin.
Osteoarthritis (OA) of the knee is a common condition and the three interventions reviewed in this report are widely used in the treatment of OA of the knee. Yet the best available evidence reports that glucosamine/chondroitin and arthroscopic surgery are no more effective than placebo. The Glucosamine/Chondroitin Arthritis Intervention Trial (GAIT) (n=1,583) found that neither glucosamine hydrochloride, chondroitin sulfate, nor the combination was superior to placebo and that all were inferior to celecoxib. The double-blind, randomized, controlled trial by Moseley, O'Malley, Petersen, et al. (2002, n=180) found that arthroscopic lavage with or without debridement was not superior to sham arthroscopy. Results from 42 randomized controlled trials (RCTs), all but one of which were synthesized in various combinations in six meta-analyses, generally show positive effects of viscosupplementation on pain and function scores compared to placebo. However, the evidence on viscosupplementation is accompanied by considerable uncertainty due to variable trial quality, potential publication bias, and unclear clinical significance of the changes reported.
Are we to conclude, then, that all three interventions are ineffective? It is erroneous to conclude that “no evidence of effect” is the same as “evidence of no effect.” The distinction between no evidence and no effect applies somewhat differently to each intervention.
While the overall results of GAIT show no benefit, in the subgroup of knee OA patients with moderate-to-severe pain at baseline, the combination of glucosamine hydrochloride and chondroitin sulfate significantly improved pain. Although this subgroup analysis was not explicitly prespecified in the GAIT protocol, the stratified randomization by disease severity yields statistically valid comparisons. The nonsignificant statistical result in the celecoxib arm in the same patient subgroup may be a function of insufficient power. Given the small number of patients in the moderate-to-severe subgroup, and the large number of such patients in the general population, a further trial can be justified. These subgroup results, although suggestive, do not override the overall results of GAIT, which must stand unless equally compelling evidence of benefit to a selected subgroup is produced.
The existing evidence does not definitively show that arthroscopic lavage with or without debridement is only as effective as placebo. However, additional placebo-controlled RCTs showing clinically significant advantage for arthroscopy would be necessary to refute the Moseley results, which show equivalence between placebo and arthroscopy. The recently published (Weinstein, Tosteson, Lurie, et al., 2006) Spine Patient Outcomes Research Trial (SPORT) offers an alternative study design that could be informative: a rigorous RCT comparing surgery to conservative management, rather than sham.
The existing evidence leaves uncertainty whether viscosupplementation achieves minimal clinically important improvement compared to placebo. Higher-quality trials are in the minority and show smaller effects; there are numerous patients lost to follow-up, and a substantial portion of studies (25 percent of total patients) have not been published as full-text articles. The clinical significance of reported changes in pain and function scores is uncertain, as almost all studies compare only mean difference between arms. Although the overall pooled estimate suggests that hylan G-F 20 may have a larger effect than other hyaluronans, whether this represents a meaningful clinical effect or limitations in the quality and completeness of study reporting is unknown. A rigorous RCT that showed strong evidence of improvement in pain and function would be necessary to conclude that viscosupplementation is beneficial.
Overall, our recommendations for future research reach beyond the specific treatments addressed in this report, and are intended broadly to improve the quality of research and reporting on interventions for osteoarthritis of the knee.
Clinically meaningful results require outcome measures establishing that patients experience improvement that is important to them, that is, meaningful clinically important improvement. The range of magnitude of improvement clinically important to patients has been estimated for VAS pain and WOMAC measures and, to a lesser degree, for the Lequesne Index (see Methods). Few RCTs reported results in terms of response: the proportion achieving a meaningful clinically important improvement in pain and function. The vast majority of trials compared only mean change between groups. Follow-up duration and intervals for measurement, appropriate to each intervention, should be established by expert consensus.
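The difference between reporting mean change and reporting response can be illustrated with a minimal sketch. All patient data below are hypothetical, and the 20-mm threshold on a 100-mm VAS is an assumption chosen only for illustration; it is not a value taken from this report. The two hypothetical arms have the same mean improvement, yet very different proportions of patients reaching the assumed threshold.

```python
from statistics import mean

# Assumed threshold (mm on a 100-mm VAS), for illustration only.
ASSUMED_MCII_MM = 20


def responder_proportion(improvements, threshold=ASSUMED_MCII_MM):
    """Fraction of patients whose improvement meets or exceeds the threshold."""
    responders = sum(1 for change in improvements if change >= threshold)
    return responders / len(improvements)


# Hypothetical per-patient pain improvement (mm), ten patients per arm.
arm_uniform = [15] * 10         # every patient improves modestly
arm_mixed = [30] * 5 + [0] * 5  # half improve substantially, half not at all

for name, arm in (("uniform", arm_uniform), ("mixed", arm_mixed)):
    print(name, "mean change:", mean(arm),
          "responder proportion:", responder_proportion(arm))
```

A comparison of means alone would treat these two arms as identical; only the responder proportion distinguishes them.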
Common measures and intervals will produce a more robust body of cumulative evidence and improve the ability to compare and pool results among trials. As a result of the variety of measures and intervals used in primary studies, meta-analyses available for this type of evidence often report pooled outcomes as the standardized mean difference, a statistical construct that lacks meaning to clinicians and patients.
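To make the standardized mean difference concrete, the following sketch computes it as the difference in means divided by the pooled standard deviation (the form often called Cohen's d; the meta-analyses reviewed here may have used a different variant). The two trials and all of their summary statistics are hypothetical. They report pain on different scales, yet produce the same unitless value, which is what permits pooling and also what removes the original clinical units.

```python
import math


def standardized_mean_difference(mean_tx, sd_tx, n_tx, mean_pl, sd_pl, n_pl):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = ((n_tx - 1) * sd_tx ** 2 + (n_pl - 1) * sd_pl ** 2) / (n_tx + n_pl - 2)
    return (mean_tx - mean_pl) / math.sqrt(pooled_var)


# Hypothetical trial A: change in pain on a 100-mm VAS (treatment vs. placebo).
smd_vas = standardized_mean_difference(-20.0, 25.0, 100, -12.0, 25.0, 100)

# Hypothetical trial B: change in pain on a 0-20 subscale (treatment vs. placebo).
smd_subscale = standardized_mean_difference(-4.0, 5.0, 80, -2.4, 5.0, 80)

print(round(smd_vas, 2), round(smd_subscale, 2))
```

In this example an 8-mm difference on one scale and a 1.6-point difference on the other both reduce to the same number of standard deviations, so the pooled figure no longer says how many millimeters or points a patient improved.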
Among RCTs of viscosupplementation, those that have not been published in full text account for approximately 25 percent of the total patient population. Several meta-analyses of glucosamine report that trials of the Rotta product, glucosamine sulfate, show outcomes superior to trials of glucosamine hydrochloride. Yet key studies that provide some of the data supporting superior efficacy have not been published as full-text studies. Existing studies should be published in full. Finally, all trials should be registered at inception at ClinicalTrials.gov, along with an anticipated date for full release of results.
Our evidence report draws heavily on six study-level meta-analyses of glucosamine/chondroitin and five of viscosupplementation. While we used a validated instrument to appraise the quality of the systematic reviews, the instrument does not address the question of when meta-analysis is appropriate to a systematic review. Meta-analysis is a technique with underlying assumptions that may or may not hold when a particular collection of results is pooled. Furthermore, meta-analyses may fail to convey the real uncertainty and potential bias accompanying pooled estimates.
In many respects, the focus on meta-analysis in the systematic reviews available for this evidence report served to obscure the overall weakness of the primary literature. For example, the Oxman and Guyatt meta-analysis quality assessment tool asked if conclusions made by authors were supported by the data. However, the tool does not adequately address whether quality concerns of the underlying literature were incorporated into conclusions, which was a frequent flaw in the meta-analyses reviewed here. Building on the Oxman and Guyatt tool, Shea, Grimshaw, Wells, et al. (2007) have developed a new scale that more clearly assesses whether conclusions took appropriate account of the quality of included studies and the potential for publication bias.
For RCTs of both glucosamine/chondroitin and viscosupplementation, potential sources of bias included failure to report intention-to-treat results, high drop-out or loss to follow-up rates, poor quality, and lack of a priori sample size calculations. A number of these characteristics were noted by meta-analysts to influence results.
Uncertainty in the magnitude of pooled effects is influenced by factors intrinsic to the underlying trials. Among these are variable patient characteristics, trial characteristics, and the indication that a few trial results were outliers and influential on pooled estimates. The meta-analyses frequently reported high inter-trial heterogeneity. Random effects models were used in the face of high heterogeneity, but a consequence is to increase the influence of smaller trials on the pooled results. The meta-analyses did not address a threshold question, one that has not been clearly resolved by practitioners of meta-analysis: when is heterogeneity too high to justify pooling trial results? A related concern is the practice of reporting on multiple outcome measures and time intervals, which may be represented by a small portion of studies, thus potentially introducing bias.
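The effect of a random effects model on trial weights can be shown with a minimal sketch. It assumes inverse-variance weighting and the DerSimonian-Laird estimator of between-trial variance, a common choice that the meta-analyses reviewed here may or may not have used. The four trials, their effect sizes, and their variances are hypothetical: one large, precise trial and three small ones.

```python
def inverse_variance_weights(variances, tau2=0.0):
    """Weights 1 / (within-trial variance + between-trial variance)."""
    return [1.0 / (v + tau2) for v in variances]


def dersimonian_laird_tau2(effects, variances):
    """DerSimonian-Laird estimate of between-trial variance (tau squared)."""
    w = inverse_variance_weights(variances)
    sum_w = sum(w)
    pooled_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed effects estimate.
    q = sum(wi * (yi - pooled_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    return max(0.0, (q - (len(effects) - 1)) / c)


def shares(weights):
    """Each trial's fraction of the total weight."""
    total = sum(weights)
    return [wi / total for wi in weights]


# Hypothetical data: the first trial is large (small variance), the rest are small.
effects = [-0.10, -0.60, -0.80, -0.50]
variances = [0.01, 0.09, 0.12, 0.10]

tau2 = dersimonian_laird_tau2(effects, variances)
fe_shares = shares(inverse_variance_weights(variances))
re_shares = shares(inverse_variance_weights(variances, tau2))

for i, (fe, re) in enumerate(zip(fe_shares, re_shares), start=1):
    print(f"trial {i}: fixed effects share {fe:.2f}, random effects share {re:.2f}")
```

Because the same between-trial variance is added to every trial's variance, the weights become more nearly equal as heterogeneity rises, and the large trial's share of the pooled estimate falls while the small trials' shares grow.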
| | Viscosupplementation | Glucosamine/Chondroitin | Arthroscopy |
|---|---|---|---|
| Evidence (What is the current state of the evidence?) | Current evidence consists largely of trials that have high loss to follow-up and lack rigorous measurement to test whether intra-articular hyaluronans achieve meaningful clinically important improvement in pain and function. The evidence does not clearly demonstrate that intra-articular hyaluronans achieve clinically significant improvement in pain and function compared to placebo. | Based on GAIT, neither glucosamine, chondroitin, nor their combination provides meaningful clinically important improvement in pain or function. | A single placebo-controlled RCT found arthroscopic lavage with or without debridement to be equivalent to placebo. |
| | A rigorous multi-center RCT, preferably with independent sponsorship, is needed to either establish or refute whether hylan G-F 20 is beneficial. | A subgroup analysis found that the combination of glucosamine hydrochloride and chondroitin sulfate significantly improved pain in patients with moderate-to-severe OA of the knee. Given the small number of patients in the moderate-to-severe subgroup, and the large number of such patients in the general population, confirmation in a large, rigorous multi-center RCT, preferably with independent sponsorship, is desirable. | Adverse events have not been systematically studied. |
| | Adverse events, reportedly uncommon, have not been systematically studied. | No conclusions concerning metabolic effects of chronic glucosamine use in the general population can be drawn. | |
| Population (What is the population of interest?) | Individuals with OA of the knee of varying severity. Future trials should be accompanied by stratified randomization according to disease severity and duration. | Individuals with moderate-to-severe OA of the knee. Inclusion of diabetic individuals with metabolic testing and long-term observational follow-up. | The target population consists of patients with clinically diagnosed OA of the knee and who have tried conservative treatments with transient or unsatisfactory results. |
| Intervention (What are the interventions of interest?) | Pooled estimate suggests effect obtained with hylan G-F 20 may be larger than with other hyaluronans; whether this represents a meaningful clinical effect or study limitations is unknown. | | Arthroscopic lavage, with or without debridement, |

| Abbreviation | Definition |
|---|---|
| ? | unknown; unclear |
| 1° | primary |
| 2° | secondary |
| A | arthroscopy |
| Acet | acetaminophen |
| ACR | American College of Rheumatology |
| ADL | Activities of Daily Living |
| ADL | arthroscopy, lavage, and debridement |
| AE(s) | adverse events |
| AL | arthroscopy and lavage |
| ARA | American Rheumatism Association |
| BMI | body mass index |
| CI | confidence interval |
| D | debridement |
| dis | disease |
| FE | fixed effects |
| GH | glucosamine hydrochloride |
| GS | glucosamine sulfate |
| HSS | Hospital for Special Surgery |
| IA | intra-articular |
| ITT | intention-to-treat |
| JSN | joint space narrowing |
| K-L | Kellgren-Lawrence |
| L | lavage |
| LI | Lequesne Index |
| MA(s) | meta-analysis(es) |
| mn | mean |
| mo(s) | month(s) |
| N | number |
| n | number |
| N | no |
| NR | not reported |
| NS | nonsignificant |
| NSAID(s) | nonsteroidal anti-inflammatory drug(s) |
| NSD | no significant difference |
| OA | osteoarthritis |
| OAK | osteoarthritis of the knee |
| OMERACT-OARSI | Outcome Measures in Rheumatology Clinical Trials-Osteoarthritis Research Society International |
| Pl | placebo |
| PT | physical therapy |
| pts | patients |
| RCT(s) | randomized, controlled trial(s) |
| RE | random effects |
| rng | range |
| RR | relative risk |
| sd | standard deviation |
| SEM | standard error of the mean |
| SMD | standardized mean difference |
| Tx | treatment |
| USPSTF | U.S. Preventive Services Task Force |
| VAS | visual analog scale |
| WMD | weighted mean difference |
| WOMAC | Western Ontario and McMaster Universities Osteoarthritis Index |
| Y | yes |
| yr(s) | year(s) |
MEDLINE® (through March 29, 2007)
EMBASE (through March 16, 2006)
Cochrane Controlled Trials Register (through November 27, 2006)
EMBASE was updated with abbreviated searches through November 27, 2006.
Database Search Strategies:
“osteoarthritis, knee”[MeSH] OR
“osteoarthritis”[MeSH] AND (knee(tw) OR knees(tw)) OR
osteoarthritis*(tw) AND (knee(tw) OR knees(tw)) OR
“osteoarthritis”[MeSH] AND patellofemoral (tw)
AND
human (limit/tag)
Results of the above search were limited to citations also identified by the Cochrane Handbook search strategy for controlled trials (Alderson et al. 2004):
(randomized controlled trial [pt] OR
controlled clinical trial [pt] OR
randomized controlled trials [mh] OR
random allocation [mh] OR
double-blind method [mh] OR
single-blind method [mh] OR
clinical trial [pt] OR
clinical trials [mh] OR
“clinical trial” [tw] OR
((singl* [tw] OR doubl* [tw] OR trebl* [tw] OR tripl* [tw]) AND (mask* [tw] OR blind* [tw])) OR
placebos [mh] OR
placebo* [tw] OR
random* [tw] OR
research design [mh:noexp] OR
comparative study [mh] OR
evaluation studies [mh] OR
follow-up studies [mh] OR
prospective studies [mh] OR
control* [tw] OR
prospectiv* [tw] OR
volunteer* [tw])
For glucosamine and chondroitin, the results of the above search were combined with the results of a search using:
“Glucosamine”[MeSH] OR “Chondroitin”[MeSH] OR
glucosamine(tw) OR
acetylglucosamine(tw) OR
“n-acetylglucosamine”(tw) OR
“n-acetyl-d-glucosamine”(tw) OR
chondroitin(tw)
For hyaluronic acid, the results of the first search above were combined with the results of a search using:
“Hyaluronic Acid”[MeSH] OR
“sodium hyaluronate”(tw) OR
hyaluronan(tw) OR
hyaluronic(tw) OR
hylan(tw) OR
hyalgan(tw) OR
synvisc(tw) OR
orthovisc(tw) OR
euflexxa(tw) OR
supartz(tw) OR
nuflexxa(tw) OR
viscosupplement*
For arthroscopy, the results of the first search above were combined with the results of a search using:
(“Arthroscopy”[MeSH] OR
arthroscopy(tw) OR
arthroscopic(tw) OR
arthroscope(tw)) OR
lavage(tw) OR
debridement(tw)

| Code | Definition |
|---|---|
| AO | arthroscopic procedure other than lavage and debridement |
| CS | case series |
| FEW | too few subjects (< 50 for arthroscopy case series) |
| FLA | foreign language article |
| FNA | foreign language, no abstract |
| NDE | not correct study design |
| NPD | no primary data |
| NRA | narrative review article |
| NRD | non-relevant disease |
| NRQ | non-relevant study question |
| RCT | randomized controlled trial |
| ARTH | arthroscopy |
| GC | glucosamine/chondroitin |
| VS | viscosupplementation |