This publication is provided for historical reference only and the information may be out of date.
Figure 2 shows the number of citations retrieved in the search from bibliographic and grey literature sources. From an initial 46,884 citations from the seven databases, 2,380 were duplicates. Following the initial screen of title and abstract by two reviewers, 36,031 studies were excluded because the citation was any of the following: (1) a commentary, editorial, or narrative review; (2) not published in English; or (3) not focused on the treatment of depression. At the next level of title and abstract screening, an additional 8,473 citations were excluded as they were: (1) not a primary study, systematic review, or guideline; (2) not a population with major depressive disorder (MDD), dysthymia, or subsyndromal depression; or (3) evaluated only electroconvulsive therapy, transcranial magnetic stimulation, or vagal nerve stimulation as treatments for depression. A total of 3,147 citations were then screened at full text. Figure 2 details the reasons for exclusion at full text. An additional 1,682 citations were derived from grey literature sources and reviewed for relevance. Systematic reviews published from 2005 forward were screened for potentially relevant citations that may not have been captured by the search. Forty-four primary studies (74 publications)42,44,79–150 were eligible for adults and adolescents. Twenty-seven guidelines in 33 publications13,14,53–55,151–169,169–177 were eligible.
Publications that presented subgroup analyses, secondary analyses, re-analyses, results of different outcomes (not a primary outcome measure), or results for different time points on the same study cohort were considered to be secondary records (or companion publications) to the original studies. For example, there are multiple analyses and publications related to a single study cohort from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. In this study, subjects were evaluated prospectively (level 1) for response to a selective serotonin reuptake inhibitor (SSRI), and those who did not respond were followed forward for seven new treatment arms (level 2). Those that failed level two treatments received additional treatments up to level five. STAR*D is included here as a single study with 15 eligible publications; an additional 44 publications based on the STAR*D cohort were excluded because they described results from level 1 only (the prospective evaluation of citalopram efficacy) or because they were overview summaries.
Full text screening identified 148 studies with an appropriate design, but for which only a proportion of the sample comprised subjects that were initially treated with an SSRI. In most studies, initial treatment consisted of a variety of possible antidepressants that included, but were not limited to, SSRI medications. The corresponding authors of these studies were contacted by email and asked to provide data stratified for the subgroup treated with an SSRI. Seven authors of 10 publications from 6 studies79,94,95,105,108,109,115,126,127,131 provided additional information specific to the SSRI failed subjects and these data are reported in this review. For the remaining 138 studies, 22 authors indicated that they could not provide SSRI failed subject results, 93 did not respond to email contact by the specified cut-off date, and for 23, contact information could not be found. Appendix C provides a list of excluded studies and the reasons for their exclusion.
We present the review findings by Key Question (KQ) and further stratified by adults and adolescents.
KQ1. Among adults and adolescents with Major Depressive Disorder, Dysthymia, and Subsyndromal Depression who are started on an SSRI and who are compliant with treatment but fail to improve either fully, partially, or have no response, what is the benefit (efficacy or effectiveness) of monotherapy and combined therapy?
KQ1a. How does the efficacy/effectiveness vary among the different monotherapies and combined therapies?
Forty-four unique studies were eligible for KQ1. Forty-one studies (61 publications)44,79–137,150 included adults and three studies (13 publications)42,138–149 included adolescents. With respect to the studies that evaluated adults, there were five studies107,150,178–180 for which results are not presented, and one for which only partial results are presented.112 As well, seven of the STAR*D studies were not extracted: three studies181–183 presented results for subjects after two or three additional treatment modifications following the first treatment failure (treatment levels three to five), three studies did not present data specific to any treatment,184–186 and one study187 presented cost outcomes based on modeling rather than actual cost data. Four studies did not have data extracted178–180 or were only partially extracted,112 as they included treatment protocols that evaluated prospective failure to subsequent nonSSRI or combination therapies prior to randomization to a new treatment (similar to level 3 and beyond in the STAR*D cohort), or participants were recruited because of previous failures including nonSSRI treatments.180 The authors of these studies, and the authors from three STAR*D publications,181–183 were contacted and asked for results specific to the stream of patients that had failed an SSRI prior to the switch to the new intervention being tested; all authors responded and indicated that this information was not available. There were two additional studies107,150 that were not extracted: these studies used withdrawal designs (maintenance trials), in which subjects who had successfully responded to the combination of an SSRI and an augmenting agent were then randomized to maintain the current treatment or to switch back to monotherapy.
Similarly, of the three eligible studies that evaluated adolescents, two studies (10 publications) had data that could be extracted.42,138,140,141,144–149 One study139,142,143 (3 publications), the Treatment for Adolescents With Depression Study (TADS), indicated that some subjects in the pharmacological arm were evaluated beyond the first prospective failure to an SSRI (phase 2 and 3), but did not present these results. The author was contacted and confirmed that this information was not available.
Treatment in Adults Who Have Had Inadequate Response to an SSRI
When altering doses or switching to a different monotherapy in patients with inadequate response to an SSRI:
- There is low quality evidence to determine whether a switch to a nonSSRI antidepressant is different than a switch to another SSRI to affect response and remission.
- There is low quality evidence to determine if a switch to a nonpharmacological treatment is different than a switch to another antidepressant.
- There is low quality evidence that increasing the dose of an antidepressant is different to maintaining standard doses to affect response and remission.
- The majority of studies did not design their protocols to test superiority.
Augmenting therapies in patients with inadequate response to an SSRI:
- There is low quality evidence that response and remission rates following switch to a different antidepressant (monotherapy) are comparable to the addition of another treatment (combined therapy) in patients with inadequate response to treatment with an SSRI.
- There is low quality evidence that the addition of an atypical antipsychotic medication is different to the addition of a placebo in patients who have had an inadequate response to an SSRI.
- There is insufficient evidence that the addition of other augmenting agents is superior to the addition of a placebo in patients who have had an inadequate response to an SSRI.
Combined therapies in patients with inadequate response to an SSRI:
- There is insufficient evidence that any combined therapy was superior to any other to affect response and remission.
Studies to date include a restricted range of patients, with a preponderance of white women between the ages of 40 and 50 and a relatively large number of past depressive episodes.
There were 41 studies evaluating adults and all but two included subjects with MDD; one study evaluated subjects with subsyndromal depression,129 and one, subjects with dysthymia.117 As noted previously, five studies107,150,178–180 and seven STAR*D publications181–187 did not have data that could be extracted. Additionally, three STAR*D publications81,99,102 and two studies115,116 present results on predictors of failed response in the population of interest and these are presented in KQ4. We present the study results for the eligible and extracted studies based on the type of treatment comparisons as follows: (1) monotherapy compared with monotherapy; (2) monotherapy compared with combined therapy; and, (3) combined therapy compared with combined therapy. Some studies evaluated more than two treatment arms, and presented monotherapy compared with monotherapy results, as well as monotherapy compared with combined therapy. As such, some studies are included in multiple sections.
Monotherapy Treatment Compared With Monotherapy Treatment in MDD
Overview of Study PICOT Characteristics
There were twelve studies (18 publications)44,81,85,88,90,99,100,102–104,106,110–112,115,119,121,122 that compared monotherapy interventions in subjects who had failed to respond to an SSRI. Three studies85,90,119 evaluated a switch to another antidepressant. Five studies88,112,119,121,122 had three treatment arms for which two arms compared single interventions directly. The STAR*D study44,81,99,100,102,110,111 (labeled as level 2 subjects within this study) evaluated four monotherapy interventions and one treatment included cognitive behavioral therapy (CBT). One study (two publications)103,106 evaluated two methods of switching to the same antidepressant. Two studies104,119 compared dose escalations of an SSRI. One study115 included subjects who failed to respond to SSRI and nonSSRI antidepressants and the author subsequently provided some results specific to the failed SSRI subgroup.
In total, there were 2,611 participants in treatment arms evaluating single interventions within these 12 studies. Total sample size varied from 18 in the smallest study122 to 789 in the largest;44 sample sizes per treatment arm varied from eight in one study122 to 250 in another.44 Six studies (9 publications)44,85,88,90,100,103,106,110,119 exceeded a total sample size of 101 and one study122 had fewer than 30 subjects. For one study115 that had a mixed sample of response failures, 58 of 77 subjects were from the SSRI failure subgroup.
Women were the majority of subjects in all studies and female gender distributions varied from 60 to 69 percent44,81,85,88,90,99,100,102,104,110,111,119 to greater than 70 percent.103,106,121,122 One study112 reported gender characteristics for a larger sample (n=131) but not for the subgroup extracted for this review (n=41). Another study115 reported characteristics for a mixed sample (failure to respond to an SSRI and nonSSRI) and showed a larger proportion of women (approximately double). No studies reported either significant main effects of gender or significant interactions between gender and response rates across treatment groups.
The racial composition was predominantly white, varying from 60 percent,104 80 percent,44,100,110,111 90 percent,90 to 100 percent.103,106 Five studies did not report ethnicity.85,112,115,119,121 Although ethnicity was generally not evaluated as a factor, no differential patterns of response by ethnicity were noted.
Characteristics of the “Inadequate Response” for Enrollment
Table 1 shows the manner in which failure to respond to an SSRI was established in the reported studies. Four studies determined failure retrospectively and study subjects were on an SSRI at the time of entry into the trial.85,90,103,106,121 Where inadequate response to the SSRI was determined prospectively, fluoxetine,88,122 citalopram,44 paroxetine,104,112 and sertraline119 were the SSRIs for which failure was established. No study evaluated subjects specifically for failed response to escitalopram or fluvoxamine alone.
Two studies112,119 excluded subjects with a history of failure over a two-week period to any intervention (antidepressant or augmenting agent) used in the current study. One study103,106 excluded subjects with a lack of response in the current episode to a serotonin–norepinephrine reuptake inhibitor (SNRI). This study evaluated two methods of switching from an SSRI to duloxetine (an SNRI). One study90 excluded patients who had previously taken venlafaxine, another study85 excluded subjects who had failed to respond to citalopram or venlafaxine, and one study104 excluded subjects who had failed to respond to paroxetine.
Mental Health Histories of Study Participants
Four studies, using the Hamilton Depression scale (HAMD) 17-item version, reported mean baseline scores that varied from 19 (SD 7.3),44 21 to 23 (SD 3.3 to 3.9),103,106 and 24 to 28.104,121 One study119 reported median HAMD scores of 23 (range 18 to 37). Another study90 used the HAMD 21, and baseline scores varied from 22 to 23. One study122 reported only that the minimum severity for eligibility was a HAMD 21 item score of 20 or greater. Two studies85,88 reported Montgomery-Åsberg Depression Rating Scale (MADRS) mean score of 30 to 31. One study112 reported baseline scores for a larger sample (n=131) but not the subgroup of interest (n=41).
The number of previous depressive episodes varied from a median of one episode (range zero to eight),112 two episodes (range zero to 35),119 or seven to eight episodes (range 12 to 15) in the STAR*D cohort.44 One study reported that approximately 72 percent of the study subjects had had at least one previous episode of depression.103,106 Another study88 reported that 45 percent of the olanzapine group and 79 percent of the fluoxetine group of study subjects had had three or more lifetime episodes. One study85 had approximately 39 percent of subjects with no previous failures and 33 percent that had greater than 3 previous failures; another study had approximately 60 percent of subjects who had failed during their first episode of depression.104 Two studies90,122 did not report the number of previous episode failures. A single study reported the proportion of subjects with recurrent depression as 75 percent.44 Two of the four retrospective studies90,121 described how previous episode failures were defined; previous failures were defined as those that required treatment with antidepressants. None of the studies specified how information on previous episodes was captured (e.g., by patient report or medical record).
Length of the current episode was reported as a median value in three studies and a mean in four studies. Two studies indicated the proportion of subjects over various time intervals in weeks85 or months and years.104 One study did not report the mean length of the current episode.122 Median values for length of the current episode varied from eight weeks (range 2 to 52 weeks),112 and 16 to 20 weeks (range 0 to 960 weeks).119 Mean values varied from 28 to 32 weeks (range 0 to 42 weeks),103,106,121 52 to 61 weeks (range 52 to 86 weeks),88,90 and 118 weeks (SD = 264 weeks).44 No study specified the manner for collecting the length of the current episode.
No study in this grouping reported use of complementary and alternative medicines (CAM) at baseline or endpoint. One study excluded subjects who had used St. John’s Wort within the preceding 14 days.90
Intervention and Comparators
Ten studies were labeled as randomized controlled trials (RCT); however, the STAR*D cohort had a small proportion of subjects who accepted the randomized arm and as such we classify this as a controlled clinical trial (CCT). The number of treatment arms varied from two103,106 to four.44 Six studies had a prospective run-in phase; its length was 4 weeks in one study,112 6 weeks in three,104,119,122 8 weeks in one,88 and 12 weeks in one.44 Two studies included a washout period before switching to the new intervention; one study90 had an interval of 14 days (28 for fluoxetine) and another85 had an optional interval of 4 to 7 days of placebo.90 Patient adherence was evaluated in only three studies; one study evaluated this as the number of pills consumed (varied from 94 to 97 percent adherence),88 another as not maintaining therapeutic drug monitoring (78 percent adherence),112 and one study evaluated pill counts and anamnesis.104
Table 2 shows the comparison and treatment interventions for the studies evaluating monotherapy. The monotherapy included: (1) dose escalation with or without switching from an SSRI; (2) switch to another SSRI; (3) switch to an SNRI; (4) method of switching to an SNRI; (5) switch to an antidepressant; and (6) switch to an augmenting agent. There were three studies that evaluated dose escalations in sertraline (100mg/d and 200mg/d),119 paroxetine (20mg/d and 30mg/d),104 and venlafaxine (mean dose 148mg/d and 309mg/d).90 Two of these dose comparison trials90,104 followed switches from another SSRI.
Two studies evaluated a switch to sertraline, which represented treatment with a different SSRI. One study44 switched from citalopram to sertraline and used a maximal dose of 200mg/d (titrated from 50mg/d), and one study119 compared two doses of sertraline (100mg/d and 200mg/d). Another study switched from a noncitalopram SSRI to citalopram.104
Four studies (five publications)44,85,103,106,112 evaluated a switch to the SNRI venlafaxine, venlafaxine extended release, bupropion, or duloxetine. Doses for venlafaxine varied from 37.5 to 375mg per day (extended release).44,85,90,112
Two different methods of switching from the current SSRI to the new medication, duloxetine, were evaluated in one study (two publications),103,106 and as such the dose of 60mg per day was the same for both treatment arms. One study44 evaluated the use of sustained release bupropion at a maximal dose of 400mg per day (titrated from 150mg per day). This same study had treatment arms for venlafaxine and sertraline.
Two studies88,122 compared maintenance fluoxetine treatment to olanzapine monotherapy; the dose of fluoxetine was 50mg per day in one study88 and ranged from 20 to 60mg per day in the second study.122 Olanzapine dosages ranged from 6 to 18mg per day in one study88 and 5 to 20mg per day in the second study.122 Another study121 evaluated mianserin at a dose of 60mg per day.
Three studies indicated that remission was the primary outcome, defined as a MADRS total score of less than 10,112 a HAMD-17 score of less than 7,44,104 or a Quick Inventory of Depressive Symptomatology Self Report (QIDS-SR-16) score of less than 5.44 Three studies indicated that the primary outcome was response based on a 50 percent reduction on the HAMD-17 in one study119 or on the HAMD-21 in two.104,115 Five studies indicated that efficacy (as measured by a change in score and differences between groups) was the primary outcome, assessed using the HAMD-17,103,106 the HAMD-21,85,90 or the MADRS.88,122 One study121 indicated that both response and remission as determined by the HAMD-17 were primary outcomes. All studies reported proportions of response (50 percent change relative to baseline) or remission (based on primary outcome threshold scores for specific instruments). Few studies evaluated outcomes other than response and remission (see efficacy section below).
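The response and remission definitions above can be illustrated with a minimal Python sketch. It is not taken from any of the included studies. It assumes that "response" means a reduction of at least 50 percent from the baseline score and that "remission" means an endpoint score strictly below the cut-offs quoted in the text; the function names and dictionary keys are hypothetical labels chosen for illustration.

```python
# Remission cut-offs as quoted in the text above; a score strictly below
# the cut-off is treated as remission. Keys are illustrative labels only.
REMISSION_CUTOFF = {"HAMD-17": 7, "MADRS": 10, "QIDS-SR-16": 5}


def percent_reduction(baseline, endpoint):
    """Percent change in score relative to baseline (positive = improvement)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - endpoint) / baseline


def classify(scale, baseline, endpoint):
    """Classify one subject's outcome on a single rating scale.

    Assumption: response is a reduction of 50 percent or more from
    baseline; individual studies used minor variations of this rule.
    """
    responded = percent_reduction(baseline, endpoint) >= 50.0
    remitted = endpoint < REMISSION_CUTOFF[scale]
    return {"response": responded, "remission": remitted}


if __name__ == "__main__":
    # Hypothetical subject: HAMD-17 score falls from 24 to 12.
    print(classify("HAMD-17", baseline=24, endpoint=12))
```

Because the thresholds differ by instrument and by study, a subject can meet the response criterion without meeting the remission criterion, which is one reason rates are hard to compare across trials.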
Timing of the Interventions
Table 3 details the run-in and treatment intervals for the studies comparing monotherapy treatments. The majority of studies evaluated response to the new treatment for six weeks or greater. Similarly, the majority of studies evaluated prospective failure for six weeks or greater.
The studies comparing monotherapies were conducted in Europe (Spain, Italy, France, and the United Kingdom),103,106 Switzerland,112 Denmark and Iceland,119 France,121 the Netherlands,104,115 Australia,85 Canada,88 and the United States (three studies, nine publications).44,81,90,99,100,102,110,111,122
Most studies recruited participants from outpatient psychiatric settings44,88,103,106,112,119,122 or outpatient primary care.44,90,104 Two studies recruited from both outpatient and inpatient psychiatric settings.85,121 One study recruited subjects from an inpatient setting.115
Risk of Bias
Figure 3 shows the distribution of the evaluation for risk of bias using thirteen criteria. None of the seven studies clearly described the method of randomization. All studies were at low risk from biases associated with compliance to treatment, selective outcome reporting, showing reasons for dropouts, or for balancing of important prognostic factors at baseline. For the remaining criteria, only half the studies were at low risk of bias, particularly for randomization. In particular, the role of the funding agency was not specified in half the studies. All but two studies44,104 were funded by a pharmaceutical company with a financial interest in one of the drugs under investigation.
Three studies indicated that there was a washout period. One study90 included a 14-day (28 days for fluoxetine) washout period before randomization to new interventions; a second study85 allowed for an optional 4- or 7-day washout placebo. A third study115 had a one-week washout from fluvoxamine. Lack of washout in the studies with olanzapine and fluoxetine88,122 may be problematic, as fluoxetine has a long half life (approximately 4 weeks) and the participants are therefore essentially on cotherapy for at least several weeks, even if they are only having olanzapine administered. Most SSRIs have a half life of no more than five days and any very early side effects from the new treatment could actually represent withdrawal from the SSRI, if in fact subjects were being switched. As a group, these monotherapy studies are considered to have moderate risk of bias given that half of the “risk of bias” items were not met or there was uncertainty.
Efficacy of Monotherapy Versus Monotherapy Treatments
Outcomes of Response and Remission
Overall, none of the therapeutic approaches (dose change with or without medication switch, or switch to different antidepressant, augmenting agent, or nonpharmacological therapy) showed any advantage over any other.
Table 4 shows the rates of response and remission for all monotherapy comparisons, but there are some limitations in directly comparing response and remission rates due to varying definitions across studies. As noted previously, all but one study90 comparing various monotherapies defined “response” as a 50 percent change (improvement) relative to baseline for either the HAMD or MADRS; two studies had minor variations to this definition: (1) a 50 percent reduction and a score of 14 on the HAMD-17;115 and (2) a 50 percent reduction on the HAMD or MADRS and CGI improvement (level 1 or 2).85 One study defined response as a reduction in the HAMD score equivalent to the “decrease pretreatment.”90 One study reported only response and remission rates for a subgroup of patients with a baseline score of greater than 31 on the HAMD-21 but not the total sample.85 Thresholds for remission for studies using the HAMD varied from seven to eight and for the MADRS less than eight or 10.
From the three studies evaluating dose changes, only one trial119 had confidence intervals that did not cross the midpoint, suggesting that the lower dose of 100mg of sertraline plus placebo was superior to 200mg of sertraline plus placebo; response rates of 70 percent compared to 54 percent and remission rates of 38 and 28 percent respectively were reported (Table 4).
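As a rough illustration of what it means for an interval to exclude the value of no difference, the sketch below computes a Wald confidence interval for a difference in response proportions. It is a simplified sketch, not the method used in the trial: it reads "midpoint" as the null value of zero difference, the function names are hypothetical, and the arm sizes are placeholders because the text above does not report them.

```python
import math


def risk_difference_ci(p1, n1, p2, n2, z=1.96):
    """Wald interval for the difference p1 - p2 between two proportions.

    p1, p2 are proportions between 0 and 1; n1, n2 are arm sizes supplied
    by the caller. z=1.96 gives an approximate 95 percent interval.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    if n1 <= 0 or n2 <= 0:
        raise ValueError("arm sizes must be positive")
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


def excludes_null(lower, upper, null_value=0.0):
    """True when the interval lies entirely on one side of the null value."""
    return lower > null_value or upper < null_value


if __name__ == "__main__":
    # Response rates quoted in the text: 70 percent versus 54 percent.
    # HYPOTHETICAL_ARM_SIZE is a placeholder, not a figure from the study.
    HYPOTHETICAL_ARM_SIZE = 100
    diff, lower, upper = risk_difference_ci(
        0.70, HYPOTHETICAL_ARM_SIZE, 0.54, HYPOTHETICAL_ARM_SIZE
    )
    print(diff, lower, upper, excludes_null(lower, upper))
```

The same 16 percentage point difference can yield an interval that does or does not exclude zero depending on arm size, so the placeholder should be replaced with the actual enrollment figures from Table 4 before drawing any conclusion.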
In the studies that switched to other antidepressants, none were shown to confer any relative advantage in response and remission rates. As part of the STAR*D trial,44,110 few differences were shown for the outcomes of response or remission when patients were switched from citalopram to either another SSRI (sertraline) or a nonSSRI (bupropion or venlafaxine). Similarly, in this same trial, patients who were switched to another monotherapy medication (subgroup) had comparable rates of response to those that were switched to CBT alone.
Three studies evaluated speed of response. The STAR*D showed no statistically significant differences for monotherapy treatments with respect to speed of response or remission.44,110 A second study88 showed no statistically significant differences between the monotherapy arms with respect to speed of response. The third study90 found no statistical differences between groups with differing doses of venlafaxine.
Only 4 of 12 studies evaluated quality of life outcomes and all studies44,88,103,104,106,110 showed no statistically significant differences in any of these measures between treatment arms. One study103,106 compared Visual Analogue Scale scores for pain (overall and various body parts); although there were no statistical differences between the two methods of switching to duloxetine, there were statistically significant decreases on the Visual Analogue Scale for pain, SF36-bodily pain scale, and the symptom questionnaire somatic scale.
Monotherapy Versus Combined Therapy Interventions in MDD
There were 33 unique studies in 49 publications44,79–84,86–89,91–102,105,108–114,118–124,126–128,130–137 that evaluated monotherapy versus combined therapies. Two studies were withdrawal studies and were not extracted.107,150 Of the remaining 31 studies, five94,110,127,128,133,135,137 evaluated nonpharmacological interventions combined with SSRI use. For level 2 subjects, the STAR*D study44,81,99,100,102,110,111 evaluated four monotherapy interventions and three combined therapies; the CBT monotherapy and citalopram with CBT arms were compared with pharmacological therapies combined, and the results are presented in the nonpharmacological section below. A single study119 evaluated two doses of an SSRI and the same SSRI in combination with an augmenting agent. There were six studies that had subjects who failed to respond to SSRI and nonSSRI antidepressants and subsequently provided some results specific to the failed SSRI group.79,94,95,105,108,109,126,127,131 The majority of studies compared the use of a single antidepressant to a combined therapy, which included an antidepressant with augmenting agents.
In total there were 4,537 participants in studies comparing monotherapy to combined therapies. The total sample size in these studies varied from 9 in the smallest study96 to 1,439 in the largest;44,81,110,111,113 the sample sizes per treatment arm varied from 4 subjects in one study96 to 307 in another.134 Thirteen studies44,79,87,88,92,97,100,101,105,109–111,114,118–121,126,130–135,137 exceeded a total sample size of 101 and 9 studies80,84,89,91,96,122,124,127,128 had fewer than 30 subjects.
Overview of Study PICOT Characteristics
There were two studies that predominately evaluated a single gender. One study evaluated men, as the intervention was testosterone used as the augmenting agent.91 Another study evaluated only women being treated with antidepressants alone or combined with exercise.127 The proportion of women in the other studies varied from greater than 70 percent in 13 studies,44,80,82–84,92,93,96,97,100,110,111,114,119,121,122,124,130,133,135,137 from 61 to 69 percent in 5 studies,87,88,98,101,120,123,132 from 51 to 59 percent in 2 studies,86,89 and from 45 to 49 percent in 2 studies.118,128 One study112 reported gender characteristics for a larger sample (n=131) but not for the subgroup (step 3A to 3C) extracted for this review (n=41).
There were seven studies for which the authors provided some stratified results specific to the subgroup that had failed to adequately respond to an SSRI. However, demographic data were not provided or available. As such, we have assumed that the SSRI failure subgroup is comparable to the whole sample within the study, as it represented over 50 percent of the total sample. When considering the proportion of the study samples who failed to respond adequately to SSRI treatment, there were two studies95,108 where the proportion was 55 to 59 percent, two studies127,134 where it was 60 to 69 percent, and three studies79,94,105,109,126,131 where it was greater than 70 percent. In these seven studies, females represented the majority of the subjects in the following proportions: (1) greater than 80 percent in two studies;94,127 (2) from 70 to 79 percent in three studies;79,105,109,126,134 and (3) from 51 to 60 percent in two studies.95,108
In the majority of studies, the proportions of men and women per treatment arm were similar, with the exception of one small study89 with 20 subjects, which showed differences between groups greater than 10 percent. Information on racial composition or ethnicity was not reported in 18 studies.82,83,87,89,91,92,94,96–98,101,114,118–121,123,124,127,130,132–135,137 For the remaining studies, the majority of subjects were white, comprising between 75 and 89 percent of the sample in six studies44,86,88,95,100,108,110,111,113,128 and greater than 90 percent in seven studies.79,80,84,93,105,109,122,126,131,133–135,137
Mean age for the total samples varied from 40 to 44 years in 11 studies,44,86,88,98,112–114,118–120,122,124 45 to 49 years in 13 studies,79,84,87,89,91–95,97,101,105,108,109,121,126,130–132,134 50 to 54 years in 2 studies,82,83,123 and greater than 60 years in 2 studies.128,133,135,137 One study did not report age characteristics of the very small sample (n=9).96 Two studies reported an age range of 21 to 54,80 and from 40 to 60 years.127
Table 5 shows the manner in which failure to respond to an SSRI had been established. Fifteen studies determined failure prospectively in an open label manner. For the majority of these, the subjects were currently on the same antidepressant to which they had shown a poor response. Fourteen studies determined inadequacy of response retrospectively. For studies where inadequate response was determined prospectively, the SSRI to which failure was established included three studies each for fluoxetine,88,118,122 and sertraline,86,87,101,119,132 and two each for citalopram,113,124 escitalopram,133,135,137 and paroxetine.80,112 Six studies79,95,105,108,109,126,128,131,134,188 used any combination of SSRIs; three studies79,95,105,109,126,131 specified that fluvoxamine was not one of the SSRIs evaluated and these same studies also included escitalopram. No studies evaluated subjects specifically for failed response to fluvoxamine alone.
There were nine studies that excluded subjects because of past failures to specific interventions. Five studies excluded subjects who reported two,114 three, or more previous failures.79,80,87,101,105,109,126,131,132 Three studies excluded subjects with a history of failure over a two week period112 or in the recent episode118,119 to any intervention (antidepressant or augmenting agent) used in the current study. Three studies excluded subjects who had an inadequate response to nonpharmacological interventions of electroconvulsive therapy (ECT)79,105,109,126,131 alone or with repetitive transcranial magnetic stimulation (rTMS) and vagal nerve stimulation (VNS)82,83 in a previous episode. The remaining 20 studies did not exclude or include subjects based on previous failures to any specific treatment.
Mental Health History
Table 6 shows the baseline severity reported for the different studies. As expected, the baseline scores tended towards the upper quarter of the maximum instrument scores, which suggests that subjects had symptoms consistent with moderate to severe depression. Two studies did not provide baseline scores.80,112 The number of previous depressive episodes varied from a median of one episode (range zero to 8),112 a median of two (range zero to 35),119 or a median of seven to eight (range 12 to 15) in the STAR*D cohort.44,110,113 The reported mean number of episodes varied from one to two previous episodes,93,94 and three to six.95,105,123,124,126,131 Another study88 reported that 45 percent of the olanzapine group and 79 percent of the fluoxetine group of study subjects had three or more lifetime episodes. Twenty-one studies did not report the number of previous failed episodes.79,80,82–84,86,87,89,91,92,96–98,101,108,109,114,118,120,122,126–128,130,132–135,137
Two studies (three publications)79,109,126 reported the number of prior adequate antidepressant trials for the current episode and this varied from one adequate trial (67 percent) to three adequate trials (eight percent). Two studies95,124 showed some differences between treatment groups with respect to previous episodes, with the risperidone group having fewer previous failures. How previous episodes were defined and captured was not apparent in the majority of studies. No study in this grouping reported use of CAM at baseline or endpoint.
Intervention and Comparator
All but three studies44,110,113 employed an RCT design with at least some level of blinding. There were five studies94,110,127,128,133,135,137 that evaluated the use of nonpharmacological interventions including CBT,94,110 dialectical behavior therapy (DBT),128 interpersonal therapy (IPT)133,135,137 and exercise.127 The remaining studies used pharmacological agents combined predominately with augmenting agents and a new SSRI or other antidepressants.
Table 7 shows that approximately one quarter of the studies had prospective run-in phases and treatment phases that exceeded 8 weeks. Two of the retrospective failure studies provided treatment for this same interval. One study evaluated the Step 3 of the treatment algorithm after only 2 weeks of treatment switch.
Table 8 shows the combined interventions and all other treatment comparisons. Two studies included one treatment arm that evaluated an increased dose of sertraline119 or the addition of intravenous citalopram.82,83 Four studies had one treatment arm that evaluated a combination therapy that included the nonSSRI antidepressants clomipramine,82,83 bupropion,113 and desipramine.98,118
The majority of studies evaluated combination therapies that included augmenting agents (26 of 33 studies). From studies with at least one treatment arm using a combination therapy that included an augmenting agent, there were five drugs or classes of drugs for which there was more than one study, and these included atypical antipsychotics (risperidone, olanzapine, aripiprazole, quetiapine), lithium, buspirone, mianserin, and pindolol. There were four studies98,112,118,124 with at least one treatment arm evaluating the effect of adding lithium; doses varied from 600mg/d,98,118 to 800mg/d,124 and one study did not report the dose.112 There were five studies evaluating atypical antipsychotics;88,95,108,122,134 the doses were similar for studies evaluating olanzapine at 5–6mg/d,88,122 but varied from 0.5mg/d95 to 1mg/d in studies assessing risperidone.108 There were three studies (seven publications)44,92,97,110,113,120,130 evaluating buspirone employing final doses that varied from 47mg/d97 to 60mg/d.113 Two studies evaluated the use of mianserin119,121 with doses of 30mg/d119 and 60mg/d.121 The augmenting agent pindolol was also evaluated in two studies; the dose was not reported in one study93 and was 7.5mg/d in the second study.96
The majority of studies reported change scores as the primary outcome of choice. All but two studies used the MADRS, HAMD, BDI, or QIDS-SR-16 for at least one primary outcome; other outcomes used included the CGI,92,97,114,120,130 and the WHOQOL Brief Psychiatric inventory.127 Only three studies explicitly stated that remission was the primary outcome, defined as a MADRS total score of less than 10,112 HAMD-17 score less than seven for 3 consecutive weeks,133,135,137 or the QIDS-SR-16 score less than five.44,110,113 All other studies either specified that the endpoint change score relative to baseline was the primary outcome, or did not report which measure was the primary one to evaluate efficacy.
The studies were conducted in Denmark and Iceland,119 Switzerland,112 France,121 Italy,82,83,127 Finland,120 Norway and Sweden,92,97,130 United Kingdom,94 Israel,89,123 Canada,86,88 and United States.44,79,80,84,86–88,91,93,95,96,98,101,105,108–110,113,114,118,122,124,126,128,131–135,137
Three studies did not report the setting.87,88,92,97,101,130,132 From the remaining 28, all studies included subjects from outpatient psychiatric, tertiary, or primary care settings with the exception of one study124 that included patients with a minimum of 4 weeks inpatient hospitalization.
Risk of Bias Assessment
Figure 6 shows that method of randomization, compliance with treatment, and the role of the funder were at high risk of bias for over 75 percent of the 28 studies evaluating monotherapies versus combination therapies. Allocation concealment was not achieved by approximately 30 percent of studies. Overall, these studies would be categorized as having a moderate risk of bias.
Only three studies94,112,114 reported some aspect of adherence or compliance with treatment; the remaining studies did not. A single study120 of the 28 employed a washout phase (2 weeks) prior to switching to the new treatment.
Table 9 shows the distribution of studies with respect to the source of funding. Eighteen studies were funded solely by industry and 10 solely by nonindustry sources. One study84 was funded by both, and five studies did not report the source of funding. As indicated in Figure 6, the role of the study sponsor was not clearly specified in approximately 75 percent of the 28 studies evaluated here.
Efficacy of Monotherapy Versus Combined Therapy
Outcomes of Response and Remission
Figures 7 to 10 and Table 10 detail the rates of response and remission reported within the studies in this grouping. The rates of response and remission for all studies cannot be directly compared across studies, as different primary outcomes were used and there is some variation in thresholds for these outcomes. Response was defined as 50 percent change from baseline in all but one study that used the HAMD, BDI, or the MADRS; one study using the HAMD defined response as a 30 percent change.98 Three studies92,97,114,120,130 using the CGI defined response as a change to “improved” or “very improved.” Three studies did not specify definitions for response or remission.89,123,127 Thresholds for remission for studies using the HAMD varied from seven to eight and for the MADRS less than 8 or 10. Only one study provided some data for partial responders (greater than 25 percent but less than 50 percent).88
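The response and remission definitions described above amount to simple threshold rules on instrument scores. The following is a minimal sketch of those rules; the function names and example scores are hypothetical, and treating remission as a score at or below the cutoff is an assumption, since the studies differed on whether the threshold was inclusive.

```python
def percent_change(baseline: float, endpoint: float) -> float:
    """Percent reduction from baseline (positive values mean improvement)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - endpoint) / baseline


def is_response(baseline: float, endpoint: float, threshold_pct: float = 50.0) -> bool:
    """Response as a reduction of at least threshold_pct from baseline.

    Most studies used 50 percent; one HAMD study used 30 percent.
    """
    return percent_change(baseline, endpoint) >= threshold_pct


def is_partial_response(baseline: float, endpoint: float) -> bool:
    """Partial response: greater than 25 but less than 50 percent reduction."""
    return 25.0 < percent_change(baseline, endpoint) < 50.0


def is_remission(endpoint: float, cutoff: float = 7.0) -> bool:
    """Remission as an endpoint score at or below a cutoff (assumed inclusive).

    The default of 7 is only one of the reported HAMD thresholds.
    """
    return endpoint <= cutoff
```

Because the cutoffs are parameters, the same hypothetical patient can be classified differently under different study definitions, which is one reason the rates cannot be compared directly across studies.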
Although the majority of studies that could be examined in this systematic review involved comparisons of monotherapy against combination therapy, the wide array of agents used in the combination treatments makes identification of trends difficult.
In general, these studies involved one of two study designs. The most commonly employed design involved establishing a cohort of patients who had an inadequate response to an SSRI and then randomizing that group to either maintenance of the SSRI and placebo treatment or maintenance of the SSRI in combination with an active intervention. The “monotherapy” group therefore reflects patients who received ongoing treatment with an SSRI that had been deemed to be ineffective or inadequate at a specified point in treatment. Far fewer studies employed a design in which patients who had an inadequate response to an SSRI were then switched to another treatment and then compared against the combination of the original SSRI plus a new intervention. The STAR*D trial exemplified this latter type of design in which a portion of patients were switched to a new antidepressant treatment following an inadequate response to citalopram, while another portion remained on citalopram and had another treatment added (buspirone, bupropion, or CBT).
In the STAR*D trial, the data did not confirm the noninferiority or superiority of either a switch to monotherapy or the addition of another treatment (Figure 8). Although not statistically significant, there appeared to be a slight, but consistent, favoring of the combination treatment approach. In another trial with a small number of participants,82,83 adding either citalopram or clomipramine to another SSRI resulted in greater rates of response compared to adding a placebo. The additional treatments were all provided by intravenous infusion over five days, however, and extrapolation of these results is problematic as oral preparations of the same compounds might not have resulted in a similar pattern of results.
The greatest number of studies in this comparison group involved the treatment strategies of adding an intervention or placebo to ongoing therapy with the SSRI to which patients had shown an inadequate response (Figures 7 and 10). Note that studies within these figures have been categorized by drug classes (SSRI, nonSSRI, augmenting agents, nonpharmacological). Additionally, we have grouped the studies using augmentation agents based on the number of trials per drug or drug class; interventions that had more than one study included lithium, buspirone, mianserin, atypical antipsychotics, and nonpharmacological therapies. Although there were two studies where pindolol was used as the augmenting agent, one did not provide response or remission rates for the monotherapy group.96
For buspirone and the outcome of remission, we are limited to the different treatment arms of the STAR*D study (comparing sertraline relative to citalopram combined with buspirone), which showed a potentially small difference, but this was not for the outcome of response (see Figure 9). This may be an effect of the outcome used to define remission, as no advantage was seen for the QIDS-SR outcome. Studies evaluating the addition of mianserin show no relative advantage to the monotherapy comparator treatment for either the outcome of response or remission.
Overall, none of the augmenting agents showed any relative difference or advantage over the monotherapy comparator for the outcomes of benefit, with the exception of the atypical antipsychotics. Trials of fluoxetine in combination with olanzapine88,122 and risperidone108 in combination with SSRI treatment show some relative advantage over monotherapy in patients with MDD for both response and remission. Note that two studies95,108 provided subgroup data specific to the SSRI failed group. As such the studies were not randomized for this subgroup and therefore balance between groups was not maintained. Two studies (five publications)79,105,109,126,131 evaluated the benefits of adding aripiprazole in patients who had failed to respond to both SSRI and nonSSRI antidepressants. Although response and remission rates for the SSRI subgroup were not reported, a subgroup analysis (based on a pooled analysis)79,126,131,136 indicated that patients on an SSRI combined with aripiprazole showed a consistently greater change in MADRS total score relative to placebo (−8.6 versus −5.5; treatment difference −3.1, 95% CI: −4.5 to −1.7). Another two studies evaluated the use of quetiapine (at two different doses) as an augmenting agent and undertook a pooled analysis.134 This pooled analysis did not report response and remission rates but mean change scores for the SSRI subgroup (MADRS total scores at week 6 quetiapine XR 150 and 300mg/day, compared with placebo as adjunct to SSRIs [−14.8, −14.7 and −12.7, respectively; p<0.05, for each dose]).
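The aripiprazole treatment difference quoted above can be checked from the two reported mean changes, and an approximate standard error can be recovered from the confidence interval. This is a rough sketch only: it assumes the interval is symmetric and based on a normal approximation (multiplier 1.96), which the pooled analysis may not have used, and the variable names are illustrative.

```python
# Reported mean changes in MADRS total score (SSRI subgroup, pooled analysis).
aripiprazole_change = -8.6
placebo_change = -5.5

# Treatment difference: adjunctive aripiprazole minus placebo.
treatment_difference = aripiprazole_change - placebo_change

# Reported 95% confidence interval for the treatment difference.
ci_lower, ci_upper = -4.5, -1.7

# A symmetric interval should be centred on the point estimate.
ci_midpoint = (ci_lower + ci_upper) / 2

# ASSUMPTION: normal approximation, so half-width = 1.96 * SE.
approx_se = (ci_upper - ci_lower) / (2 * 1.96)

print(f"difference = {treatment_difference:.1f}")
print(f"CI midpoint = {ci_midpoint:.1f}")
print(f"approximate SE = {approx_se:.2f}")
```

The interval excludes zero, which is what the text means by a consistent advantage for the aripiprazole combination in this subgroup; the subgroup itself was not randomized, so the estimate should be read with that caveat.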
Evaluation of CBT as an add-on therapy showed no advantage when considering any monotherapy comparator; however, most of these data were derived from the STAR*D study. Similarly, a study in older adults showed no advantage of adding on interpersonal therapy to escitalopram.133,135,137
Eight studies attempted to evaluate quality of life outcomes and some used more than one type of scale; all were selected as secondary outcome measures. Four studies used the Sheehan Disability Scale,88,95,105,108,109,131 four used the Endicott Enjoyment and Satisfaction Scale,44,91,95,108,110,113 and two also used some form of the SF36/SF12.44,88,110,113 The STAR*D also included a measure of work productivity.44,88,110,113 Note that four of these studies95,105,108,109,131 provided stratified findings for the SSRI failed subgroup; data for quality of life outcomes were not provided. All other studies showed no differences between groups for these outcomes.
Combined Therapy Versus Combined Therapy Interventions
There were six studies86,98,110,113,118,125 with treatment arms that compared combination therapies with each other. All studies were RCTs with the exception of one study which did not randomize subjects, and the STAR*D study.44,113 The STAR*D cohort110,113 for level 2 subjects evaluated three combined therapy interventions and only these arms (citalopram plus CBT with two combined drug therapy interventions) are compared in this section. Two studies86,125 compared different doses of the same combination drug therapies.
In total there were 832 participants in the treatment arms evaluating combined interventions and the sample sizes varied from 11125 to 650 participants.110,113 The sample sizes per treatment arm varied from 5 subjects125 to 286 subjects.113 One study44,110 exceeded a total sample size of 101 and two studies98,125 had less than 30 subjects.
Overview of Study PICOT Characteristics
The proportion of women in the sample varied from 47 percent,86,118 between 50 and 62 percent,98,113,125 and greater than 70 percent.82,83 Racial composition was not reported in four studies;82,83,98,118,125 two studies reporting ethnicity had approximately 78 percent110,113 and over 90 percent86 of the participants of the white race. Mean age of study subjects varied from 40 to 44 years in four studies,86,98,110,113,118 and ages ranged from 37 to 59 years,125 and 51 to 58 years82,83 in the remaining studies.
Table 11 shows the manner in which failure to respond to an SSRI had been established. Three studies82,83,98,125 determined failure retrospectively, and study subjects were currently on the same SSRI prior to the switch to the new intervention. In the three studies that determined inadequate response prospectively, fluoxetine,118 citalopram,110,113 and sertraline86 were the SSRIs for which failure was established. No study evaluated subjects specifically for prospective failed response to escitalopram, paroxetine, or fluvoxamine alone.
Mental Health History
Table 12 shows that five studies used the HAMD 17 or 21 item instruments to evaluate baseline severity; one study did not report baseline scores.125 It is notable that several studies44,110,113 included patients of mild to moderate severity based on the HAMD criteria, while others included patients with marked depression. The number of previous depressive episodes was reported as a median of seven to eight (range 12 to 15) in the STAR*D cohort110,113 and not reported in five studies.82,83,86,98,118,125
All but one study125 employed an RCT design and the STAR*D is considered a CCT. The STAR*D cohort99,100,102,110,111,113 for level 2 subjects, evaluated three combined therapy interventions and only these arms are compared in this section. Two studies125 compared two doses of the same combination therapy. Table 13 shows the duration of the study intervention. Two studies evaluated combined therapy for approximately one week,82,83,125 and the remaining studies varied treatment length from 4 to 12 weeks.
Table 14 shows the types of combination therapies evaluated in these six studies. Two studies included an arm evaluating the nonSSRI desipramine,98,118 and one each evaluating clomipramine82,83 and bupropion.113 The augmenting agents used in these studies included buspirone, lithium, and ziprasidone. Two studies86,125 compared different doses of the same combination therapies involving sertraline with either lithium or ziprasidone. The doses for both lithium (400–800mg) and ziprasidone (60 mg and 80 mg) are in the low to moderate range. It is unlikely that lithium at 400mg/d would result in therapeutic blood levels, but low doses of lithium have been commonly employed in augmentation trials. The STAR*D cohort compared two drug combination therapies with citalopram or CBT.110,113
Risk of Bias
Figure 11 shows that studies evaluating combined therapies were at high risk of bias for randomization, reporting compliance, and balancing prognostic indicators. The role of the funder was clarified in all studies and funding for the studies came from nonindustry sources in three studies,98,113,118 industry in one study,86 and two did not report the source.82,83,125 Overall these studies would be categorized as having a moderate level of risk of bias. None of the studies employed a washout phase or monitored compliance of subjects.
Efficacy of Combined Therapy Versus Combined Therapy
Response and Remission
Table 15 and Figures 12 and 13 report the rates of response and remission for studies evaluating combined treatments relative to other combined treatments. Figure 12 illustrates that when the combination of citalopram plus buspirone was compared against the combination of citalopram and CBT, there was a nonsignificant pattern favoring the combination of medications in the STAR*D trial. There appeared to be no differences between combinations of therapies in this large trial. When considering speed of response, there was a significant difference of 15 days for the group with CBT augmentation and only for the outcome of remission (p = 0.022).
The STAR*D study was the single trial to include quality of life measures and showed no significant differences between groups.
Interventions in Patients With Subsyndromal Depression or Dysthymia
Overview of Study PICOT Characteristics: Subsyndromal Depression
A single study129 evaluated patients described as those “with residual symptoms of a depressive disorder” and characterized by a score greater than seven but less than 10 on the HAMD-21 items. These subjects were classified as having subsyndromal depression following an acute episode. Seventy percent of the subjects were women and ethnicity was not reported. Mean age was 39 years.
There were no specific criteria reported for previous failure to paroxetine other than having residual symptoms and having been treated for 42 to 300 days.
Mental Health History
Failure of response to paroxetine was determined prospectively over a 4-week period. The subjects’ failure to respond to the current treatments was established retrospectively, but the manner of determining this was not reported. Similarly, the history of any previous inadequate responses to treatment or length of the current episode was not reported.
Intervention and Comparators
In this study, subjects who had residual symptoms while on paroxetine were randomized to a continuation of paroxetine (20 to 40mg/d) or switched to mirtazapine (15 to 30mg/day) for an average of 36 days.
The primary outcomes in this study were rated on the HAMD-21. Changes in metabolic rate values and changes in the Arizona Sexual Experience Scale (ASEX) score showed no differences between groups.
This study was conducted in the Czech Republic and the setting from which patients were recruited was not reported.
Risk of Bias: Subsyndromal Depression
In this study, the type of randomization process and the degree of compliance were not clearly reported; all other categories were acceptable.
Efficacy of Treatment: Subsyndromal Depression
The findings of this study do not report differences between groups; rather, it is reported that 70 percent of subjects had a positive effect on residual symptoms but no mean change scores were given. Differences between groups were shown on the ASEX Scale in favor of mirtazapine starting from the first week of treatment (p = 0.004).
Overview of Study PICOT Characteristics: Dysthymia
One study117 evaluated subjects with dysthymia as diagnosed by the DSM-IV structured clinical interview and with a score of 12 or more on the HAMD-21 scale. Subjects with MDD or other types of depression (e.g., partial remission from depression) were excluded. Sixty-eight percent of the sample were women and the mean age was 42 years. Ethnicity was not reported.
Subjects were not excluded because of failures (other than the current response to paroxetine). The number of previous episodes of failure to treatment was not reported, but the mean duration of the depression was approximately 12 years with an onset at approximately 29 years of age.
Mental Health History
The subjects’ failure to respond to the current treatment was established retrospectively, but the manner of determining this was not reported. Similarly, the history of any previous inadequate responses to treatment or length of the current episode was not reported.
Intervention and Comparators
Subjects were randomized to either paroxetine (40mg/d) or paroxetine (20mg/d) and amisulpride (50mg/d).
The primary outcome for the study was response (defined as 50 percent change from baseline) for the HAMD (type not specified) and a score of one or two on the CGI-2. Remission was a secondary outcome and was not explicitly defined, but was assumed to be defined as a score on the HAMD.
The study was conducted in Italy and subjects were recruited from outpatient settings.
Risk of Bias: Dysthymia
This paper was at low risk of bias and there was only uncertainty around the role of the study sponsor.
Efficacy of Treatment in Dysthymia
Fifty-four percent of subjects on paroxetine alone and 56 percent in the combined group achieved response (50 percent change) on the HAMD-NS. Remission was defined as a score of seven or less and those achieving remission were 32 percent for paroxetine alone and 44 percent for the combination treatment group. Neither response nor remission was shown to be statistically different between the treatment groups.
Overview of Study PICOT Characteristics: Adolescents
There were three studies (13 publications)42,138–149 evaluating adolescents who had not responded to previous SSRI treatment, and from these one trial139,142,143 could not have data extracted. This study did have “Phase II” subjects (those who had an inadequate response) and two of the three study arms were eligible for this review (medication or CBT). Proportions of subjects who reached this stage were reported and contact with the authors confirmed that data are not currently available for Phase II subjects. The two other trials evaluated dose escalation of fluoxetine138 or switch to other antidepressants with and without the addition of CBT.42,140,141,144–149
Two studies evaluated children or adolescents with MDD. In the dose escalation study42,138 the eligibility criteria included children (aged 8 to 12 years) and adolescents (age 13 to 18 years). The mean age was 12 and 14 in the two groups respectively, but there were significantly more children (less than 13 years old) in the lower dose group. The majority (60 percent) were males and of Caucasian ethnicity (87 to 93 percent). In the Treatment for Resistant Depression in Adolescents (TORDIA) study,42,140,141,144–149 the majority of the sample (68 to 72 percent) were female adolescents from age 12 to 18; the average age was 16 years (SD 1.6) and predominately (>80 percent) of white race.
In the TORDIA trial, failure was established retrospectively in subjects who were currently taking an SSRI. In this trial, subjects who had previously failed two or more adequate trials of an SSRI, who had a history of nonresponse to venlafaxine, or nonresponse to CBT (≤7 sessions), were excluded. Potential participants who were receiving CBT or were on other medications with psychoactive properties were also excluded. Inadequate response was defined as less than 30 percent change on the Children’s Depression Rating Scale–Revised (CDRS-R) for those who still had a score greater than or equal to 40 on this scale, and were in treatment on an SSRI for a minimum of 8 weeks. In the dose escalation study,138 inadequate response was similarly defined as a CDRS-R score with less than a 30 percent change after 8 weeks at the base dose of 20mg/d.
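The inadequate response definition used in the adolescent trials combines three conditions, which can be written as a single rule. The sketch below is illustrative only: the function name, parameter names, and example scores are hypothetical, and percent change is computed on the raw CDRS-R score, which is an assumption because the exact calculation is not described here.

```python
def inadequate_response(baseline_cdrs: float, current_cdrs: float,
                        weeks_on_ssri: float,
                        max_change_pct: float = 30.0,
                        min_score: float = 40.0,
                        min_weeks: float = 8.0) -> bool:
    """Apply the three-part definition: less than 30 percent change,
    a current CDRS-R score of 40 or more, and at least 8 weeks on an SSRI.

    ASSUMPTION: percent change is taken on the raw CDRS-R total score.
    """
    if baseline_cdrs <= 0:
        raise ValueError("baseline score must be positive")
    change_pct = 100.0 * (baseline_cdrs - current_cdrs) / baseline_cdrs
    return (change_pct < max_change_pct
            and current_cdrs >= min_score
            and weeks_on_ssri >= min_weeks)
```

For example, a hypothetical subject moving from 60 to 50 after 8 weeks would meet the definition, whereas the same scores after only 6 weeks would not.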
Mental Health History
The dose escalation study138 did not provide details of the previous mental health history; eligibility for this trial required moderate severity (CDRS-R score greater than 40) and a CGI of at least four. In the TORDIA trial,42,140,141,144–149 mean CDRS-R scores at baseline varied from 58 to 60 (19 to 22 on the Beck Depression Inventory [BDI]) and CGI scores from 4.4 to 4.5. Approximately 74 percent of participants were in a first episode of depression; the mean duration of the current episode varied from 21 to 24 months. Approximately 25 percent of participants had a history of suicide attempts (varying from 21 to 27 percent). The level of comorbidity was significant in this group and approximated 36 percent for anxiety disorder and post-traumatic stress disorder (21 to 24 percent post-traumatic stress disorder alone), 14 to 18 percent for attention deficit hyperactivity disorder, and 27 to 32 percent for dysthymia. However, there were no differences in rates of comorbidity between the four treatment groups.
Intervention and Comparators
The initial dose of 20mg/d of fluoxetine was increased to 40mg/d in the dose-escalated group; this could be increased to 60mg/d after 4 weeks. The length of treatment was 10 weeks. In the TORDIA trial, study subjects were randomized to four treatment arms that included venlafaxine alone (up to 150mg/d), venlafaxine combined with CBT, citalopram, fluoxetine, or paroxetine (up to 40mg/d for all SSRIs) alone, or with CBT. CBT consisted of up to 12 (60 to 90 minute) sessions and one quarter to one half consisted of sessions with the family. The reported mean number of sessions was 8.3 across treatment groups. Subjects were tapered off the initial SSRI. All participants received family psychoeducation which consisted of providing information about depression, adverse events, and coping with mood disorders. The treatment interval was 12 weeks. After 12 weeks of treatment, responders could continue in their assigned treatment arm, and nonresponders received open-label treatment for an additional 12 weeks (24 weeks total). Open treatment was not controlled and could result in a switch to a new antidepressant, dose increase (for those not at the maximum dose), augmentation, or the addition of CBT or other psychotherapy.
Both studies had two primary outcomes based on “adequate clinical response” defined as a score of two or less on the CGI Improvement subscale and a 50 percent improvement on the CDRS-R.
These studies were conducted in the United States and subjects were recruited from clinical sources and public advertisements (newspapers and radio) for both studies.
Risk of Bias in Studies With Adolescents
The dose escalation trial138 was generally well conducted, but the two treatment groups had some differences at baseline, even though the mean age was similar. Additionally, the primary author is employed by the study sponsor. The TORDIA trial had some potential threats to validity with regards to the method of allocation, concealment and blinding of the outcome assessor; there was low risk of bias in all other aspects of the study. A washout period for subjects on an SSRI other than fluoxetine was undertaken for 2 weeks prior to switching to the new intervention. The method of assessing compliance with the treatment was not reported, but the proportion of subjects who did not comply was reported. Overall, the TORDIA trial had a low risk of bias.
In the TORDIA trial, treatment fidelity for the CBT was well detailed and approximately 94 percent of reviewed tapes were found to be acceptable by on-site supervisors and by an external consultant.
Efficacy of Treatment in Adolescents
Response and Remission
In the dose escalation study,138 response was achieved by 5 of 15 and 10 of 14 subjects in the low- and high-dose groups respectively; the study was not powered to detect differences between groups although the investigators noted that there were no statistically significant differences between the groups. Similarly, there were no significant differences between groups when considering mean CGI improvement scores.
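The response counts from the dose escalation study can be turned into proportions and compared with an exact test for a 2×2 table. The following is a minimal sketch using Fisher's exact test; the source does not state which test the investigators used, so the choice of test and the function name are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Reported counts: responders / total in each dose group.
low_dose_responders, low_dose_n = 5, 15
high_dose_responders, high_dose_n = 10, 14

low_rate = low_dose_responders / low_dose_n
high_rate = high_dose_responders / high_dose_n

p_value = fisher_exact_two_sided(
    low_dose_responders, low_dose_n - low_dose_responders,
    high_dose_responders, high_dose_n - high_dose_responders,
)

print(f"low dose: {low_rate:.1%}, high dose: {high_rate:.1%}, p = {p_value:.3f}")
```

Under these assumptions the p-value falls above the conventional 0.05 level despite the large difference in observed proportions, which is consistent with the investigators' report and with the study not being powered to detect group differences.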
Table 16 details the findings for the TORDIA trial at 12 weeks.144,145,147,148 There were no statistically significant differences between the medication alone groups. There was a statistically significant difference between the CBT groups in favor of including CBT for all outcomes. The main effect of CBT was consistent even after controlling for baseline severity factors (BDI scores and post-traumatic stress).
At the 24-week followup, adolescents within the TORDIA trial showed continued improvement.146,149 After 12 weeks of treatment, responders could continue in their assigned treatment arm, and nonresponders received open-label treatment for an additional 12 weeks (24 weeks total). Open treatment was not controlled and could result in a switch to a new antidepressant, dose increase (for those not at the maximum dose), augmentation, or the addition of CBT or other psychotherapy. From the original sample (n=334) only 78.1 percent (n=261) were assessed at 24 weeks. The findings at 24 weeks suggest that the likelihood of remission was higher and time to remission was faster for those who showed clinical response at 12 weeks, relative to those who did not show response by 12 weeks (61.6 percent versus 18.3 percent).146 Among all participants, failure to achieve remission at week 24 was associated with higher baseline depression, hopelessness, anxiety, and family conflict.146
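The retention figure and the remission contrast reported for the 24-week followup can be reproduced with simple arithmetic. This is a small illustrative sketch with hypothetical variable names; the absolute difference and ratio are unadjusted descriptive quantities computed from the reported percentages, not estimates taken from the trial publications.

```python
# Retention at the 24-week assessment.
randomized = 334
assessed_week24 = 261
retention_pct = 100 * assessed_week24 / randomized

# Reported remission percentages by response status at week 12.
remission_week12_responders = 61.6
remission_week12_nonresponders = 18.3

# Unadjusted contrasts derived from the reported percentages.
absolute_difference = remission_week12_responders - remission_week12_nonresponders
ratio = remission_week12_responders / remission_week12_nonresponders

print(f"retention = {retention_pct:.1f}%")
print(f"absolute difference = {absolute_difference:.1f} percentage points")
print(f"ratio = {ratio:.2f}")
```

Roughly one in five randomized participants was not assessed at 24 weeks, so these contrasts describe only those who remained in followup.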
Using the Children’s Global Adjustment Scale, functional status was assessed in the TORDIA trial and no main effects or interactions were shown. For responders at week 12, 19.6 percent relapsed at 24 weeks; predictors of relapse were similar to those for lack of eventual remission and included higher baseline depression (via interview and self-report), poorer functioning, and presence of dysthymia.146
A cost-effectiveness analysis was undertaken at 24 weeks within the TORDIA trial.149 The analysis would suggest that adolescents receiving CBT with medication achieved 8.3 more depression free days, and 11 more depression improved days across 24 weeks of treatment. However, combination therapy was significantly more expensive than medication switch alone.149
Strength of Evidence Ratings
Adults With MDD
We applied the criteria for grading the strength of evidence (SOE) to the studies and found that all studies directly evaluated the outcomes of remission and response and, as such, were not deficient in this domain. There was some variation in consistency of the effect depending on the treatment strategy. In general, most studies had relatively few participants, and when studies were considered as a group, there was difficulty in demonstrating a clinically useful conclusion. Studies were not designed to establish equivalence, noninferiority, or superiority. The majority of the studies showed no differences between treatment groups, suggesting uncertainty about these differences. Even when studies were sufficiently powered for the primary outcome, a statistically significant difference between groups was rarely found, making clinical interpretation difficult with respect to selection of an optimal strategy relative to the standard or usual treatment.
The outcomes of harms are detailed in KQ2. When we evaluated the SOE of the studies that reported the harms of suicidality, weight gain, and sexual dysfunction, all treatment strategies in KQ1 were consistently rated as insufficient. Overall, we found few studies that reported on the harms of interest. The inability to distinguish if the studies measured these harms, or simply did not report them (either because no events occurred or they occurred at the lowest frequencies), made rating SOE problematic. We considered the measurement of these critical and important harms to be necessary for all studies given the potential of these serious adverse events in MDD and with most treatment approaches.
There were several issues with regard to applicability of the eligible studies. Overall, the studies were comprised of adult subjects that were not representative of the broader population who experience MDD and who might experience a failed response to an SSRI. Subjects were predominately white women between the ages of 40 and 50 who had had more than one previous failure to treatment. For combined therapies, there was some concern about the dose and augmenting agent selection and the likely use of many of these in the context of primary care.
Monotherapy Versus Monotherapy in MDD
The grading of SOE for adults with MDD who have failed to respond to an SSRI is shown in Table 17. With respect to monotherapy compared with monotherapy interventions, we grouped all treatment approaches together, given the small number of studies and the various drugs and CBT. There were several important study limitations, in particular the lack of adequate randomization and the sample sizes of the studies. The confidence intervals were generally small and the effect sizes of similar magnitude were rated as consistent. All statistical testing undertaken in these studies showed no significant differences between groups, suggesting no advantage of any one monotherapy over another. None of the studies in this grouping explicitly stated that the trials were designed for establishing superiority. Overall, however, our rating of the SOE for all monotherapy strategies (dose escalation, switching to another antidepressant, or psychological intervention) was low. This suggests that future research would likely affect the estimates of effect sizes established in these studies.
Monotherapy Versus Combined Therapies in MDD
The SOE ratings for the studies comparing monotherapies to combined therapies are detailed in Tables 18 to 26. We considered these augmenting studies both as a single group (Table 18) and as subgroups related to the number of studies evaluating specific agents. There were four subgroups we considered with respect to specific classes of agents and these included: atypical antipsychotics, individual agents (e.g., buspirone, lithium, mianserin), and then all other agents were categorized as a single group for SOE rating.
When we considered all studies with augmenting agents (12 different types) as a single group, we rated the studies evaluating monotherapies relative to adding augmenting agents as insufficient SOE. The degree of similarity for the effect sizes was rated as inconsistent, despite the fact that almost all agents showed no difference relative to the monotherapy; the estimates tended to have wide confidence intervals and were not consistently overlapping. The large number of treatment agents, differing treatment intervals, population characteristics, and the wide range of sample sizes contributed to this grading of insufficient (Table 18).
When considering atypical antipsychotic medications alone, a SOE rating of low for the outcome of response and remission was given88,95,108,122 (Table 19). There was a consistent effect favoring combined treatment with atypical antipsychotics. One study with a small sample size showed very large confidence intervals122 for the outcome of response. Two studies95,108 showed larger confidence intervals for remission and this may be related to the “subgroup” data specific to the failed SSRI group that we requested from the study authors. The original study data included larger sample sizes as subjects with failed response to nonSSRI medications were included. For this reason, we rated these four studies as having consistency in showing the same direction of effect (favoring combined therapy), but as imprecise because of the nonoverlapping confidence intervals (as well as a small sample size in a single study and “some” studies).
The studies that used buspirone as the augmenting agent were separated into those that switched to a different antidepressant monotherapy (Table 20) versus those that added buspirone to the current SSRI (to which subjects had an inadequate response) (Table 21). The SOE was graded as insufficient for the latter category, as the studies were deemed to have a greater number of study limitations relative to the STAR*D trial44,113 that evaluated switching to new monotherapies. The STAR*D trial showed no difference when adding buspirone relative to the different monotherapies after switching to a new antidepressant.
The remaining groupings for augmenting agents for lithium (Table 22), mianserin (Table 23), and “other agents” combined (Table 24) were all graded as insufficient SOE due to the small sample sizes and significant study limitations. It is difficult to determine any level of confidence in the effects of these agents despite the fact that none were shown to be any different relative to the comparator monotherapy.
We grouped all studies that maintained the current SSRI and then compared this treatment arm with one where a different SSRI, nonSSRI, or nonpharmacological treatment was added (Table 25). This group of studies was rated as low for the outcome of response because of the differing agents and the small sample sizes. For the outcome of remission, a grading of insufficient was given, as the study limitations were significant. There were two studies that compared switching from the current SSRI to a new monotherapy treatment and then compared this with the new agent combined with any other drug. The studies that evaluated switching to a new antidepressant and then adding aripiprazole would have been included in this group, had we been able to acquire the rates of response and remission for the SSRI failed group. For the two studies that did provide these outcomes, one study112 had wide confidence intervals and effect size because of the small sample size; the other study was the STAR*D cohort and had multiple treatment arms and comparisons. The evidence is graded as low and the findings suggest no relative advantage to switching to a new drug or CBT relative to adding buspirone or bupropion (Table 26).
Combined Therapy Versus Combined Therapy in MDD
A rating of insufficient for the SOE was given to the studies that compared combined therapies relative to other combined therapies (Table 27). The STAR*D study was the single study in this group reporting the outcome of remission. The studies comparing combinations relative to other combinations were consistent in that the relative risks were generally of the same magnitude and the effect sizes showed that no one combined therapy was different than any other, given the study limitations.
Adults With Dysthymia and Subsyndromal Depression
The studies evaluating these populations were each limited to a single trial. One study with patients with subsyndromal depression129 had significant risk of bias and poor reporting; as such we rated this as insufficient SOE. The study on dysthymia117 had low risk of bias, but had a very small sample size, and the study subjects were predominantly middle aged white females. For this reason we judged this study as insufficient SOE.
From two studies reporting outcomes on children and adolescents, one was a pilot study evaluating dose escalation.138 The TORDIA trial42,141,144–149 evaluating efficacy of monotherapy relative to combined therapy was at low risk of bias and evaluated and reported harms well. Study findings showed no significant differences between groups. Although the intent of the trial was specified as establishing the superiority of the venlafaxine monotherapy arm, the margins of superiority and the statistical analysis for this were not reported. The SOE was judged as a low grade.
KQ2. What are the harms of each of the monotherapies or combined therapies among these adults and adolescents? How do the harms compare across different interventions?
Harms for interventions used in both adults and adolescents with MDD that had failed to respond to an SSRI were predominantly derived from RCTs that evaluated treatment strategies in this population; no observational studies were eligible. A clear trend for harms was difficult to specify across the differing interventions in adults. Harms were well evaluated in the one study of adolescents and a pilot dose escalation study.
Reporting and collecting of harms was problematic, particularly for predefining harms including serious and severe events, and reporting the total number of events per group in the studies with adults. The studies evaluating harms in adolescents provided high quality evidence for harms within this population when receiving pharmacological and psychological treatment.
Severe events and serious events (including suicidality) were inconsistently reported in studies with adult MDD populations.
A limited number of studies undertook statistical evaluation comparing harms between groups.
Harms in Adults With MDD, Dysthymia, and Subsyndromal Depression
From the 41 studies evaluating adults, all but one study included subjects with MDD; two studies evaluated subjects with subsyndromal depression129 and dysthymia.117 As noted previously, five studies107,150,178–180 and seven STAR*D publications181–187 did not have data that could be extracted. No observational studies with the required patient population and evaluation of harms were eligible for this CER. The summary of harms thus reflects those reported within the eligible studies.
We present the harms evidence for the eligible and extracted studies based on the type of treatment comparisons as follows: (1) monotherapy compared with monotherapy; (2) monotherapy compared with combined therapy; and, (3) combined therapy compared with combined therapy. Some studies evaluated more than two treatment arms, and are included in multiple sections, dependent on the drugs used.
Description of Studies Reporting Harms in Adults With MDD
Monotherapy Versus Monotherapy in Adult MDD
In the six studies having at least one monotherapy treatment arm, all but one study112 reported some aspect of safety and tolerability. None of the studies were specifically designed to compare the effect of harms between different monotherapies. One study115 included a proportion of subjects who had failed to respond to an SSRI; following email contact with the author, stratified information for outcomes of benefit (not harm) was provided.
The method of assessing adverse events differed greatly among studies, with a limited number of studies using standardized methods or scales. Figure 14 shows the ratings on the McHarm scale for evaluating risk of bias and reporting within comparative studies. Forty percent of the studies indicated that the harms reported were those that were observed in 2 or 3 percent,103,106 5 percent,85,119 or 10 percent of subjects;88,90 the remaining studies did not specify, or were unclear as to why the harms reported were included. None of the studies provided any a priori definitions of the harms, or of serious or severe events. Similarly, the mode of how harms were collected or the training of the person collecting them was not specified. Generally, the number of subjects who withdrew was specified per treatment arm; however, the number of specific adverse events per treatment arm was not well specified (50 percent).
Table 28 shows the rates of reported harms as a function of the treatment arm. Seven main categories of harms were selected to include within the summary table, but others were reported within the studies. The STAR*D cohort reported only the frequency of events as a range from one to 100 percent, not specifying the types of events as individual frequencies, and similarly identified numbers of serious events as having “at least” one event.110,113 One study explicitly identified that no serious events had occurred,88 and three studies (five publications) identified suicide events had explicitly not occurred;44,103,106,110,113 for the STAR*D trials, we assumed that serious psychiatric events encompassed suicidality. Rates of discontinuation due to adverse events were variable. In studies with open label prospective failure components, the number of patients who had adverse events and did not proceed to the next phase was not consistently reported. In studies with historical failure, the proportion of subjects who had experienced inadequacy due to intolerance because of harms was not detailed.
Two studies reported on both serious and suicide related events.44,103,106,110 Other adverse events not reported in Table 28 include dry mouth,88,103,106,119,121 dizziness,103,106,121 and fatigue.88,121 Increased appetite or weight gain was reported in two studies.88,122
Four studies88,103,106,119,122 evaluated statistical differences in rates of harms; however, two of these primarily evaluated the comparisons for the monotherapy group relative to the combined therapy group.88,119 Another study103,106 evaluated differences between two methods of switching from an SSRI to duloxetine; no statistical differences were found between the two methods.
Monotherapies Versus Combined Therapies in Adult MDD
Table 29 details the reporting of harms in studies comparing monotherapies to combined therapies. One study112 reported harms when evaluating monotherapies relative to combined therapies. Only one study114 was designed to assess the effect of therapies for both efficacy and harms in patients who had excess sleepiness and fatigue despite previous adequate SSRI treatment. The subjects in this trial were partial responders for the current episode. This study included specific measures of sleepiness and fatigue as part of the primary outcomes.
The method of assessing adverse events differed greatly among studies, with a limited number of studies using standardized methods or the use of scales to assess harms. Figure 15 shows the ratings on the McHarm scale for evaluating risk of bias and reporting within comparative studies. Eleven studies (40 percent) indicated that the harms reported were those that were observed in two to three percent,108 five percent,79,87,101,105,109,114,119,126,131,132 or 10 percent of subjects;86,88,124 however, three of these studies did not report harms specific to the SSRI subgroup.79,105,108,109,126,131 The remaining studies did not specify why the harms reported were included or were unclear (20 percent). All but one study80 provided a priori definitions for serious harms. Similarly, definitions for predefining the harms or how these would be classified as severe were not detailed in any study (Figure 16). The mode of collecting harms was unclear or not identified in all but three studies,91–93,97,130 and who collected the reports of harms, or their training, was rarely specified. Generally, the number of subjects who withdrew was specified per treatment arm, and the total number of adverse events was generally reported.
Fifteen of 29 studies indicated that some type of statistical comparison between groups had been undertaken; however, only five studies44,87,93,101,122,124,132 specified the type of analyses and the remaining ones did not.79,84,88,95,97,105,108,109,114,119,120,126,131 One study88 showed that weight gain, dry mouth, somnolence, peripheral edema, and hypersomnia differed between the combined fluoxetine and olanzapine group relative to the fluoxetine group; rates were higher in the combined group. In this same study no differences in rates of adverse events were shown between the combined group relative to olanzapine monotherapy. Another study evaluating olanzapine showed differences relative to baseline but not between treatment groups.122
Another study119 evaluated differences between two monotherapy doses of sertraline and sertraline combined with mianserin; statistical differences were shown only for the adverse event of sedation, with rates being higher in the combined therapy group. One study114 showed statistical differences in nausea and feeling jittery for the combined SSRI and modafinil group.
There were four studies79,95,105,108,109,126,131 that provided stratified outcomes of benefit for the SSRI subgroup alone. However, these studies did not provide stratified event rates for harms; as such, the rates of harms are not detailed as they reflect mixed antidepressant effect. For two studies79,105,109,131 the pooled analyses publication126 indicated that there were no differences between groups due to the antidepressant; this pooled analysis found that the combined therapy group with aripiprazole had approximately twice the incidence of adverse events (akathisia, restlessness, insomnia, fatigue, blurred vision, and constipation). The harms in another study95 were evaluated statistically and did not differ between antidepressants alone or combined with risperidone groups. Another study found rates of events to be similar between antidepressants versus antidepressants combined with risperidone, but differences were not evaluated statistically.
Other adverse events not reported in Table 30 include dry mouth,86–88,101,103,106,114,118,119,121,132 dizziness,79,86,96,105,109,114,121,126,131 and fatigue.79,82,83,88,105,109,121,126,131 Increased appetite was reported in two studies,87,88,101,132 and cardiovascular problems (hypotension, tachycardia, or bradycardia) were identified in five studies.80,82–84,96,114 For nonpharmacological therapies, most studies assumed that there were no adverse events to report with exercise,127 cognitive behavioral therapy,94 or dialectical behavior therapy.128
Combined Therapies Versus Combined Therapies in Adult MDD
From the six studies comparing combined therapies, none were designed to assess the effect of therapies on harms. The method of assessing adverse events differed greatly among studies with a limited number of studies using standardized methods or the use of scales to assess harms. Figure 16 shows the ratings on the McHarm scale for evaluating risk of bias specific to harms. A single study from the six specified that the harms reported represented those that were present in at least 10 percent of subjects.88 The remaining studies did not specify, or were unclear as to, why the harms reported were included (85 percent). No study predefined the harms, or the severe or serious harms. The mode of collecting harms, who collected the harms reports, or their training was generally not specified. Generally, the number of subjects who withdrew was specified per treatment arm, and the total number of adverse events was reported.
Table 30 shows the rates of reported harms as a function of the treatment arm. The STAR*D cohort reported only the frequency of events and did not specify the type of events or serious events. Two studies explicitly identified that serious events had occurred,44,110,113 or that suicide events had explicitly occurred. Rates of discontinuation due to adverse events were variable.
A single study113 reported evaluating statistical differences between groups. Other adverse events not reported in Table 30 include dry mouth,86,118 dizziness,86 and fatigue,82,83 and cardiovascular problems (hyper- and hypotension, tachycardia, or bradycardia) were identified in one study.82,83
Description of Harms in Studies With Dysthymia and Subsyndromal Depression
One study117 evaluated patients with dysthymia and found no differences between treatment groups (paroxetine vs. paroxetine + amisulpride). Galactorrhoea and menstrual disorders were noted in 18 and 9 percent of female patients, respectively. These adverse events were not observed in the paroxetine alone group. Other harms reported included low rates of gastrointestinal problems, sexual dysfunction, dry mouth, and headache. Consistent with studies already described, this study did not predefine harms, serious or severe, and indicated that harms were assessed through “spontaneous” notification (passive methods). Nor was the training of the person collecting harms specified or the frequency and timing of collection. This study did account for all study withdrawals and adequately reported the total number of adverse events and as a function of groups for each type of harm.
The single study129 evaluating harms in patients with subsyndromal depression (following an acute episode) primarily assessed safety and not efficacy. In addition, the study evaluated the relationship between adverse events and the corresponding metabolic status of the isoenzyme CYP 2D6; the rationale for this is that paroxetine is a potent inhibitor of this enzyme which may lead to increased adverse reactions. Adverse effects were measured using the Udvalg for Kliniske Undersøgelser (UKU) Side Effect Rating Scale and the ASEX Scale. The study showed no statistical difference in the UKU scale, and the ASEX scale showed an improvement from the first week of treatment in the mirtazapine group. Two subjects from the mirtazapine group discontinued due to problems with insomnia; no dropouts were reported for the paroxetine group.
Description of Harms in Studies With Adolescents
The TORDIA trial found no statistical differences between treatments with regard to the frequency of events, any serious adverse events (including suicide related symptoms), or dropouts related to adverse events at 12 weeks.42,140,141,144–149 Sleeping difficulty was the only psychiatric adverse event that occurred in greater than 5 percent of the subjects. Some harms showed a tendency for increased rates with the use of venlafaxine and these included skin rash and cardiovascular events;42 self-injury was also higher in those with higher suicidal ideation.140 Further analysis of suicidal adverse events showed that predictors of suicidal adverse events were linked with poor response to treatment at 12 weeks.140 The harms in the TORDIA study were collected using standardized instruments (4-item Kiddie Schedule of Affective Disorders and the Side Effects form for Children and Adolescents) and collected in an active manner. Reports of serious effects or worsening symptoms were reviewed weekly with the investigative team. Once any concerns for safety were raised, participants were monitored weekly. All subjects completed the standardized safety scales at each pharmacological visit. The reporting of harms was clear, but severe harms were not defined a priori. Withdrawals were well described.
In the dose escalation study138 there were no statistically significant differences between the lower or higher dose groups with respect to solicited or unsolicited adverse events.
The dose escalation study138 used the Side-Effects Checklist and the reported harms were coded according to standardized terms. Description of serious events was not specified. Harms were assessed every two weeks.
KQ3. How do these therapies compare in different populations (e.g., different depressive diagnoses, disease severity, ages, gender, racial and socioeconomic group, and medical or psychiatric comorbidities)? These subgroups will be considered with respect to the different interventions
Overall, there is a small number of studies that have evaluated the impact of disease type, disease severity, previous comorbidities, age, gender, and race on treatment outcomes.
There is some evidence from the STAR*D level 2 cohort that would suggest that persons with concurrent anxiety symptoms have less likelihood of achieving remission.
There is some evidence from the TORDIA trial that milder depression, less family conflict, and absence of suicidal behavior are associated with greater likelihood of a positive treatment response at 12 weeks in adolescents.
Given that there was one study each for adults with dysthymia and subsyndromal depression, this review is limited in its ability to meaningfully compare conclusions across populations with different depressive disorders. There are 7 studies (13 publications) that undertook stratified or subgroup analyses evaluating factors that may impact treatment outcomes in adults,81,88,92,97,99,102,114–116,118–120,130 and 1 (3 publications) for adolescents.42,140,141
Comparison Across Different Treatment Populations in Adults
Baseline Disease Severity
Six studies evaluated the impact of disease severity on treatment outcomes in adults. One study114 undertook a subgroup analysis on subjects with baseline HAMD-17 score greater than 17, and found that the group with combined treatment (SSRI + modafinil) had a statistically significant greater reduction (p = 0.05) relative to the SSRI group alone. Another study120 found that subjects with an initially higher MADRS score (>30) in the combined therapy group (fluoxetine/citalopram + buspirone) tended to show greater reductions in MADRS overall (p = 0.04), or within the first 2 weeks of treatment, relative to subjects in the SSRI group with higher initial scores. One study (3 publications)115,116,118 found that a lower baseline HAMD-17 score was predictive of response for the fluoxetine group (p = 0.008) and the lithium augmentation group (p = 0.04) but not the desipramine group; a reanalysis reported an odds ratio (OR) for the augmentation strategy relative to a dose increase in fluoxetine of 0.85 (95% CI 0.76 to 0.96).115 One of these studies found that the age of onset of depression was predictive of response (p = 0.009).115,116,118
Analysis of the level 2 STAR*D cohort found that subjects with severe depression (QIDS-SR 16 or greater) were less likely to achieve remission (OR = 0.34 [95% CI 0.22 to 0.52]); however, this aspect was not valuable in assisting clinicians in recommending any monotherapy treatment (sertraline, venlafaxine, or bupropion).81 Greater baseline symptom severity was also associated with greater rates of attrition.102
Previous History of Failure
Two studies81,88 evaluated previous history of failure. One study undertook a subgroup analysis evaluating the drug class of previous failure (SSRI versus other); this study showed differences with the combined olanzapine-fluoxetine group achieving a statistically significant greater reduction on the MADRS relative to the fluoxetine or olanzapine monotherapy groups.88 This trend was observed in the nonSSRI group for those with at least one previous failure, but only for olanzapine and not fluoxetine.88
In the STAR*D level 2 cohort, intolerance to citalopram (OR = 1.57 [95% CI 1.11 to 2.21]) or response to citalopram during level 1 (OR = 2.78 [95% CI 1.77 to 4.38]) increased the likelihood of remission; however, this was not practically helpful to clinicians in selecting one monotherapy treatment over the other.81
The STAR*D cohort analysis for level 2 subjects on monotherapies (sertraline, venlafaxine, and bupropion) showed that remission was less likely in patients with other concurrent psychiatric disorders (specifically panic or post-traumatic disorders, generalized anxiety disorders, obsessive compulsive disorders, social phobia, or anxious or melancholic features).81 The overall ORs for presence of anxious, atypical, or melancholic features were 0.30 (95% CI 0.20 to 0.45), 1.04 (95% CI 0.67 to 1.61), and 0.43 (95% CI 0.25 to 0.73), respectively.81
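As a reading aid for the odds ratios quoted above, the following is a minimal sketch and not part of the original analysis. It takes the three ORs and 95% CIs reported for anxious, atypical, and melancholic features, checks whether each interval excludes 1 (the usual reading of significance at the 5 percent level), and backs out an approximate standard error of the log OR. The standard error step assumes the published intervals are Wald-type intervals that are symmetric on the log scale, which the source does not state. The function names are illustrative only.

```python
import math

# ORs (95% CI) for remission quoted in the text above for the STAR*D level 2
# monotherapy cohort: feature -> (OR, lower bound, upper bound).
REPORTED = {
    "anxious": (0.30, 0.20, 0.45),
    "atypical": (1.04, 0.67, 1.61),
    "melancholic": (0.43, 0.25, 0.73),
}

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def excludes_null(lower, upper, null=1.0):
    """Return True when the interval lies entirely on one side of the null OR."""
    return upper < null or lower > null


def approx_log_se(lower, upper, z=Z_95):
    """Approximate SE of log(OR).

    ASSUMPTION: the interval is a Wald interval built on the log scale,
    i.e. log(OR) +/- z * SE. The source does not confirm this.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


def summarize(reported):
    """Build a per-feature summary of the reported estimates."""
    rows = {}
    for name, (odds_ratio, lower, upper) in reported.items():
        rows[name] = {
            "or": odds_ratio,
            "excludes_null": excludes_null(lower, upper),
            "log_se": approx_log_se(lower, upper),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(REPORTED).items():
        print(
            f"{name}: OR={row['or']:.2f} "
            f"log_se={row['log_se']:.3f} "
            f"excludes_1={row['excludes_null']}"
        )
```

Read this way, the intervals for anxious and melancholic features exclude 1, while the interval for atypical features spans 1, which matches the text's statement that remission was less likely with anxious or melancholic features.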
A more detailed analysis of the STAR*D level 2 cohort showed that the rates of remission were significantly less for anxious patients relative to nonanxious patients across all five pharmacological treatment arms (both monotherapy and combined therapy).99 Logistic regressions, however, indicated only a moderate effect of anxiety, suggesting that there was no advantage of one treatment over another in subjects with anxious depression.99
One study showed no significant difference on treatment response for subjects with melancholic features.119
Two studies showed no statistical difference when the impact of age on treatment response was evaluated.116,118,119 Analysis of the STAR*D level 2 cohort showed that an age younger than 35 increased the likelihood of remission (OR varying from 1.43 [95% CI 0.78 to 3.59] to 1.81 [95% CI 0.97 to 3.38]).81 In contrast, younger age was associated with attrition for the augmentation treatment group.102
Three studies evaluated gender92,97,116,118,119,130 and showed no statistical difference on treatment response. The STAR*D cohort at level 2 estimated an OR of 0.96 (95% CI 0.69 to 1.35); overall, gender was not an important factor in helping to select the optimal monotherapy.81
Comparison Across Different Populations in Adolescents
Neither of the two studies evaluating children and adolescents assessed specific subgroups with respect to baseline severity, previous failures, age, and race as predictors of response. The TORDIA trial42,138,140,141,144–149 provided some evidence for other predictors of treatment response and showed that milder depression, less family conflict, and absence of suicidal behavior were associated with greater likelihood of a positive treatment response at 12 weeks. Conversely, a subgroup with substance abuse impairment was shown to be associated with greater depression severity at baseline, older age, family conflict, physical/sexual abuse, and comorbid oppositional defiant disorder.145 No relationship was observed between FKBP5 polymorphisms and suicidal events.144
In the context of combined treatment of CBT with antidepressants, adolescents with no history of abuse and few comorbidities had a greater probability of a positive response.141 Compared with adolescents with a history of physical or sexual abuse, those without had a threefold rate (OR = 2.8 [95% CI 1.6 to 4.7]) of positive response to combination therapy versus monotherapy.148 Those with a history of sexual abuse had similar response rates to either combination (with CBT) or medication therapy.148 In contrast, those with physical abuse had a lower rate of response to combination therapy relative to antidepressants alone; even after adjustment for other clinical predictors, a history of physical abuse seemed to predict a poorer outcome for combination therapy.148 Older youths (age 18 to 19) (OR 3.7 [95% CI 1.2 to 12.0]) with more comorbidities are more likely to benefit from combined treatment.141 In the TORDIA trial, adolescents who had not responded by 6 weeks had their antidepressant dose increased. A dose increase at 6 weeks for those on citalopram or fluoxetine was most likely to result in a response when it led to a change in plasma concentration greater than or equal to the geometric mean.147 This was not the case for paroxetine or venlafaxine.147
KQ4. What is the range of recommended clinical actions following the failure of one adequate course of SSRI based on current clinical practice guidelines published between 2004 and A?
There were 27 clinical practice guidelines (CPGs) (18 for adults, seven for adolescents, and two including both) providing recommendations for patients with MDD. Four CPGs for adults and three for adolescents did not provide any recommendations for patients with previous inadequate responses. Four guidelines included patients with dysthymia and subsyndromal depression but made no recommendations for those in these subgroups, whether adults or adolescents, who had failed previous treatment. The majority of the CPGs did not specify a definition for inadequate response.
All CPGs for adults and adolescents were applicable to patients from primary care and outpatient settings; a smaller number indicated applicability to inpatient settings. For adults, the majority of CPGs did not specify any type of antidepressant when recommending switching to monotherapy strategies. Increasing the dose and duration was frequently recommended but the interval or change in dose was not specified in the majority of guidelines.
When combined therapy was recommended there was a greater tendency to specify the drug for adding to antidepressants. However, there was great variability in the augmenting agents recommended.
For adolescents, there was an approximately equal number of CPGs that specified which agents to consider for monotherapy and which to consider for combined therapies. There was a preference to commence treatment using nonpharmacological interventions. Some guidelines cited adult evidence as the evidentiary basis for suggesting treatment strategies.
Recognizing that clinicians have a number of treatment options for patients with an inadequate response, we thought it would be important to evaluate current recommendations within CPGs regarding the optimal approach to treatment in these patients. Our goal was to identify and critically appraise the “rigor” of these recommendations, and contrast and compare them for this failed response subgroup.
There were a total of 27 CPGs (33 publications) eligible for review.13,14,53–55,151–177,189 There were seven CPGs that were specific only to adolescents,13,14,172–176 18 CPG (24 publications) for adults alone,53–55,151–171 and two applicable to both adults and adolescents.177,189
Note that CPGs can be published as a comprehensive single document with numerous recommendations for different interventions, or as multiple documents related to different interventions but sponsored by the same organization and published in the same year. For the purposes of this review, we grouped publications based on unique content; any documents that summarized guidelines or specified recommendations for subgroups of patients included in the primary document were considered as companion publications to the main CPGs. There are six guidelines published by the National Institute of Clinical Excellence (NICE) for adults that are interrelated, and from these we evaluated only two publications as representative CPGs (guideline 90 [reference 164] and guideline 97 [reference 162]). NICE guideline 90 is an update of the evidence and recommendations for subjects with MDD;164 summaries and quick references of these recommendations are published in guideline 23 (references 168 and 169). NICE guideline 91 (reference 171) is a summary of recommendations for depression in adults with chronic physical health problems and refers to recommendations in guideline 90 (reference 164) and guideline 97 (reference 162). NICE guideline 97 (reference 162) specifies recommendations for CBT. One companion paper summarizes recommendations for guidelines 90 and 91.170 Similarly, the American Psychological Association (APA) has updated their guidelines166 and the previous guideline167 was considered a companion.
There were six publications53,54,159,190–192 related to the Canadian Network for Mood and Anxiety Treatments (CANMAT) guidelines: three were recommendations53,54,159 and three publications54,191,192 provided supporting documentation for the methods used in the guidelines. One publication is a summary companion paper158 of another CPG from the American College of Physicians (ACP).55 In the guidelines for adolescents, two publications172,174 are from the Guidelines for Adolescent Depression in Primary Care (GLAD-PC) and two are from the United States Preventive Services Task Force (USPSTF).13,14
Figure 2 shows that 59 guidelines were excluded because of the following: 1) publication prior to 2004 (n=45); 2) exclusive focus on diagnosis or screening rather than treatment (n=7); 3) not a population of interest (n=4); and, 4) an algorithm (n=3).
CPGs Specific for MDD, Dysthymia, and Subsyndromal Depression in Adults
Characteristics of CPGs for Adults
Table 31 shows the characteristics of the CPGs as a function of country of origin, setting, and intended users. All 18 CPGs were applicable to adults with MDD. Four CPGs make note of dysthymia or subthreshold depression,55,164,177,189 but not all provide recommendations for those who had failed to respond to previous treatments. Similarly, an earlier version of the APA guideline167 discusses these subtypes but the most recent update focuses only on MDD.166
Dysthymia and Subsyndromal Depression
One CPG provided a general definition of dysthymia (not distinguishing this from minor depression) and recommended second- and third-line interventions following lack of sufficient response to a pharmacological agent or a psychological intervention.189 One CPG considered subthreshold persistent symptoms as a distinct subgroup of patients and noted the lack of clarity in studies that included subjects traditionally diagnosed with dysthymia (although the studies they evaluated used this classification, some studies do not distinguish this from minor depression).164 This CPG notes the lack of consistency in defining dysthymia and subthreshold depressive symptoms, as well as the potential lack of natural discontinuity between subthreshold depressive symptoms and MDD in the context of routine clinical practice. It provided recommendations for those patients who had failed to respond to low-intensity psychosocial interventions or other interventions; it is not clear whether “other interventions” includes failure of an SSRI.164 This CPG does not recommend the use of pharmacological treatment for subthreshold symptoms and as such would not make recommendations for dysthymia patients who have failed pharmacological treatment.
Three guidelines discuss dysthymia but do not provide any recommendations for treatment in those who fail to respond. One CPG specifies dysthymia as distinct from MDD, although the two can be present concurrently (double depression);167 however, in the update of this guideline there was no further discussion of dysthymia.166 Another CPG indicates that dysthymia is distinct from MDD and is characterized by persistent symptoms for greater than 2 years; it further includes this diagnostic category under the label of subthreshold depression (which includes minor depression and other nonspecified categories).177 One CPG summarizes evidence on pharmacological treatment that includes both MDD and dysthymia but presents no recommendations specific to dysthymia.55
All of the CPGs for adults with MDD were applicable to patients from primary care and outpatient settings; six guidelines indicated applicability to inpatient settings (Table 31). All of the CPGs were intended primarily for, or were applicable to, primary care practitioners, with the exception of one CPG that was developed specifically for psychiatrists.156 The largest group of guidelines was developed in the United States (n=6); one was developed by the Singapore Ministry of Health,161 one was an international consensus statement from a meeting sponsored by a drug manufacturer,163 one was developed in Germany,165 and one was developed by the World Federation of Biological Psychiatry.154
All but two of the 18 guidelines considered a variety of treatment interventions for adult MDD; these two CPGs evaluated solely pharmacological interventions55 and solely computerized CBT,162 respectively. The other CPGs gave treatment recommendations that included a variety of pharmacological, psychological, and CAM interventions. However, the majority of recommendations were not applicable to patients who had had inadequate responses to previous pharmacological treatment. When recommendations were specific to patients who had previous inadequate response, none were distinguished by different classes of antidepressants.
Of the 18 CPGs, eight defined response as a 50 percent or greater reduction in symptoms (as measured on a standardized rating scale), and partial response as a 25 to 50 percent reduction in symptoms.53,54,151,154,159,166,177,189 One CPG specified that the measure should be a change in the Patient Health Questionnaire-9 (PHQ-9).189 In the CANMAT CPGs,53,54,159 recommendations regarding lack of adequate response were intermingled with those regarding the order of treatment. First-line treatment is identified as those interventions for which there is the best evidence of efficacy balanced with good safety and tolerability. Second- and third-line treatments are defined as those reserved for situations where first-line treatments are not indicated, cannot be used, or are not effective. As such, for the CANMAT guidelines specific to CAM54 and psychological therapies159 the “failed to respond” populations are not identified clearly within the body of the recommendations; we must assume that second- and third-line therapies are applicable to those who failed previous pharmacological treatments. One CPG notes the inconsistency in defining lack of response, but opts to categorize patients in the context of next step treatment options.164 Four CPGs did not include recommendations specific to failed response populations, and as such, a definition may not have been necessary.153,155,160,162 The remaining nine CPGs did not report a specific definition of inadequate response, which suggests that clinicians are left to operationalize this concept variably.55,152,154,156,157,165,169,177 One CPG emphasizes that response should be assessed with the use of a structured measure, but provides no recommendation as to the measure or threshold for definition.166
For those CPGs that did report a formal definition of inadequate response, only two provided clear indications for differential treatment strategies for those with partial versus nonresponse.151,166 Eight CPGs indicated that the definition of inadequate response was linked to failure following time intervals varying from 2 to 4 weeks,154,161,165,177 4 to 6 weeks of significant improvement,189 4 to 8 weeks,166,167 and 6 to 8 weeks of partial improvement.55,161
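The response thresholds reported by these eight CPGs can be expressed as a simple percent-reduction rule. The following is a minimal sketch, not a method taken from any guideline: the function name, the use of baseline and followup scores from a single rating scale, and the label "nonresponse" for reductions below 25 percent are illustrative assumptions, and a reduction of exactly 50 percent is treated as response.

```python
def classify_response(baseline_score: float, followup_score: float) -> str:
    """Classify treatment response from two scores on the same rating scale.

    Thresholds follow the definitions summarized in the text:
    response = 50 percent or greater reduction in symptoms,
    partial response = 25 to 50 percent reduction.
    Labeling anything below 25 percent as "nonresponse" is an assumption
    made for this illustration.
    """
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")

    # Percent reduction in symptom score relative to baseline.
    reduction = 100.0 * (baseline_score - followup_score) / baseline_score

    if reduction >= 50.0:
        return "response"
    if reduction >= 25.0:
        return "partial response"
    return "nonresponse"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for baseline, followup in [(20, 10), (20, 14), (20, 18)]:
        print(baseline, followup, classify_response(baseline, followup))
```

A rule of this kind makes explicit the boundary decisions (for example, how a reduction of exactly 50 percent is handled) that the text notes are left unspecified by many guidelines.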
Quality Assessment of CPGs for Adults
Table 32 shows the domain scores for the AGREE II ratings of the CPGs for adults. The AGREE II is based on six domains of methodology for the guideline process and one item with an overall assessment. All CPGs scored high for scope and purpose (Domain 1) (range 69 to 100 percent).
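For readers unfamiliar with how AGREE II domain percentages arise, the sketch below assumes the standard AGREE II scaling, in which each item is rated from 1 to 7 by each appraiser and the domain score is (obtained score minus minimum possible score) divided by (maximum possible score minus minimum possible score). The report does not state how many appraisers rated each CPG, so the function name and the ratings in the usage example are hypothetical.

```python
from typing import Sequence


def scaled_domain_score(ratings: Sequence[Sequence[int]]) -> float:
    """Return a scaled domain score as a percentage.

    `ratings` holds one inner sequence per appraiser, each containing that
    appraiser's item ratings (1 to 7) for a single domain.
    """
    if not ratings or not ratings[0]:
        raise ValueError("at least one appraiser and one item are required")

    n_items = len(ratings[0])
    for appraiser in ratings:
        if len(appraiser) != n_items:
            raise ValueError("every appraiser must rate the same items")
        if any(not 1 <= score <= 7 for score in appraiser):
            raise ValueError("item ratings must be between 1 and 7")

    n_appraisers = len(ratings)
    obtained = sum(sum(appraiser) for appraiser in ratings)
    min_possible = 1 * n_items * n_appraisers
    max_possible = 7 * n_items * n_appraisers

    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


if __name__ == "__main__":
    # Hypothetical ratings: four appraisers, three items in the domain.
    example = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
    print(round(scaled_domain_score(example)))
```

Under this scaling, a domain score of zero percent means every appraiser gave every item in the domain the lowest rating, which is how scores as low as zero can appear in the ranges reported here.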
Stakeholder involvement (Domain 2) showed scores varying from 39 to 92 percent, and the lowest score was for a CPG sponsored by CANMAT53,54,159,190–192 making recommendations for the use of complementary and alternative treatments for MDD. Only six of the 18 CPGs indicated that patients’ views and preferences had been sought (score five or greater).151,154,156,162,164,165
For the domain of rigor of development (Domain 3), scores varied from zero to 85 percent; only three CPGs151,162,164 indicated a process for updating the guideline. For the domain of clarity of presentation (Domain 4), scores were generally high and varied from 61 to 94 percent. This domain evaluated whether the recommendations were clear and unambiguous, such that options were clearly presented and key recommendations were easily identifiable. However, the scores for the items within this domain were based on all recommendations within the CPG and were not specific to those applicable to patients who failed to respond to antidepressants.
When considering the applicability domain (Domain 5), scores were highly variable, ranging from zero to 89 percent. The majority of CPGs scored poorly for two criteria within this domain: 1) consideration of potential resource implications of applying their recommendations, and 2) presenting monitoring or auditing criteria. For the domain regarding editorial independence (Domain 6), scores were highly variable and ranged from four to 96 percent. In particular, potential competing interests of the guideline development group were not consistently recorded.
Note that although the AGREE II evaluates the methodology of the guideline process, it cannot evaluate the clinical merit (taking into account the methods for summarizing the evidence) and overall quality of the recommendations themselves. All of the CPGs had methods to establish the strength of the evidence, but they could not be compared with each other. Most systems of grading the strength of the evidence included aspects of study design, number of studies, or confidence of treatment; most included a level that reflected consensus or expert opinion for some recommendations.
Recommendations of CPGs for Adults
Four CPGs specific to MDD153,155,160,162 did not provide any recommendations for adult patients who had failed to respond to treatment. Two of these CPGs were specific to elderly patients in the community153 and in long term care homes.160 One CPG had recommendations for patients with depression and cardiovascular disease,155 but none for those who had inadequate response to treatment. One CPG provided recommendations on the use of computerized CBT for clients with MDD or subthreshold symptoms only; these recommendations were not applicable to those who had failed previous treatment.162
Table 33 shows the recommended strategies for both monotherapy and combined therapies. Attempts were made to identify any recommendations with regard to specific medications that were highlighted; however, for some guidelines it was not clear if the text following the recommendation (e.g., “switch antidepressants”) was a selective summary of the available evidence or an actual recommendation for action. The CANMAT CPGs recommended a stepped approach to treatment, intending a particular sequence of interventions (for example, second- and third-line therapies); however, there were several options within each of these categories.53,54,159 Two other guidelines specified a stepped approach164 or second- and third-line agents,189 but were less explicit as to which agents to consider. Other CPGs did not explicitly indicate an order of treatment other than cautioning to optimize initial treatment. Similarly, two CPGs did not explicitly recommend a change in dose or duration.55,154 Two other CPGs distinguished between partial response and nonresponse and specified different treatment approaches to these.159,166
CPGs Specific to MDD and Dysthymia in Adolescents
Characteristics of CPGs for Adolescents
There were seven CPGs that were specific only to adolescents,13,14,172–176 and two applicable to both adults and adolescents.177,189 Table 34 shows the characteristics of the adolescent CPGs, as a function of country of origin, setting, and intended users.
All seven CPGs applicable to adolescents included those with MDD. Two CPGs had some recommendations applicable to patients with dysthymia13,177 and one also specified treatment for subsyndromal depression13 in adolescents. However, none of the recommendations were specific to those who had failed previous treatment.
All CPGs for adolescents were applicable to patients from primary care and outpatient settings, and two guidelines indicated applicability to inpatient settings13,177 (Table 34). All CPGs were intended primarily for or applicable to primary care practitioners, and three to specialists13,175,177 and allied mental health workers.13
Only two CPGs provided definitions of inadequate response and this was characterized as failure of remission over a period of at least 2 weeks and less than 2 months, with no or very few depressive symptoms using a children’s global assessment scale/interviews13 or as failure to have a significant level of improvement from 4 to 6 weeks.189
Quality Assessment of CPGs for Adolescents
Table 35 shows the domain scores for the AGREE II ratings of the CPGs. One guideline rated poorly across three domains (Domains 3 to 5) (range 0 to 33 percent).175 All CPGs for adolescents scored high for scope and purpose (Domain 1) (range 89 to 100 percent).
The remaining domains showed highly variable scores. For stakeholder involvement (Domain 2), scores ranged from four to 97 percent, and the views of patients and the public were sought in only two CPGs173,176 (score six or greater). For the domain of rigor of development (Domain 3), scores varied from 21 to 92 percent; only one CPG176 indicated a process for updating the guideline. There was moderate variability in the clarity of presentation (Domain 4) (range 33 to 97 percent); this domain evaluated whether the recommendations were clearly presented, and the scores suggest that most CPGs did this well.
When considering the applicability domain (Domain 5), scores varied from zero to 77 percent; the majority of CPGs scored poorly for two criteria within this domain: 1) consideration of potential resource implications of applying their recommendations; and, 2) presenting monitoring or auditing criteria. For the domain regarding editorial independence (Domain 6), scores were highly variable and ranged from 13 to 100 percent; in particular, the competing interests of the guideline development group were not consistently recorded.
As expected, the CPGs for adolescents had varying methods to establish the strength of the evidence and they could not be compared with each other. Similar to adult rating systems, most CPGs used grading systems that included aspects of study design (e.g., RCT), number of studies, or confidence of treatment; most included a level that reflected consensus or expert opinion for some recommendations.
Recommendations of CPGs for Adolescents
Three of eight CPGs for adolescents did not provide any specific recommendations for adolescents who had failed to respond to previous treatment.14,172,175 One component of a CPG from the GLAD-PC focused only on identification and initial management.172 One CPG focused only on psychotherapy interventions and did not provide any recommendations specific to those who failed previous treatment.175 Another CPG from the USPSTF focused primarily on recommendations for screening and initial management.14 One guideline indicates that there is lack of evidence for the management of next steps of treatment for adolescents and provides no further indications.177
Two CPGs provided recommendations following failure of psychological interventions. One CPG189 that evaluated treatment for MDD in both adult and adolescent populations directed primary care practitioners to refer to secondary mental health services following lack of substantial improvement after six to eight weeks of supportive and psychological therapies. Similarly, the recommendation was to seek adolescent psychiatric consultation if the use of an antidepressant was desired. Two CPGs13,176 provided recommendations for patients who had failed to respond to psychotherapy or had more complicated depressions; failure to respond to pharmacological treatment was not clarified for mild depression and recommended only for moderate to severe MDD.
Table 36 shows the proposed treatment options for adolescents with MDD. Three CPGs13,173,194 note the lack of evidence for adolescents, but cite adult evidence as the rationale for the treatment strategy of switching and augmentation in particular. One CPG makes clear recommendations to avoid the use of paroxetine and venlafaxine in adolescents 12 to 18 years of age.176
PICOT is an acronym encompassing the basic elements that must be considered in developing a research question: the patient population, intervention or interventions, comparators, outcomes, and timeframe under consideration.
Agency for Healthcare Research and Quality (US), Rockville (MD)
Santaguida PL, MacQueen G, Keshavarz H, et al. Treatment for Depression After Unsatisfactory Response to SSRIs [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Apr. (Comparative Effectiveness Reviews, No. 62.) Results.