PubMed Health. A service of the National Library of Medicine, National Institutes of Health.

Rubenstein LV, Williams JW Jr, Danz M, et al. Determining Key Features of Effective Depression Interventions [Internet]. Washington (DC): Department of Veterans Affairs (US); 2009 Mar.



Of 138 studies reviewed, 29 met all inclusion criteria (Fig. 1). One of these was excluded because the intervention targeted relapse prevention for patients in remission from a depressive episode. Twenty-eight studies were reviewed for further analysis. Reviewers also consulted fifty-six articles that were companion papers for included studies and described methods, long-term outcomes, subgroup analyses, cost data and other outcomes.

Fig. 1. Flow diagram of the search and selection processes for trials included in the systematic review.


Among the 28 selected studies, overall study quality was high (Table 2 and Appendix A). Risk of bias was low in 18/28 studies (64%), moderate in 6/28 (21%), and high in 4/28 (14%). Outcomes were assessed blind to treatment assignment in 23 studies, and intent-to-treat analyses were used in 21 studies. Fifteen of the 28 studies (54%) involved managed care practices, including five in Veterans Affairs facilities. Most were carried out in the United States.

Table 2. Summary of study design characteristics.



Twenty-five of the 28 studies included usable continuous depression symptom measures for analysis, while three studies did not. Two of the excluded studies did not report a follow-up mean, and one did not report the sample size by group. Among the 25 studies in the continuous measure analysis, 21 included effect sizes based on short-term follow-up. Eighteen studies contributed intermediate-term effect sizes. Ten studies contributed long-term effect sizes. Eighteen studies reported a dichotomous improvement outcome and were included in the risk ratio analysis.

Four studies had two intervention arms and reported results for each arm separately. For Katon (1995) and Katon (1996), the major depression groups were used and the minor depression groups were excluded. Simon (2004) reported two intervention groups, both with telephone care management; we compared the group that also received telephone psychotherapy against the usual care group. Simon (2000) had a feedback-only arm and a care management arm; we compared the care management arm against the usual care group. For Wells (2000), the enhanced medication arm was used.

Not all studies could provide suitable information for each effectiveness analysis. Studies were too heterogeneous in their assessments of comorbidities and demographics for us to evaluate any of these variables quantitatively. We were, however, able to evaluate studies with a higher proportion (over 25%) of minorities. Six studies did not report minority status.


When we evaluated correlations between the intervention and evaluation features used in our final analyses, we found that enrolling patients through primary care clinician referral or by identifying patients on antidepressants (versus by screening) was significantly correlated with use of patient-level randomization (p = .03). Enrolling clinician-referred patients who were on antidepressants or willing to take them was correlated with study inclusion of fewer practice groups (p = .01). Having mental health specialists involved in patient management was also correlated with inclusion of fewer practice groups (p = .03).

Looking just at collaborative care model intervention features, we found that studies in which medications were adjusted by primary care clinicians with expert guidance were significantly more likely to include care managers who were nurses, versus other types of professionals; were more likely to include at least 16 weeks of follow-up (p = .04); used more intensive (robust) interventions (p = .001); and followed an overall classic collaborative care intervention model (p = .04). Classic collaborative care interventions were correlated with having a dedicated care manager who used standardized depression symptom scales for follow-up (p = .03). Including active self-management was correlated with having a dedicated depression care manager who assessed depression symptoms with a structured instrument at baseline and at one or more follow-up visits, and who also assessed treatment adherence (p = .02).

No other correlations (associations) between study variables shown in Table 1 were significant at p < .05.


Overall, the experimental groups in selected studies showed improvement. The overall effect size by follow-up time was: short-term, −0.28 (−0.38, −0.19); intermediate term, −0.25 (−0.37, −0.14); and long-term, −0.19 (−0.36, −0.02). Twenty of 28 interventions improved depression outcomes over 3–12 months (an 18.4% median absolute increase in the proportion of patients with 50% improvement in symptoms; range, 8.3–46%).
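As a rough check on the pooled estimates above, the standard error and z statistic can be recovered from each reported interval. The sketch below assumes the intervals are 95% confidence intervals built with a normal approximation, which the text does not state; the function name `se_and_z` and the `POOLED` dictionary are illustrative, not part of the review.

```python
from statistics import NormalDist

# Pooled effect sizes and intervals as reported in the text:
# (estimate, lower bound, upper bound).
POOLED = {
    "short-term": (-0.28, -0.38, -0.19),
    "intermediate-term": (-0.25, -0.37, -0.14),
    "long-term": (-0.19, -0.36, -0.02),
}


def se_and_z(estimate, lower, upper, level=0.95):
    """Back out the standard error and z statistic from a symmetric interval.

    Assumption: the interval is estimate +/- z_crit * SE (normal approximation).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # roughly 1.96 for 95%
    se = (upper - lower) / (2 * z_crit)
    return se, estimate / se


if __name__ == "__main__":
    for label, (est, lo, hi) in POOLED.items():
        se, z = se_and_z(est, lo, hi)
        p = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value
        print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Under these assumptions all three intervals exclude zero, and the long-term interval is the widest, which fits the smaller number of studies contributing long-term effect sizes.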


Study designs varied on several major features. Subjects were recruited through screening (n = 11), clinician referral (n = 9), administrative or pharmacy databases (n = 3), direct contact by a pharmacist when an antidepressant prescription was filled (n = 2), or a combination of these strategies (n = 3). The unit of randomization also varied across studies: 18 randomized patients, 3 randomized providers, and 7 randomized practices.

Table 3 shows the relationship between four evaluation design features and depression symptoms and resolution, using univariate regression. In the table, the group with the more negative effect size shows greater reduction in depression symptoms, and the group with the higher relative risk shows a better chance of depression resolution. Because nearly all studies had some effect, effect sizes both with and without the target feature are negative (showing an effect on depression symptoms), and nearly all relative risks are greater than one (showing greater resolution of depression). We report results on short-term, medium-term, and long-term outcomes. Our results show that studies in which patients were referred to care management through screening, rather than by primary care clinicians or administrative medication records, had a significantly higher relative risk of depression resolution. None of the other design features we investigated was significantly linked to outcomes.
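To make the sign conventions concrete, the sketch below computes a standardized mean difference and a risk ratio for a single hypothetical two-arm study. All numbers and function names are invented for illustration, and the use of Cohen's d with a pooled standard deviation is an assumption; the report does not specify its effect size formula in this section.

```python
import math


def standardized_mean_difference(mean_tx, sd_tx, n_tx, mean_uc, sd_uc, n_uc):
    """Cohen's d: (intervention mean - usual care mean) / pooled SD.

    With a symptom scale where lower scores are better, a negative value
    means the intervention arm ended with fewer symptoms.
    """
    pooled_var = ((n_tx - 1) * sd_tx**2 + (n_uc - 1) * sd_uc**2) / (n_tx + n_uc - 2)
    return (mean_tx - mean_uc) / math.sqrt(pooled_var)


def risk_ratio(events_tx, n_tx, events_uc, n_uc):
    """Proportion improved under intervention divided by proportion under usual care."""
    return (events_tx / n_tx) / (events_uc / n_uc)


# Hypothetical study: follow-up symptom scores and counts of patients
# reaching the improvement threshold in each arm.
d = standardized_mean_difference(10.0, 5.0, 100, 11.5, 5.0, 100)
rr = risk_ratio(45, 100, 30, 100)

if __name__ == "__main__":
    print(f"effect size = {d:.2f}, risk ratio = {rr:.2f}")
```

With these hypothetical inputs the effect size is -0.30 and the risk ratio is 1.5: a negative effect size and a risk ratio above one both point toward benefit from the intervention.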

Table 3. Evaluation Design Features Versus Study Effects on Depression Symptoms or Depression Resolution.


Table 3 also shows that studies with greater than 25% minorities tended to show greater effects, although this difference did not reach statistical significance. Six studies did not record minority status.


Table 4a shows the relationship between nine variables reflecting collaborative care model intervention features and depression symptoms and resolution. In the table, the group with the more negative effect size shows greater reduction in depression symptoms, and the group with the higher relative risk shows a better chance of depression resolution. We report results on short-term, medium-term, and long-term outcomes. Initial univariate regression results evaluating the first five individual features listed in the table (structured care manager assessment, active self-management support, care manager triage to mental health, adjustment of antidepressants by primary care clinicians, and care managers who were predominantly nurses) showed active patient self-management support as the single statistically significant intervention characteristic associated with improved depression symptoms and depression resolution.

Table 4a. Collaborative Care Intervention Model Feature Effects on Depression Symptoms or Depression Resolution (Initial Variables)*.


We then derived the additional variables shown in Table 4b. This table shows that studies with nurse or PhD pharmacist care management, patient education, and at least 16 weeks of care manager follow-up (the classic collaborative care model with long-term follow-up; see variables in Table 1) are associated with significantly improved long-term effects on depression symptoms. Having at least 16 weeks of care manager follow-up alone was not well distributed as a variable, but appeared to be associated with a significantly greater reduction in depression symptoms at short-term and medium-term time points.

Table 4b. Collaborative Care Intervention Model Feature Effects on Depression Symptoms or Depression Resolution (Additional Variables).



We designed our study impact measure to distinguish between high, medium, or low impact of the study intervention by qualitatively taking into account all key outcome variables comparing intervention to usual care across all measurement time periods. We therefore expected it to be associated with, but not identical to, our quantitative study effect size measures of depression symptom reductions. Though the impact measure was based on expert rating, it was significantly associated with quantitative depression symptom effect size results at each time point (Table 5). Based on quantitative depression symptom reduction effects, the impact measure statistically distinguished studies designated as showing high, medium, or low impact from those designated as showing little or no impact. The impact measure was also associated with significantly greater relative risk of depression resolution. Unlike the effect size measures we used for our primary study outcome tests, however, and as discussed in the Methods section above, the impact measure could be applied to all 28 studies. This afforded us additional opportunities to evaluate our results and to test their sensitivity to inclusion of all studies together.

Table 5. Relationship between Qualitative Impact Variable and Effects on Depression Symptoms and Resolution.



To understand our initial regression results, we carried out extensive cross-case analyses looking at combinations of characteristics versus our impact measure, based on a priori study questions as indicated in Methods above. Table 6 shows cross-case analysis of our key collaborative care model intervention features. The cross-case analyses enabled us to look closely at features that occurred together in large proportions of higher impact studies (the 20 studies with high, medium or low impact versus the 8 with little or no impact). We considered features present in 80% or more of the higher impact studies to be core features of effective collaborative care models. Our analyses identified six collaborative care model intervention core features. These were:

Table 6. Qualitative Impact Analysis Table of Intervention Features.


  • Primary care clinicians actively involved in patient management
  • Mental health specialists actively involved in patient management
  • Care managers assessed patient symptoms at baseline with a standardized scale
  • Care managers assessed patient symptoms at follow-up with a standardized scale
  • Care managers assessed treatment adherence at follow-up
  • Collaborative care intervention included at least 16 weeks of active patient follow-up
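The 80% rule used to select these core features can be sketched as a simple tabulation. The example below is illustrative only: the study records, feature labels, and the function name `core_features` are hypothetical and do not reproduce the data in Table 6.

```python
def core_features(studies, higher_impact=("high", "medium", "low"), threshold=0.80):
    """Return features present in at least `threshold` of higher impact studies.

    Each study is a dict with an "impact" label and a set of "features".
    Studies rated "none" (little or no impact) are left out of the denominator.
    """
    higher = [s for s in studies if s["impact"] in higher_impact]
    if not higher:
        return {}
    names = sorted({f for s in studies for f in s["features"]})
    result = {}
    for name in names:
        share = sum(name in s["features"] for s in higher) / len(higher)
        if share >= threshold:
            result[name] = share
    return result


# Hypothetical records: five higher impact studies and two with little or no impact.
STUDIES = [
    {"impact": "high", "features": {"pcp_involved", "adherence_assessed", "nurse_manager"}},
    {"impact": "high", "features": {"pcp_involved", "adherence_assessed"}},
    {"impact": "medium", "features": {"pcp_involved", "adherence_assessed", "nurse_manager"}},
    {"impact": "medium", "features": {"pcp_involved", "adherence_assessed"}},
    {"impact": "low", "features": {"pcp_involved"}},
    {"impact": "none", "features": {"nurse_manager"}},
    {"impact": "none", "features": {"pcp_involved"}},
]

if __name__ == "__main__":
    for name, share in core_features(STUDIES).items():
        print(f"{name}: present in {share:.0%} of higher impact studies")
```

In this toy data, a feature found in four of five higher impact studies just meets the threshold, while one found in two of five does not. The rule describes co-occurrence with impact and does not by itself show that a feature causes better outcomes.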

As shown in Table 6 and, in greater detail, in Table 9, active patient self-management support strongly characterized the high and medium impact studies, and became less prevalent in the low and little or no impact groups. Only one high impact study and only two medium impact studies did not undertake active self-management support. The one high impact study without active patient self-management (Katzelnick, Simon et al. 2000) used a previously tested and validated educational tool.

Table 9. Understanding Self-Management Support.


As indicated in Table 6, there were five VA studies in the sample, and one additional study with a VA site. Among the studies conducted in Veterans Affairs facilities, those that supported active patient self-management (Oslin, Sayers, Ross, et al., 2003; Fortney, Pyne, Edlund, et al., 2006) had higher impact scores. One low impact study (Swindle) used mental health clinical nurse specialists as care managers and did not include structured assessment. A second low impact VA study (Dobscha) did not include at least twelve weeks of follow-up.

Table 9 shows the details of the self-management support interventions used in these studies. While all included studies incorporated at least one chronic illness care element directed toward patients, this element might be, for example, care manager assessment without patient education and behavioral activation. We looked for self-management support approaches that featured behavioral activation or interactive problem-solving approaches, usually in addition to standard patient education using written material or videotapes. We counted CBT provided by mental health specialists as therapy rather than self-management support.

In addition to core features, we used qualitative analyses to identify features that varied across high impact studies and could thus be considered options. The main feature that emerged as an option was telephone versus in-person care management.


Table 8 shows qualitative results for evaluation design features. We found no evaluation design features other than those related to comorbidities that characterized more than 80% of high impact studies. In general, studies conducted in Veterans Affairs facilities were medium-sized, randomized at the provider or practice level, and eligible patients were referred through screening.

Table 8. Design Features Ranked by Overall Impact.


Table 7 shows results for comorbidities and demographics versus impact. More than 80% of studies excluded patients with bipolar disorder and psychosis. No study excluded patients with anxiety. Six studies (21%) did not mention PTSD, and only 14% of the remaining 22 studies excluded patients based on it. Most studies (18 of the 28) excluded patients with substance abuse. Six studies (21%) did not report on the proportion of minorities enrolled.

In evaluating the relationship between evaluation design features and outcomes, we found no consistent effects qualitatively for the following design variables: number of practices in the study, patients per practice, whether patients were screened by the study and referred or were referred by their clinicians, or whether randomization was at the patient or cluster level.


The results of this review are limited by several important factors. First, the analyses conducted in this paper were not designed to address causality. Even the quantitative analyses must be considered descriptive, and the qualitative analyses are hypothesis-generating. The number of categories of analytic variables tested and the possibility of misclassification of variables across reviewers have the potential to bias our results. With only 28 articles, quantitative analyses are of necessity limited. Nevertheless, within the framework of a set of complex interventions that we already know have robust effects on depression outcomes, but that vary in basic components, our analyses have substantial strengths. Strengths of our analyses include our extensive querying of authors regarding intervention and evaluation features; our systematic approach to analysis; our rigorous variable definitions and validation through independent review and consensus; and our triangulation, or sensitivity testing, of conclusions across methodologies.

Our qualitative approach is hypothesis-generating. Our impact score is subjective, although rigorously derived and associated with effect sizes at each time frame. However, the only quantitative approach to defining an outcome across all 28 studies would have been to combine effects across heterogeneous time frames and measurement methods, an approach that may have resulted in greater bias than the method we chose. Future studies could assess the tradeoffs between these approaches. Our analysis is based on iterative review of data by investigators, and is thus subject to bias. Our use of rigorous cross-case methods mitigates, but does not eliminate, this possibility. The transparent presentation of our analyses in tabular form, however, should assist readers in making independent assessments of our conclusions.

The studies upon which we based our analyses also have limitations. The studies themselves have selection biases. Even studies that use screening to identify patients exclude some categories of patients and recruit only a portion of those eligible due to refusal. We therefore cannot be certain how well these studies generalize to use of the collaborative care model in usual practice settings. Most interventions in these studies were implemented through large health care organizations, limiting the generalizability of the results to organizations with sufficient structure, commitment and resources to implement interventions requiring changes in systems of care. Moreover, practices in fee-for-service environments that do not reimburse for care management services have fewer incentives for implementing these interventions. In addition, there may be contextual variables not measured in the studies that influence outcomes. For example, even within managed care organizations, Rubenstein et al. (Rubenstein, Parker et al. 2002; Rubenstein, Meredith et al. 2006) found that expert leadership and support from local practice management and mental health specialists influenced the development of successful programs. In addition, study comparisons were to usual care. Usual care, however, is heterogeneous across settings and providers. For example, some usual care settings may have ample mental health specialty access, while some have little.

In terms of the interventions studied, we could not test the independence of many of the features in association with outcomes because the features were not distributed evenly among studies. Some features also tended to occur with other features. We extensively tested combinations of features both in quantitative analysis (e.g., the classic collaborative care model) and in qualitative analysis, where features could be viewed as present or absent across studies.

Our study is limited in terms of shedding light on the chronic illness care model as applied to conditions other than depression. Depression has unique features that might make it more necessary to expend resources on, for example, care management. Depressed patients tend to be apathetic, poor consumers who benefit from proactive care. They are often not detected without screening. Screening requires psychological testing, rather than a blood test, and follow-up of positive screens requires either a full mental health specialist interview or psychological assessments using standardized tests that take in the neighborhood of 40 minutes to complete. Assessment is difficult to complete in an average 20-minute primary care visit. Assessment can additionally uncover serious urgent or emergent conditions such as suicidal threats, requiring additional time to handle appropriately. Furthermore, adherence to treatment is likely to be a more prevalent problem among depressed patients than among other chronically ill patients. Follow-up care requires frequent contact, but not full in-person visits; the interview, not the physical examination or laboratory testing, provides most of the necessary information. Finally, mental health specialists may be more organizationally or geographically separate from primary care than are other specialists. The unique characteristics of depression may thus help explain why there is substantially more evidence for cost-effectiveness of depression collaborative care than for other care management-based interventions. Future research will clarify the transportability of conclusions on depression collaborative care to other chronic conditions.

The studies on collaborative care for depression were also severely limited in terms of addressing medical or psychiatric comorbidities. No evidence is available from these studies on collaborative care for bipolar disorder or psychosis, because these patients were nearly universally excluded. Patients with subthreshold depression were also usually excluded, although one of the studies later found evidence of positive effects among these patients (Wells 2005). Only a few studies included patients with substance abuse. Among the studies that included a broader group of patients, intervention protocols most likely specified referring patients with severe psychiatric comorbidities for mental health specialty care (e.g., RAND 2000). Thus, no conclusions can be drawn from this review on collaborative care for primary care patients with medical or psychiatric comorbidities.

This study identified only 28 studies out of 1464 that met full inclusion criteria, and fewer than half of these were published in the last five years. Previous collaborative care literature syntheses have netted 30 to 40 included studies, but have covered more heterogeneous groups of interventions and evaluations. New methods for achieving the necessary improvements in depression care may well be identified in future reviews.


While collaborative care models for depression vary, careful analysis of model features shows that a core set of characteristics is linked to better results. This set is robust across qualitative and quantitative analyses, and does not seem to be biased by links to particular evaluation design features (e.g., the design feature of randomization at the patient level is distributed across all levels of care model impact). In addition to the core variables, active patient self-management support appears to characterize the set of very high impact studies. These findings are sufficiently strong to support recommendations to sites intending to implement collaborative care for depression.

Guidelines for sites intending to implement collaborative care for depression should identify primary care and mental health specialty clinician involvement; care manager assessment of symptoms at baseline and follow-up using a structured instrument; care manager follow-up assessment of treatment adherence; and active follow-up for at least 16 weeks as core features of current evidence-based models. Guidelines should further recommend inclusion of active self-management support, such as elements of patient activation, cognitive behavioral or problem-solving therapy, or motivational techniques, for additional improvement in outcomes.