NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

O’Connor E, Henninger M, Perdue LA, et al. Screening for Depression, Anxiety, and Suicide Risk in Adults: A Systematic Evidence Review for the U.S. Preventive Services Task Force [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2023 Jun. (Evidence Synthesis, No. 223.)

Chapter 3. Results

Overview of Included Studies

The results of this review are presented by condition: depression, anxiety, and suicide risk. Within each condition, results are organized by KQ.

We reviewed 23,497 abstracts and assessed 1,237 full-text articles for inclusion (Appendix B Figure 1). Overall, we included 185 original research studies or existing systematic reviews (ESRs) (reported in 231 publications) across conditions and KQs. This includes 99 primary studies and 86 ESRs, which collectively include approximately 5,000 studies and 10.6 million participants (Figure 2).

For depression, we included a total of 105 studies [32 original studies (n=385,607) and 73 ESRs (including approximately 2,138 studies and an estimated 9.8 million participants)], including the following: KQ1 included 17 RCTs; KQ2 included 14 primary studies and 10 ESRs; KQ3 included 1 RCT; KQ4 included 39 ESRs; and KQ5 included 27 ESRs (reported in 34 publications) and 1 cohort study.

For anxiety, we included 59 studies [40 original studies (n=275,489) and 19 ESRs (including approximately 483 studies and an estimated 81,507 participants)], including the following: KQ1 included 2 RCTs; KQ2 included 10 studies; KQ3 had no included studies; KQ4 included 26 primary studies and 18 ESRs; and KQ5 included 3 RCTs, 8 ESRs, and 2 case-control studies.

For suicide risk, we included 27 original studies (n=24,826), including the following: KQ1 included 1 RCT; KQ2 included 3 primary studies; KQ3 included 1 RCT; KQ4 included 23 RCTs; and KQ5 had no additional included studies.

The full lists of included studies (by condition) and excluded studies (with reasons for exclusion) are available in Appendix C and Appendix D, respectively.

This figure shows an overview of the number of included studies by condition, key question, study design, and intervention type.

Figure 2. Overview of Included Studies. NOTE: Studies may be counted under multiple Key Questions and/or conditions. Abbreviations: ESR = existing systematic review; no. = number.

Depression

KQ1. Do Depression Screening Programs in Primary Care or Comparable Settings Result in Improved Health Outcomes in Adults, Including Pregnant and Postpartum Persons?

KQ1a. Does Sending Depression Screening Test Results to Providers (With or Without Additional Care Management Supports) Result in Improved Health Outcomes?

Summary

Seventeen trials (reported in 28 publications) examined depression screening,149–164 including one that examined screening for depression and several other conditions165 (Table 8). The included trials covered general adult,149–153, 165 older adult,154–157 and perinatal populations.158–164 Evidence supported the benefits of screening for depression (Table 9). For example, screening interventions, most of which also included other care management components, were associated with a lower prevalence of depression or clinically important depressive symptomatology (OR, 0.60 [95% CI, 0.50 to 0.73]; 8 RCTs [n=10,244]; I²=0%) and, among participants above a specified symptom level at baseline, a greater likelihood of remission or falling below a specified level of depression symptomatology (OR, 1.58 [95% CI, 1.23 to 2.02]; 8 RCTs [n=2,302]; I²=0%) at 6 months post-baseline or postpartum (or the closest followup to 6 months).

Table 8. Characteristics of Depression Screening Studies (KQ1).

Table 9. Summary of Meta-Analysis Results for Depression Outcomes Among Depression Screening Studies (KQ1).

Study Characteristics

Seventeen studies (n=18,437) examined the benefits of screening for depression,149–165 including one that also screened for symptoms of anxiety, sleep problems, pain, or fatigue and enrolled patients endorsing any of these concerns (Table 8). Six of the included studies covered general adult populations,149–153, 165 four were limited to older adults,154–157 six were limited to postpartum patients (generally between 2 and 12 weeks postpartum),158–162, 164 and one was limited to pregnant patients.163 Only four153, 159, 160, 162 of the included studies had a control group that was not screened for depression and are considered KQ1 studies (Figure 3). The remaining studies screened all participants but gave the screening results only to clinicians of intervention group participants, meeting criteria for KQ1a. KQ1 and KQ1a studies are combined and not discussed separately. Nine149–152, 154–157, 165 of the included studies enrolled only patients who screened positive for depression. The remaining eight studies included all patients, regardless of the depression screening results,153, 158–164 including all of the studies conducted in perinatal populations. All of the studies used some type of individual outreach from a predefined pool of potentially eligible persons for study recruitment, typically patients who were visiting or were registered with participating clinicians or clinics; none relied on interested individuals contacting the study in order to join. All but two162, 165 of the included studies were also included in the previous USPSTF review on screening for depression.1

This figure shows the key study design features among depression screening studies for KQ1 by control group type.

Figure 3. Key Study Design Features Among Depression Screening Studies (KQ1). Abbreviations: CG = control group; KQ = key question.

Nine of the studies were conducted in the US,149–153, 155, 157, 164 and the remaining were conducted in the UK (among postpartum patients),160, 161 Hong Kong (among postpartum patients),159 or Northern European countries (covering older adult,154, 156 postpartum,158, 162 and pregnant163 patients). Only one of the studies conducted in the US was focused on a perinatal population, conducted among postpartum patients164; the remaining US-based studies covered general149–153, 165 and older155, 157 adult populations. All studies took place in primary care, general practice, OB-GYN, or other maternal/child wellness contexts.

Information about the included samples is summarized in Table 10 (see Appendix E Table 1 for details by study). Across all 17 studies, the average age of participants was 38.2 years, and this varied by target population. Ninety-three percent of all participants were women, and a majority were women even among studies focused on general adult populations (73% women) and older adults (66% women). Among the nine studies conducted in the US, the percent of participants who were Black ranged from 7.1 to 51.2 (among the six studies reporting), the percent who were Hispanic/Latino ranged from 4.5 to 59.3 (among four reporting), and the percent who were White ranged from 29 to 94.1 (among the six reporting). Only one study reported the percent of participants of Asian descent, and none reported the percent who were Native American or Alaska Native. Three studies had a relatively high proportion of Black participants, with 49.3 percent165 (among a general adult population), 51.2 percent155 (older adults), and 32.6 percent157 (older adults). One study153 had a relatively high proportion of Hispanic/Latino participants (59.3%, general adult population). One study focused on primary care patients in rural clinics,149 and three had samples who were largely economically disadvantaged, as evidenced by being on Medicaid or uninsured and below the poverty line,150 being medically indigent,155 or having low annual income levels (e.g., 76% earning less than $17,000 in the late 1990s).153

Table 10. Summary of Participant Demographic Characteristics Among Studies of Depression Screening (KQ1): Weighted Mean (Number of Studies Reporting), Unless Otherwise Indicated.

The included interventions were very heterogeneous (Table 11, Appendix E Table 2). Four trials studied the effects of screening (or receipt of screening results) with little or no further training or intervention components, conducted in general,153, 149, 165 and postpartum159 populations.163 In these studies, primary care clinicians typically confirmed the diagnosis and made decisions about the need for treatment according to their usual approach. Additional components beyond screening variously offered in other studies included training and materials to improve clinicians’ knowledge and skills surrounding diagnosis and treatment of depression, facilitation or improvement of the referral process, and patient-specific treatment recommendations based on screening results. Four studies offered one-on-one psychological counseling, medication adherence counseling, or symptom monitoring sessions by specially trained staff.151, 152, 164, 156 Three of these included regular monitoring of both symptoms and medication use as well as counseling sessions.151, 152, 164

Table 11. Summary of Intervention Components in Addition to Screening in Depression Screening Studies (KQ1).

Four studies were rated as good quality151, 156, 165 and the remaining were rated as fair quality. The most common issue that warranted a “fair” rating was attrition higher than ten percent. Some fair-quality studies had few other concerns besides attrition (i.e., all or most of the following: adequate randomization methods, baseline comparability between groups, blinding of outcomes assessment, conservative handling of missing data, acceptable statistical methods, and no apparent selective reporting of outcomes).152, 157, 161 Other common issues among fair-quality studies were lack of information about whether allocation was blinded and small sample sizes leading to uncertainty about baseline comparability between groups. One of the studies used a quasi-experimental design which assigned two comparable municipalities in Norway to be intervention and control areas,158 but the remaining studies were either individual or cluster-randomized trials.

Results

Detailed results for all outcomes are reported in Appendix E Tables.

Depression Outcomes

Sixteen of the seventeen studies reported the percent of patients who (a) met criteria for a depression diagnosis or were above a specified symptom score at followup (“prevalence,” Appendix E Table 3), (b) did not meet criteria for a depressive disorder or were below a specified symptom score at followup (“remission,” Appendix E Table 4), or (c) showed a prespecified level of symptom reduction, such as a certain number of points or a percentage decline relative to their baseline score (“response,” Appendix E Table 5). Pooled results for the first two of these are shown in Table 9. Pooled analyses showed that screening programs were associated with a lower prevalence of depression compared with no screening or no screening results being given to participants’ clinicians (OR, 0.60 [95% CI, 0.50 to 0.73]; 8 RCTs [n=10,244]; I²=0%) and, among participants above a specified symptom level at baseline, a greater likelihood of remission (OR, 1.58 [95% CI, 1.23 to 2.02]; 8 RCTs [n=2,302]; I²=0%) at 6 months post-baseline or 6 months postpartum (or the closest followup timepoint to 6 months). Absolute prevalence and remission rates were highly variable, presumably reflecting differences in how the outcome was measured and differences among the study samples. At followup, depression prevalence ranged from 2.5 percent to 67 percent in the control groups and from 0.6 percent to 62 percent in the intervention groups; the median (interquartile range) absolute difference in percentage points between groups was −5.2 (−6.8 to −2.0), favoring the screening groups. Depression remission ranged from 11.7 percent to 66 percent in the control groups and from 13.2 percent to 78.1 percent in the screening groups; the median (interquartile range) absolute difference in percentage points between groups was 7.2 (2.9 to 15.2), favoring the screening groups.
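One way to see why a single pooled odds ratio can coexist with widely varying absolute differences is to apply it to different control-group rates. The sketch below is illustrative only and is not an analysis from the review: the function name is hypothetical, the 2.5 and 67 percent control rates are the extremes quoted above, and the 20 percent rate is an assumed midrange value.

```python
def risk_under_odds_ratio(control_risk: float, odds_ratio: float) -> float:
    """Return the risk implied by applying an odds ratio to a control-group risk."""
    control_odds = control_risk / (1.0 - control_risk)
    comparison_odds = control_odds * odds_ratio
    return comparison_odds / (1.0 + comparison_odds)


POOLED_PREVALENCE_OR = 0.60  # pooled OR for depression prevalence quoted in the text

# 0.025 and 0.67 are the control-group extremes quoted above; 0.20 is an assumed value.
for control_risk in (0.025, 0.20, 0.67):
    implied_risk = risk_under_odds_ratio(control_risk, POOLED_PREVALENCE_OR)
    difference_points = (implied_risk - control_risk) * 100
    print(f"control {control_risk:.1%} -> implied {implied_risk:.1%} "
          f"({difference_points:+.1f} percentage points)")
```

The same odds ratio implies a small absolute difference when the control-group rate is low and a larger one when it is moderate, which is one reason absolute differences are hard to compare across studies with different samples and outcome definitions.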

We also conducted a combined analysis, in which remission was entered if it was reported, prevalence (reversed) if remission was not reported, and the percent of participants meeting criteria for a “response” to treatment (typically 50% reduction in symptoms) if neither remission nor prevalence was reported (Figure 4). The combined analysis also demonstrated that the screening programs were associated with a 63 percent increase in the odds of improved depression (OR, 1.63 [95% CI, 1.37 to 1.95]; 16 RCTs [n=8,448]; I²=0%). The most robust evidence is among general adult and postpartum populations. Only one trial was limited to pregnant persons, but those findings were consistent with the findings among general and postpartum populations. Among studies of general, postpartum, and pregnant patients, effect sizes were consistently in the direction of benefit, and many were statistically significant for at least one timepoint, particularly among perinatal women. The results in four trials limited to older adults (with lower age cutoffs ranging from 55 to 75 years) were inconsistent, with point estimates on both sides of 1.0, and no studies reported statistically significant differences between groups. Stratified analyses indicated statistically different pooled effects across populations and, in a separate analysis, that effect sizes were larger among trials that were not limited to people with symptoms of depression. These findings are discussed further below under “Effect Modification and Findings in Specific Populations.”
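The selection rule for the combined analysis can be sketched as follows. This is a minimal illustration, not the review's analysis code: the `StudyOutcomes` type and `combined_outcome` function are hypothetical, and "reversed" is assumed here to mean taking the reciprocal of the prevalence odds ratio so that values above 1.0 favor screening.

```python
from typing import NamedTuple, Optional, Tuple


class StudyOutcomes(NamedTuple):
    """Odds ratios a study may report (None when not reported). Hypothetical structure."""
    remission_or: Optional[float] = None
    prevalence_or: Optional[float] = None
    response_or: Optional[float] = None


def combined_outcome(study: StudyOutcomes) -> Optional[Tuple[str, float]]:
    """Pick one effect per study: remission, else reversed prevalence, else response."""
    if study.remission_or is not None:
        return ("remission", study.remission_or)
    if study.prevalence_or is not None:
        # Assumption: reversing a binary outcome inverts its odds ratio.
        return ("prevalence (reversed)", 1.0 / study.prevalence_or)
    if study.response_or is not None:
        return ("response", study.response_or)
    return None  # study reports none of the three outcomes


# Hypothetical inputs, not values from any included trial.
print(combined_outcome(StudyOutcomes(remission_or=1.4, prevalence_or=0.7)))
print(combined_outcome(StudyOutcomes(prevalence_or=0.5)))
print(combined_outcome(StudyOutcomes(response_or=1.2)))
```

Entering exactly one effect per study keeps each trial from being counted more than once in the pooled estimate.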

This figure is a forest plot showing a combined outcome representing reduced depression from depression screening studies (key question 1): depression remission or scoring below a cut-off, depression prevalence or scoring above a cut-off (reversed), and depression response.

Figure 4. Forest Plot Showing a Combined Outcome Representing Reduced Depression From Depression Screening Studies (KQ1): Depression Remission or Scoring Below a Cutoff, Depression Prevalence or Scoring Above a Cutoff (Reversed), and Depression Response. (r)Reversal …

Thirteen studies also reported a continuous measure of the level of depression symptoms (Figure 5, Appendix E Table 6).149–151, 154–161, 163, 165 All of the studies in perinatal patients reported greater reductions in depression symptoms in screening groups than in the control groups at one or more time points.158–161, 163 Differences between groups were typically 1 to 3 points on the Edinburgh Postnatal Depression Scale (EPDS) at 6 to 26 weeks postpartum, and findings were statistically significant for one or more time points in all studies of perinatal women. Only one of the eight studies in non-perinatal populations found a statistically significantly greater reduction in depression symptoms,151 although differences trended in the direction of a small benefit in most of the other studies. Several studies did not provide sufficiently detailed results for pooled analysis.

This figure is a forest plot showing the difference between groups in change from baseline depression symptom score in depression screening studies (key question 1)

Figure 5. Forest Plot Showing the Difference Between Groups in Change From Baseline Depression Symptom Score in Depression Screening Studies (KQ1). Abbreviations: BDI = Beck Depression Inventory; CES-D = Center for Epidemiologic Studies Depression scale; CG = control …

Other Mental Health Outcomes, Quality of Life, and Functioning

Some studies reported on anxiety (Appendix E Table 7),161, 162, 165 broad mental health symptom levels (Appendix E Table 7),159, 161 or quality of life (Appendix E Table 8).150–152, 154, 156, 160–162 Consistent with the findings on depression symptoms, the studies limited to postpartum women typically found small statistically significant benefits of the screening program, but the studies in general and older adults did not. One exception, however, was that two studies in general adults with extensive screening supports both found improvements in mental health-related quality of life, as measured by the SF-36 mental health component scores.151, 152 Two studies in older adults reported very similar effects on functioning in their screening and control groups (Appendix E Table 9).154, 155

Other Health Outcomes

One study of older adults reported all-cause mortality (Appendix E Table 10).156 This study found fewer deaths in the screening group (5.8%) than in the control group (14.4%; OR, 0.36 [95% CI, 0.15 to 0.92]); however, this was a small study with only 239 participants and 24 deaths. One study in postpartum women found no differences in the rate of hospitalization of their children or the child’s body weight through age 18 months (Appendix E Table 11).159

Effect Modification and Findings in Specific Populations

No studies reported subgroup analyses exploring results by gender. Only one study each reported findings by age group (in a study limited to adults age 75 years and older156) and race/ethnicity.152 No differential impact was identified for any outcome in either of these studies. Among studies that were limited to specific populations, stratified analyses of the combined depression outcome (i.e., including remission/below a cutoff, response, or prevalence/above a cutoff [reversed]) indicated statistically significant differences among the populations tested, with larger effects in studies limited to pregnant or postpartum patients (p=0.005), and smaller effects in studies limited to older adults (p=0.007). However, study design differed across populations, as well as other features, making it impossible to determine whether the population or the other study features drove the association with effect size. For example, studies in perinatal women were also more likely to include unscreened control groups and not to restrict their samples to patients with depressive symptoms, a factor that was also associated with larger effect sizes in stratified analyses (p=0.01). In addition, the relatively small number of included studies warrants caution in interpreting meta-analytic differences by study characteristics.

Effects in Older Adults

The trials among general adult populations included older adults but none of them reported subgroup effects by age. However, one of the trials in general adults had an average age of 58, indicating that a substantial minority were at least age 60 and older.153 In this study, intervention group patients who were depressed at baseline were more likely to be in complete remission at followup than unscreened depressed patients. Specifically, 48 percent of screened participants had ≤1 symptom of depression compared to 27 percent of those not screened (p<0.05). Among the trials limited to older adults, only one used a measure of depression symptoms that was specifically designed for older adults.157 This may be an important limitation because older adults commonly suffer from loss of energy, sleep disturbance, and other somatic symptoms of depression that are due to aging or medical conditions, so general symptom severity instruments may be less sensitive to treatment response. Additionally, none of the trials in older adults offered individual psychological counseling by someone with training in psychological treatment in older adults, and the participation in psychoeducational groups offered in two studies was less than 20 percent in both cases.156, 157 Thus, interventions in the studies of older adults fell almost entirely to the primary care provider.

KQ2. Do Instruments to Screen for Depression Accurately Identify Adults, Including Pregnant and Postpartum Persons, With Depression, in Primary Care or Comparable Settings?

Summary

We included 14 primary studies166–179 and 10 existing systematic reviews (ESRs)180–189 that examined the test accuracy of screening for depression (Tables 12 and 13). The 14 primary studies covered multiple versions of the Geriatric Depression Scale (GDS); the GDS-15 was the most common version. The standard cutoff of ≥5 (to identify mild to severe depression) had an acceptable balance of sensitivity and specificity, with the GDS-15 accurately identifying 94 percent of those with major depression and 81 percent of those without (Figure 6).

This figure is a plot showing the test accuracy of the GDS-15 to detect MDD, at cutoffs of 5, 6, and 7 (key question 2)

Figure 6. Test Accuracy of the GDS-15 to Detect MDD, at Cutoffs of 5, 6, and 7 (KQ2). Abbreviations: CI = confidence interval; MDD = major depressive disorder.

Table 12. Characteristics of Primary Studies Examining Test Accuracy of the Geriatric Depression Scale for Detecting Depression (KQ2).

Table 13. Characteristics of ESRs of Test Accuracy of Screening Tools to Detect Major Depression.

The ESRs we identified covered various versions of the PHQ, 2- and 3-item Whooley screening questions, CES-D, and EPDS (Figure 7). The PHQ-9 correctly identified 85 percent of those with major depression and 85 percent of those without major depression, at the standard cutoff of ≥10, when compared to a semi-structured interview reference standard (see Figure 8 for a more detailed depiction of the evidence). At the standard cutoff of ≥2 and when compared to a semi-structured interview, the PHQ-2 was more sensitive than the PHQ-9, correctly identifying 91 percent of people with major depression. However, specificity at that cutoff was lower: the PHQ-2 correctly identified only 67 percent of people without depression. The Whooley, CES-D, and EPDS demonstrated accuracy comparable to the PHQ-2.
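To make the sensitivity and specificity tradeoff concrete, the following sketch converts the accuracy figures quoted above into expected counts per 1,000 people screened. It is illustrative only: the function names are hypothetical, and the 10 percent prevalence is an assumption chosen for the example, not a value the review reports for primary care.

```python
def expected_counts(sensitivity: float, specificity: float,
                    prevalence: float, n_screened: int = 1000) -> dict:
    """Expected true/false positives and negatives for a screened population."""
    with_condition = n_screened * prevalence
    without_condition = n_screened - with_condition
    true_pos = sensitivity * with_condition
    true_neg = specificity * without_condition
    return {
        "true_pos": true_pos,
        "false_neg": with_condition - true_pos,
        "true_neg": true_neg,
        "false_pos": without_condition - true_neg,
    }


def positive_predictive_value(counts: dict) -> float:
    """Share of positive screens that are true cases."""
    return counts["true_pos"] / (counts["true_pos"] + counts["false_pos"])


ASSUMED_PREVALENCE = 0.10  # assumption for illustration only

# Sensitivity and specificity as quoted in the text (semi-structured reference standard).
instruments = {
    "PHQ-9 (cutoff >=10)": (0.85, 0.85),
    "PHQ-2 (cutoff >=2)": (0.91, 0.67),
}

for name, (sens, spec) in instruments.items():
    counts = expected_counts(sens, spec, ASSUMED_PREVALENCE)
    print(name, {k: round(v) for k, v in counts.items()},
          f"PPV={positive_predictive_value(counts):.2f}")
```

Under this assumed prevalence, the more sensitive instrument misses fewer cases but produces more false-positive screens that would need followup assessment.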

This figure shows a summary of included ESR and primary evidence for test accuracy of screening instruments to detect depression (key question 2).

Figure 7. Summary of Included ESR and Primary Evidence for Test Accuracy of Screening Instruments to Detect Depression (KQ2). Abbreviations: CES-D = Center for Epidemiologic Studies Depression scale; CI = confidence interval; EPDS = Edinburgh Postnatal Depression …

This figure shows a plot of test accuracy of PHQ, CES-D, Whooley, and the EPDS from published SERs (key question 2)

Figure 8. Test Accuracy of PHQ, CES-D, Whooley, and the EPDS From Published SERs (KQ2). Abbreviations: CES-D = Center for Epidemiologic Studies Depression scale; CI = Confidence interval; EPDS = Edinburgh Postnatal Depression Scale; MINI = Mini International Neuropsychiatric …

Study Characteristics of Primary Research Studies

Fourteen primary studies (n=8,819) were included that provided test accuracy results for the Geriatric Depression Scale (GDS, Table 12).166–179 None of these studies were included in the previous review, as the previous review only addressed screening instrument accuracy for pregnant individuals. The GDS-15 was the most common version, but several other versions were also included. Two studies were conducted in the US.167, 174 The others were conducted in Norway, Sweden, the Netherlands, the United Kingdom, Spain, Portugal, Romania, Australia, the Republic of Korea, and Singapore. Sample size ranged from 105 to 4,253; most studies (k=10) analyzed a sample of 500 or fewer participants.

Ten studies explicitly excluded those with cognitive impairment or those scoring low on cognitive function tests (e.g., MMSE) (Table 12). All studies recruited adults aged 55, 60, or 65 years and older or assisted living residents. Mean age ranged from 69 to 85 years (k=13) (Table 14). Women were represented in higher proportions than men: 50 to 77 percent of participants were women. Race and ethnicity were sparsely reported (k=4). One study conducted in Singapore recruited only participants of Chinese (90%) or Malaysian and South Asian Indian (10%) ethnicity,168 and another study in the UK recruited only participants of African Caribbean ethnicity.176 The two other studies reporting race or ethnicity recruited primarily White participants (85% and 90%).167, 174 Socioeconomic status (SES) was variably reported; mean years of education ranged from 5.6 to 10 (k=3), and the proportion with 12 or more years of education ranged from 65 to 69 percent (k=2).

Table 14. Participant Characteristics for Studies of Test Accuracy of Depression Screening Instruments (KQ2).

All studies used a structured or semi-structured interview at no more than two weeks after the screener to diagnose depression. The most common interviews were the Structured Clinical Interview for DSM Disorders (SCID) (k=3), the Diagnostic Interview Schedule (DIS) (k=2), and the Mini International Neuropsychiatric Interview (MINI) (k=2); four studies did not report the specific interview used. The proportion of participants who were diagnosed with major depressive disorder ranged from 3.5 percent to 16.5 percent. Two studies did not use DSM to identify participants with major depression and instead defined depression as any symptom of depression based on ICD-10 (found in 10% of the sample)170 and a depression score of 3 or more on the Geriatric Mental Scale (28.9%).170, 176

Results of Primary Research Studies

GDS-15

Thirteen studies reported the accuracy of GDS-15 to detect major depressive disorder or depression. Reported cutoffs ranged from ≥0 to ≥14, but the most common cutoff was ≥5 (k=8). The cutoff of ≥5 also had the best balance between sensitivity and specificity, with a pooled sensitivity of 0.94 (95% CI, 0.85 to 0.98; I²=84.4%; k=7; n=5,655) and pooled specificity of 0.81 (95% CI, 0.70 to 0.89; I²=98.9%) to detect MDD (Figure 6, Appendix E Table 12). At a cutoff of ≥5, sensitivity from seven individual studies ranged from 0.72 to 1.0 and specificity ranged from 0.53 to 0.95. Area under the curve (AUC) for the GDS-15 was reported in eight studies and ranged from 0.79 (95% CI, 0.73 to 0.85) to 0.98 (95% CI, 0.97 to 0.99) (Appendix E Table 12).

One additional study, which aimed to estimate the prevalence of depression in the Netherlands, required extrapolation from its random sample of participants who screened negative back to the full screened sample.173 After that adjustment, the study had the lowest sensitivity to detect MDD at a cutoff of ≥5: 0.58 (95% CI, 0.54 to 0.62). The corresponding specificity was 0.91 (95% CI, 0.90 to 0.91) (Appendix Table C). With this study included in the meta-analysis (k=8; n=11,095), at a cutoff of ≥5, the pooled sensitivity decreased and the pooled specificity increased: they were 0.92 (95% CI, 0.80 to 0.97; I²=94.8%) and 0.83 (95% CI, 0.73 to 0.89; I²=98.7%), respectively (pooled data not shown).

Lower cutoffs yielded higher sensitivity but lower specificity. Higher cutoffs were more variable but tended to yield higher specificities and lower sensitivities (Figure 6, Appendix E Table 12).
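The cutoff tradeoff can be illustrated with a small worked example. The scores and diagnoses below are entirely hypothetical (not data from any included study), and `sens_spec` is an illustrative helper; the sketch only shows how raising the cutoff on a 0 to 15 scale moves sensitivity and specificity in opposite directions.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[int], has_mdd: Sequence[bool],
              cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= cutoff' counts as a positive screen."""
    pairs = list(zip(scores, has_mdd))
    true_pos = sum(1 for s, d in pairs if d and s >= cutoff)
    false_neg = sum(1 for s, d in pairs if d and s < cutoff)
    true_neg = sum(1 for s, d in pairs if not d and s < cutoff)
    false_pos = sum(1 for s, d in pairs if not d and s >= cutoff)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


# Hypothetical GDS-15 totals: eight people without MDD, then four with MDD.
scores = [1, 2, 3, 4, 5, 6, 7, 9, 5, 6, 8, 12]
has_mdd = [False] * 8 + [True] * 4

for cutoff in (5, 6, 7):
    sensitivity, specificity = sens_spec(scores, has_mdd, cutoff)
    print(f"cutoff >={cutoff}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```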

GDS-30

Four studies reported the accuracy of GDS-30 to detect major depressive disorder (MDD). Reported cutoffs ranged from ≥7 to ≥17, with only one cutoff used in more than one study (≥17). Sensitivity ranged from 0.55 at a cutoff of ≥11 to 1.0 at cutoffs of ≥15 and ≥17 (95% CI range, 0.38 to 1.0). Specificity was less variable and ranged from 0.67 at cutoffs of ≥7 and ≥10 to 0.96 at a cutoff of ≥15 (95% CI range, 0.62 to 0.99) (Appendix E Table 12). With few studies and few cutoffs reported, a consistent relationship between cutoff and test performance was not identified.

Other GDS Versions

Six other versions of the GDS were reported in four studies (Appendix E Table 12). These versions included a revised 10-item version referred to as the GDS-R, and versions with one, four, five, seven, and ten questions. None of these GDS versions were used in more than two studies. In one study, the versions with fewer questions had lower sensitivity and specificity when compared to longer versions of the GDS.171 In another study, the single-item GDS did not perform well (sensitivity 0.18 [95% CI, 0.09 to 0.34]), but the test accuracy of the GDS-4, GDS-10, and GDS-15 was comparable in that sample.179 The revised version (GDS-R) performed well in comparison to the GDS-15 and GDS-30, but the test performance of the GDS-R has not been replicated in other studies.

Study Characteristics of Existing Systematic Reviews

We included ten ESRs (estimated n=75,000) examining various versions of the PHQ, 2- and 3-item Whooley screening questions, CES-D, and EPDS (Table 13).180–189 For the PHQ family of instruments, we included a series of individual participant data (IPD) meta-analyses, all conducted by the same group using very similar methods. These reviews examined the accuracy of various versions of the PHQ among adults 18 years and older to screen for major depression. Participants could not have been recruited from youth or psychiatric settings, or because of their symptoms of depression. Studies taking place in any country were eligible, although the majority took place in countries with a very high human development index (HDI). All studies were required to use either a fully structured (including the MINI) or semi-structured interview to determine the diagnosis of major depression; the interview also had to take place within 2 weeks of PHQ administration. The diagnosis of MDD or major depressive episode was determined by DSM or ICD criteria.

Results of Existing Systematic Reviews

PHQ-9

Linear Scoring

The IPD meta-analysis examining the linear scoring algorithm of the PHQ-9 included 100 studies (76 in very high HDI countries) with 44,503 participants.188 Thirty-seven studies took place in primary care or included a general population sample, but the majority took place in inpatient or outpatient specialty care (k=63). Among the 44,503 included participants, 4,541 were diagnosed with major depression (10.2%).190 IPD meta-analyses were conducted for PHQ cutoffs ranging from ≥5 to ≥15, grouped by the reference standard used (semi-structured, fully structured excluding the MINI, or the MINI).190

The standard cutoff for the PHQ-9 to identify depression is ≥10. The IPD meta-analysis confirmed a cutoff of 10 as yielding the best balance of sensitivity and specificity when compared to a semi-structured diagnostic interview (Figure 8).190 For studies using a semi-structured reference standard (k=47, n=11,234) and a PHQ-9 cutoff of ≥10, sensitivity was 0.85 (95% CI, 0.79 to 0.89) and specificity was 0.85 (95% CI, 0.82 to 0.87) (Figure 8, Appendix E Table 13). For studies that used a fully structured reference standard excluding the MINI (k=20, n=17,167) and a PHQ-9 cutoff of ≥10, sensitivity to detect major depression was 0.64 (95% CI, 0.53 to 0.74) and specificity was 0.88 (95% CI, 0.83 to 0.92) (Figure 8, Appendix E Table 13). For studies that used the MINI for a reference standard (k=33, n=16,102) and a PHQ-9 cutoff of ≥10, the sensitivity to detect major depression was 0.74 (95% CI, 0.67 to 0.79) and specificity was 0.89 (95% CI, 0.86 to 0.91) (Figure 8, Appendix E Table 13). The AUC for all reference standards ranged from 0.84 (fully structured, excluding the MINI) to 0.90 (semi-structured) (Appendix E Table 13). The authors noted that older age and male sex were associated with higher specificity.188

A systematic review reporting the accuracy of the PHQ-9 to identify prenatal or postnatal depression was also identified. This small review (including only 4 studies from the US) reported sensitivity and specificity consistent with the results of the IPD meta-analysis of PHQ-9 among adults 18 years and older.185 Sensitivity to identify prenatal or postnatal depression at a cutoff of ≥10 (k=3) ranged from 0.77 to 0.85 and specificity ranged from 0.62 to 0.84.185

Algorithm

The IPD meta-analysis examining the test accuracy of the PHQ-9 diagnostic algorithm included 54 studies (40 in very high HDI countries) with 16,688 participants.181 Eighteen studies took place in primary care, but the majority took place in inpatient or outpatient specialty care (k=33). Two-thirds of participants (67%; n=11,130) were less than 60 years of age and 57 percent were women (n=9,512). Among the 16,688 included participants, 2,091 were diagnosed with major depression (12.5%). The diagnostic algorithm requires five or more items, each scored with 2 or more points, where at least one of these items is depressed mood or anhedonia. IPD meta-analyses were conducted for the standard algorithm scoring as well as modified scoring (only 1 point required for item 9: “Thoughts that you would be better off dead or of hurting yourself in some way”), grouped by the reference standard used (semi-structured, fully structured excluding the MINI, or the MINI).181
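
The diagnostic algorithm described above can be sketched as follows. This is a hedged illustration, not the scoring code used in the included review: it assumes items are supplied in standard PHQ-9 order, so that items 1 and 2 are anhedonia and depressed mood and item 9 is the thoughts-of-death item. The function and parameter names are hypothetical.

```python
from typing import Sequence


def phq9_algorithm_positive(item_scores: Sequence[int], modified: bool = False) -> bool:
    """Apply the PHQ-9 diagnostic algorithm as described in the text.

    Standard scoring: an item counts when it is scored 2 or more.
    Modified scoring: item 9 counts when it is scored 1 or more.
    Positive when five or more items count and at least one of them
    is item 1 (anhedonia) or item 2 (depressed mood).
    """
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    thresholds = [2] * 9
    if modified:
        thresholds[8] = 1  # item 9 needs only 1 point under modified scoring
    endorsed = [score >= needed for score, needed in zip(item_scores, thresholds)]
    has_core_symptom = endorsed[0] or endorsed[1]
    return sum(endorsed) >= 5 and has_core_symptom
```

For example, the response pattern `[2, 2, 2, 2, 0, 0, 0, 0, 1]` has four items at 2 or more points and is negative under the standard scoring, but becomes positive under the modified scoring because item 9 then counts as a fifth item.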

For studies using a semi-structured reference standard (k=27, n=6,331) and the original scoring, sensitivity was 0.57 (95% CI, 0.49 to 0.64) and specificity was 0.95 (95% CI, 0.94 to 0.97) (Figure 8, Appendix E Table 13).181 For studies that used a fully structured reference standard excluding the MINI (k=13, n=7,577) and the original scoring, sensitivity to detect major depression was 0.35 (95% CI, 0.26 to 0.46) and specificity was 0.95 (95% CI, 0.93 to 0.97) (Figure 8, Appendix E Table 13). For studies that used the MINI for a reference standard (k=15, n=2,952) and the original scoring, the sensitivity to detect major depression was 0.51 (95% CI, 0.49 to 0.53) and specificity was 0.97 (95% CI, 0.96 to 0.98) (Figure 8, Appendix E Table 13). The modified scoring resulted in marginally higher sensitivities and similar specificities (Appendix E Table 13).181

PHQ-8

The IPD meta-analysis examining the test accuracy of the PHQ-8 included 54 studies with 16,742 participants.186 The PHQ-8 differs from the PHQ-9 only by omission of Item 9 (“Thoughts that you would be better off dead or of hurting yourself in some way”). Forty-six percent of participants were recruited from primary care and the remaining were recruited from inpatient or outpatient specialty care. Two-thirds of participants were less than 60 years of age (n=11,144; 67%) and 57 percent were women (n=9,552). Among the 16,742 included participants, 2,097 were diagnosed with major depression (12.5%). IPD meta-analyses were conducted for PHQ-8 cutoffs ranging from ≥9 to ≥15, grouped by the reference standard used (semi-structured, fully structured excluding the MINI, or the MINI).186

As found for the PHQ-9, the cutoff yielding the best balance of sensitivity and specificity for the PHQ-8 was ≥10 (Figure 8). For studies using a semi-structured reference standard (k=27, n=6,362) and a PHQ-8 cutoff of ≥10, sensitivity was 0.86 (95% CI, 0.80 to 0.90) and specificity was 0.86 (95% CI, 0.83 to 0.89) (Figure 8, Appendix E Table 13).186 For studies that used a fully structured reference standard excluding the MINI (k=13, n=7,596) and a PHQ-8 cutoff of ≥10, sensitivity to detect major depression was 0.63 (95% CI, 0.52 to 0.72) and specificity was 0.86 (95% CI, 0.81 to 0.90) (Appendix E Table 13). For studies that used the MINI for a reference standard (k=14, n=2,784) and a PHQ-8 cutoff of ≥10, the sensitivity to detect major depression was 0.72 (95% CI, 0.63 to 0.79) and specificity was 0.88 (95% CI, 0.84 to 0.91) (Appendix E Table 13). The AUC for all reference standards ranged from 0.852 (fully structured, excluding the MINI) to 0.930 (semi-structured) (Appendix E Table 13).186

PHQ-4

The IPD meta-analysis examining the test accuracy of the PHQ-4 included 75 studies (51 from very high HDI countries) with 34,698 participants.187 The PHQ-4 is comprised of four items from the PHQ-9: depressed mood, loss of interest/pleasure, low self-esteem/guilt, and psychomotor agitation. Thirty-one studies recruited participants from the general population or primary care. The age of participants ranged from 18 to 98 years with a mean of 48 years and 59 percent were women (n=20,678). Among the 34,698 included participants, 3,392 were diagnosed with major depression (9.8%). IPD meta-analyses were conducted for PHQ-4 cutoffs ranging from ≥1 to ≥12, grouped by the reference standard used (semi-structured, fully structured excluding the MINI, or the MINI).187

The optimal cutoff for the PHQ-4 was identified as ≥4. For studies using a semi-structured reference standard (k=29, n=7,719) and a PHQ-4 cutoff of ≥4, sensitivity was 0.88 (95% CI, 0.81 to 0.93) and specificity was 0.79 (95% CI, 0.74 to 0.83) (Figure 8, Appendix E Table 13). For studies that used a fully structured reference standard excluding the MINI (k=15, n=12,109) and a PHQ-4 cutoff of ≥4, sensitivity to detect major depression was 0.68 (95% CI, 0.56 to 0.78) and specificity was 0.85 (95% CI, 0.78 to 0.90) (Figure 8, Appendix E Table 13). For studies that used the MINI for a reference standard (k=31, n=14,870) and a PHQ-4 cutoff of ≥4, the sensitivity to detect major depression was 0.80 (95% CI, 0.73 to 0.85) and specificity was 0.83 (95% CI, 0.80 to 0.86) (Figure 8, Appendix E Table 13). The AUC was not reported.187

PHQ-2

The IPD meta-analysis examining the test accuracy of the PHQ-2 included 100 studies (74 from very high HDI countries) with 44,318 participants.183 The PHQ-2 is comprised of the first two items of the PHQ-9 (“Little interest or pleasure in doing things” and “Feeling down, depressed, or hopeless”). About one-third of participants were recruited from primary care (n=14,450; 33%), and nearly as many were recruited from inpatient or outpatient specialty care (n=14,063; 32%). Seventy-two percent of participants were less than 60 years of age (n=31,739) and 59 percent were women (n=26,034). Among the 44,318 included participants, 4,572 were diagnosed with major depression (10.3%). IPD meta-analyses were conducted for PHQ-2 cutoffs ranging from ≥1 to ≥6, grouped by the reference standard used (semi-structured, fully structured excluding the MINI, or the MINI).183

Optimal cutoffs for the PHQ-2 have been identified as ≥2 or ≥3, with a cutoff of ≥2 favoring sensitivity over specificity. For studies using a semi-structured reference standard (k=48, n=11,703) and a PHQ-2 cutoff of ≥2, sensitivity was 0.91 (95% CI, 0.88 to 0.94) and specificity was 0.67 (95% CI, 0.64 to 0.71) (Figure 8, Appendix E Table 13).183 For studies that used a fully structured reference standard excluding the MINI (k=20, n=17,319) and a PHQ-2 cutoff of ≥2, sensitivity to detect major depression was 0.82 (95% CI, 0.75 to 0.87) and specificity was 0.71 (95% CI, 0.63 to 0.77) (Figure 8, Appendix E Table 13). For studies that used the MINI for a reference standard (k=32, n=15,296) and a PHQ-2 cutoff of ≥2, the sensitivity to detect major depression was 0.89 (95% CI, 0.84 to 0.92) and specificity was 0.68 (95% CI, 0.64 to 0.73) (Figure 8, Appendix E Table 13). At a cutoff of ≥3, sensitivity among reference standards ranged from 0.53 to 0.72 and specificity ranged from 0.85 to 0.89 (Appendix E Table 13). The AUC for all reference standards ranged from 0.82 (fully structured, excluding the MINI) to 0.88 (semi-structured).183

Sequential Administration of the PHQ-2 and PHQ-9

The systematic review and IPD meta-analysis identified for the PHQ-2 also examined the PHQ-2 in combination with the PHQ-9 (i.e., the PHQ-9 is administered only if the PHQ-2 is positive).183 Forty-four studies using a semi-structured reference standard with 10,627 participants were included. Of those participants, 1,361 were diagnosed with major depression (12.8%).183 Using a cutoff of ≥2 for the PHQ-2 followed by a cutoff of ≥10 for the PHQ-9, sensitivity to detect major depression was 0.82 (95% CI, 0.76 to 0.86) and specificity was 0.87 (95% CI, 0.84 to 0.89) (Figure 8, Appendix E Table 13).183 Versus the PHQ-9 alone, the difference in sensitivity was −0.04 (95% CI, −0.09 to 0.01) and the difference in specificity was 0.02 (95% CI, 0.00 to 0.03).183
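
The sequential rule described above can be sketched in a few lines of Python. This is a minimal sketch using the cutoffs reported in the text (PHQ-2 ≥2, then PHQ-9 ≥10). The function name, parameter names, and the use of `None` to represent a PHQ-9 that was never administered are assumptions made for illustration.

```python
from typing import Optional


def sequential_screen_positive(
    phq2_score: int,
    phq9_score: Optional[int],
    phq2_cutoff: int = 2,
    phq9_cutoff: int = 10,
) -> bool:
    """Two-stage screen: the PHQ-9 is consulted only if the PHQ-2 is positive."""
    if phq2_score < phq2_cutoff:
        # Negative first stage: the PHQ-9 is not administered.
        return False
    if phq9_score is None:
        raise ValueError("a PHQ-9 score is required when the PHQ-2 is positive")
    return phq9_score >= phq9_cutoff
```

Because a person must pass both stages to screen positive, the combined rule can only match or reduce the number of positive screens relative to the PHQ-9 alone, which is consistent with the slightly lower sensitivity and slightly higher specificity reported above.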

Whooley

We identified two systematic reviews examining the accuracy of the Whooley questions to screen for major depression, one including all adults180 and one limited to prenatal women.189 Two- and three-item Whooley questions were included.

The systematic review examining all adult populations included 10 studies with 4,618 participants.180 Of those 4,618 participants, 602 had depression (13.0%). The diagnosis of depression had to be made using DSM or ICD criteria. Five of the studies recruited participants from primary care. Nine studies reported the percent of female participants, ranging from 3 to 100 percent of participants (35% overall).180 The pooled sensitivity to detect major depression was 0.95 (95% CI, 0.88 to 0.97) and the pooled specificity was 0.65 (95% CI, 0.56 to 0.74) (Figure 8, Appendix E Table 13).180 Among the five studies conducted in primary care, the pooled sensitivity was 0.96 (95% CI, 0.91 to 0.98) and the pooled specificity was 0.61 (95% CI, 0.48 to 0.73).180

The systematic review examining only prenatal populations included five studies with 1,402 participants (one study conducted in primary care was removed from their main analysis as an outlier and is not discussed).189 Of those participants, 115 were diagnosed with depression (9.6%). The diagnosis of depression was made using DSM-IV and DSM-5 criteria in four studies; one study did not report the diagnostic criteria used. The pooled sensitivity of the Whooley questions to detect major depression was 0.95 (95% CI, 0.81 to 0.99) and pooled specificity was 0.60 (95% CI, 0.44 to 0.74).189

CES-D

We identified one systematic review examining the accuracy of the CES-D.184 The review included 28 studies with 10,617 participants. Studies had to be conducted among participants in primary care or the general population. Eleven studies recruited only older adults and six recruited only adolescents. The diagnosis of major depression was made using DSM or ICD criteria, most commonly using the DIS, SCID, CIDI, and MINI. Of the 10,617 participants, 807 had depression (7.6%; range from individual studies, 1.8 to 37.9%).184

To detect major depression using the standard cutoff of ≥16, the CES-D had a pooled sensitivity of 0.87 (95% CI, 0.82 to 0.91) and a pooled specificity of 0.70 (95% CI, 0.65 to 0.75) (Figure 8, Appendix E Table 13).184 Higher cutoffs (≥20, ≥22) yielded lower sensitivities and higher specificities. The AUC for the CES-D to detect major depression was 0.87. The authors noted that test accuracy was lower among younger age groups, but the age covariate was not statistically significant.184

EPDS

We included one recent systematic review and IPD meta-analysis examining the test accuracy of the EPDS to screen for major depression among pregnant or post-partum persons (within 12 months of giving birth), conducted by the same group who did the IPD meta-analyses for the PHQ instruments.182 Like the others, this review was also limited to participants who were 18 years or older. Participants could not be previously identified as having possible depression or be receiving psychiatric assessment or care. A total of 58 studies with 15,557 participants were included. Of the included 58 studies, 25 were conducted with pregnant persons, 30 with postpartum persons, and three with both. Studies taking place in any country were eligible; three fifths (62%) took place in very high HDI countries (k=36). Among the 15,557 included participants, 2,069 were diagnosed with major depression (13.3%). All studies were required to use either a fully structured (including the MINI) or semi-structured interview to determine the diagnosis of major depression; the interview also had to take place within 2 weeks of EPDS administration. IPD meta-analyses were conducted for EPDS cutoffs ranging from ≥7 to ≥15, grouped by the reference standard used (semi-structured, fully structured excluding the MINI, or the MINI).182

The IPD meta-analysis determined that an EPDS cutoff of ≥11 yielded the best balance of sensitivity and specificity (Figure 8).182 For studies using a semi-structured reference standard (k=36, n=9,066) and an EPDS cutoff of ≥11, sensitivity was 0.81 (95% CI, 0.75 to 0.87) and specificity was 0.88 (95% CI, 0.85 to 0.91) (Figure 8, Appendix E Table 13). With the same reference standard and an EPDS cutoff of ≥12, which is a standard cutoff, sensitivity to detect major depression was 0.75 (95% CI, 0.67 to 0.81) and specificity was 0.92 (95% CI, 0.89 to 0.94) (Appendix E Table 13). Sensitivity and specificity estimates varied slightly with the use of the MINI and other fully structured reference standards, but generally remained in the same ranges. The AUC for all reference standards ranged from 0.890 (MINI) to 0.924 (fully structured, excluding the MINI) (Appendix E Table 13). The authors also noted that test accuracy did not differ significantly between EPDS administration during pregnancy and during the postpartum period.182

KQ3. What Are the Harms Associated With Screening for Depression in Primary Care or Comparable Settings in Adults, Including Pregnant and Postpartum Persons?

Only one depression screening study reported on harms (Table 8).159 This study, conducted in Hong Kong among post-partum patients, reported that there were no adverse events in either group. Across all depression screening studies included for KQ1, there was no pattern of effects indicating that screening might paradoxically worsen any outcomes the interventions were aiming to benefit (Appendix E Table 14).

KQ4. Does Treatment of Depression (Psychotherapy or Pharmacotherapy) Result in Improved Health Outcomes in Adults, Including Pregnant and Postpartum Persons?

Summary

We included 39 ESRs (reported in 41 publications) of treatment for depression, 30 addressing psychological treatment (Table 15)191–220 and ten ESRs addressing pharmacologic treatment (Table 16).212, 221–232 One ESR reports both psychological and pharmacotherapy treatment benefits and results are discussed under the appropriate sections.212 Psychological treatment improved depression outcomes (Figure 9). This was the case in both broad analyses that included a wide range of populations and specific interventions, and in analyses of some important specific populations, including older adults, perinatal populations, and primary care patients. For example, the broadest analysis, which included any type of psychological treatment compared to any kind of control condition, measuring the depression outcome immediately post-treatment (typically 2 to 6 months post-baseline), had a standardized mean difference (SMD) of −0.72 (95% CI, −0.78 to −0.67; k=385, N not reported, but estimated at approximately 33,000),195 suggesting a moderate to large effect size. When limited to studies in primary care patients, the effect was smaller but clearly statistically significant (SMD, −0.42 [95% CI, −0.56 to −0.29]; k=59, N not reported). Evidence also indicated that psychological treatment for depression improved other outcomes, including anxiety symptoms, hopelessness, quality of life, social functioning, parental functioning, and mental health in offspring.

Figure 9. Forest Plot of Standardized Mean Differences Between Groups in Depression Symptom Severity by Intervention Type, Population, Control Group Type, and Followup for Psychological Treatment of Depression (KQ4). Abbreviations: BAT = Behavioral Activation Therapy.

Table 15. Characteristics of Existing Systematic Reviews Included to Address the Benefits of Psychological Treatment of Depression (KQ4).

Table 16. Existing Systematic Reviews Included to Address the Benefits of Pharmacologic Treatment of Depression (KQ4).

Data were limited for populations who were socially or economically disadvantaged or in specific racial or ethnic groups; however, the limited evidence supported benefits of psychological treatment in these populations as well. For example, an analysis of five trials among people described as having low socioeconomic status found reduced depressive symptoms at up to 12 weeks post-baseline (SMD, −0.66 [95% CI, −0.92 to −0.41]; k=5, N=424),212 and a separate analysis found no differences in effect size between studies limited to race or ethnic “minority” populations and studies not limited to these populations.197

For antidepressant medications, pooled effects consistently demonstrated increased rates of remission and response to treatment, and small but statistically significant reductions in depressive symptom severity in the short term (typically 8 weeks, Figures 10–12). For example, fluoxetine, which had the largest body of evidence with 117 studies, was associated with a small reduction in symptom severity (SMD, −0.23 [95% CI, −0.28 to −0.19]), a 46 percent increase in the odds of remission (OR, 1.46 [95% CI, 1.34 to 1.60]) and a 52 percent increase in the odds of treatment response (OR, 1.52 [95% CI, 1.40 to 1.66]; the number of studies and individuals included in each specific analysis was not reported, nor were I2 values).224 However, little information was available on the longer-term impact of antidepressants in the synthesized literature, and information was absent or extremely limited on the benefits of pharmacologic treatment in specific a priori populations of interest.
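
The percentages quoted above follow directly from the odds ratios: an OR of 1.46 corresponds to a 46 percent increase in the odds, not in the probability, of the outcome. A minimal Python sketch of that arithmetic is shown below. The function name is hypothetical.

```python
def percent_increase_in_odds(odds_ratio: float) -> float:
    """Convert an odds ratio into the percent change in odds relative to control."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return (odds_ratio - 1.0) * 100.0


# Fluoxetine values reported in the text.
remission_increase = percent_increase_in_odds(1.46)
response_increase = percent_increase_in_odds(1.52)
```

The same conversion applies to the confidence limits, so the interval of 1.34 to 1.60 for remission corresponds to an increase in odds of roughly 34 to 60 percent.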

Figure 10. Forest Plot of Group Differences in Depression Remission With Pharmacological Treatment of Depression Compared to Placebo (KQ4). Abbreviations: CI = confidence interval; ES = effect size; NR = not reported; OR = odds ratio; PC = primary care; RR = relative risk.

Figure 11. Forest Plot of Group Differences in Depression Response With Pharmacological Treatment of Depression Compared to Placebo (KQ4). Abbreviations: CI = confidence interval; ES = effect size; NR = not reported; OR = odds ratio; RR = relative risk.

Figure 12. Forest Plot of Standardized Mean Differences Between Groups in Depression Symptom Severity for Pharmacological Treatment of Depression Compared to Placebo (KQ4). Abbreviations: CI = confidence interval; FUP = followup; NR = not reported; OR = odds ratio.

Psychological Treatment of Depression

Study Characteristics

We examined the benefits of treatment for depression using ESRs. We included 30 reviews of psychological treatment, which covered a wide range of specific intervention approaches and outcomes (Table 15, Appendix E Table 15).191–220 Two of these reviews conducted meta-analyses using individual patient data,204, 205 enabling them to examine effect modification by patient characteristics. Most other reviews conducted traditional study-level meta-analyses and provided information on effect modification by key study and intervention characteristics. Nine of the reviews utilized the Cuijpers database described in the Methods section, including up to 385 trials with control conditions in a given analysis, and approximately 33,000 participants.194–198, 204, 205, 216

All of the included reviews were either limited to studies of people meeting some kind of depression criteria or reported results separately for studies that were limited to those meeting depression-related criteria. Most of the included reviews were limited to studies among adults, generally defined as 18 years and older. Some reviews focused on older adults200, 202, 214, 217 (lowest age ranging from 50 to 65 years) and six reviews focused on perinatal patients.203, 206, 208, 211, 213, 220 Other reviews focused on rural settings,215 participants who were socially disadvantaged,212 participants who were culturally and linguistically different from those for whom the intervention was originally designed,201 or had samples that were primarily comprised of Hispanic/Latino,193 Hispanic/Latino immigrant,210 or Black or Hispanic/Latino211 participants. Four ESRs focused on studies conducted among people recruited from primary care settings, in general207, 218, 219 and older adults.202

Most reviews included psychological interventions without restriction to specific therapeutic approaches; however, we also retained reviews that were limited to CBT-based interventions,203, 210, 213–215, 218, 220 since this was the most widely studied therapeutic approach. Five reviews focused on electronically delivered interventions (e.g., via websites or apps),200, 204, 205, 207, 217 and two examined telemedicine, one in general populations192 and one in perinatal populations.208 We rated all included reviews as good quality. All were published in 2015 or later; searched multiple databases with what appeared to be comprehensive search strategies; had explicit and relevant selection criteria; indicated some type of standard quality appraisal of included studies; and, if applicable, used valid meta-analytic methods.

Results

Detailed results for all outcomes are reported in Appendix E Tables.

Depression Symptom Severity

Most reviews explored either continuous measures of depression symptom severity, or used the studies’ main outcome, which was typically a continuous measure of depression symptom severity but could also include some dichotomous outcomes that were converted to standardized effect sizes. Standardized effect sizes are shown in Figures 9 and 13, and Appendix E Tables 16 and 17. The broadest analysis, including any type of psychological treatment compared to any kind of control condition, with the main depression outcome measured immediately post-treatment, had a standardized mean difference (SMD) of −0.72 (95% CI, −0.78 to −0.67; k=385, N not reported, but estimated at approximately 33,000, Figure 9),195 suggesting a moderate to large effect size. An analysis in the same review that was limited to CBT treatment reported a very similar effect size (SMD, −0.73 [95% CI, −0.80 to −0.65]; k=205, N not reported).195 Interpersonal therapy (IPT), problem-solving therapy (PST), behavioral activation therapy (BAT), life review, and “third wave” cognitive therapies such as mindfulness-based approaches and Acceptance and Commitment Therapy (ACT) all had SMDs of −0.60 or larger at post-treatment, with 19 to 30 studies in the analysis, as reported in the same ESR.195

Figure 13. Forest Plot of Standardized Mean Differences Between Groups in Depression Symptoms Severity by Study, Intervention, and Population Characteristics Where Effect Modification Was Assessed for Psychological Treatment of Depression (KQ4).

Effects in patient populations specified a priori in our Research Plan also demonstrated greater symptom reduction with psychological interventions compared to control groups. Among perinatal patients (pregnant or postpartum), CBT was associated with an effect similar to the overall effect size at post-treatment followup (SMD, −0.69 [95% CI, −0.83 to −0.55]; k=54, N=5,393, Figure 9).220 An examination of the effect of internet-based CBT in postpartum patients showed a similar effect size (SMD, −0.55 [95% CI, −0.76 to −0.34]; k=6, N=635, Figure 9).213 A review focused on older adults treated with CBT reported an SMD of −0.63 at post-treatment (95% CI, −0.76 to −0.49; k=52, N=2,925, Figure 13).214 Psychological interventions also reduced depressive symptoms in studies of socioeconomically disadvantaged persons in the short term (SMD, −0.66 [95% CI, −0.92 to −0.41]; k=5, N=not reported); however, this effect was not statistically significant at long-term followup (SMD, −0.53 [95% CI, −1.12 to 0.05]; k=4, N=not reported).212 Effect sizes in studies among patients recruited from primary care tended to be smaller than effect sizes reported for broad analyses that were not limited to primary care patients. However, effect sizes among studies of primary care patients demonstrated a statistically significant benefit in most cases. For example, the SMD for any psychological treatment among primary care patients compared to any control condition was −0.42 (95% CI, −0.56 to −0.29; k=59, N not reported).219 The effect was smaller for older adult primary care patients being treated with CBT at 26-week followup (SMD, −0.21 [95% CI, −0.40 to −0.03]; k=4, N=445)202 but was not statistically significant when pooling the post-treatment timepoints (SMD, −0.16 [95% CI, −0.34 to 0.02]; k=4, N=274).202 Narrative syntheses also reported generally positive effects of various psychological treatment approaches for people in rural settings and Hispanic/Latino patients, but fewer statistically significant group differences in four studies each of CBT and interpersonal therapy among Black and Hispanic/Latino perinatal patients (Appendix E Table 18).211

Depression Remission and Response

Fewer reviews reported pooled effects for depression remission203, 204 and response to treatment194, 204, 205 (Table 17, Appendix E Table 19). Analyses of remission demonstrated a two-fold or greater increase in the odds of remission among studies focused on either guided internet-based interventions or on CBT among postpartum patients. Similarly, all three analyses examining response to treatment indicated a benefit, including at followup of more than six months (OR, 1.92 [95% CI, 1.60 to 2.31]; k=55, N not reported, I2=65%) and more than one year (OR, 1.59 [95% CI, 1.14 to 2.21]; k=11, N not reported, I2=55%).194

Table 17. Meta-Analysis Results for Depression Remission and Depression Response in ESRs of Psychological Treatment of Depression (KQ4).

Other Outcomes

Reviews reported that depression treatment improved a number of other outcomes, including anxiety symptoms, hopelessness, quality of life, social functioning, days of sickness absence, parental functioning, and mental health in offspring (Appendix E Table 20), although some of these outcomes were sparsely reported and some effects were small. Findings for work functioning, anxiety symptom severity among postpartum patients, and suicidality did not demonstrate a statistically significant benefit, but were reported in only one,209 two,203 and four194 trials, respectively (Appendix E Table 20).

Effect Modification and Findings in Specific Populations

We included effect modification analyses covering a wide range of study, intervention, and patient characteristics (Figure 13, Appendix E Tables 17 and 18). We extracted detailed results for effect modification of depression symptom severity, the most commonly reported outcome. Narrative summaries were extracted for other depression-related outcomes. Statistically significant effect modification was found for variation in study characteristics by age, the presence of medical comorbidities, perinatal status, format, sessions per week, and some control group types. Among traditional, study-level meta-analyses, effects were smaller in studies limited to:

  • General and older adults, compared to students197
  • People with medical comorbidities197
  • Perinatal patients197
  • Interventions delivered in “Other/mixed” format compared to individual, group, or guided self-help formats197
  • One or fewer sessions per week194
  • Active control group (e.g., education group) compared with usual care or wait-list controls214
  • Pill placebo control groups197
  • US-based specialty mental health usual care control group, compared with specialty mental health usual care in The Netherlands198
  • UK or Netherlands-based usual primary care, compared with US-based usual primary care198

In most cases, however, psychological interventions still had a statistically significant benefit even when the effect was smaller than others in the stratified analyses. Reviews with study-level meta-analyses found no effect modification related to gender composition (women only vs. women and men), race/ethnicity composition (limited to a race or ethnic “minority” group vs. not limited by race or ethnicity), recruitment setting (primary care, other medical, community, or other), usual care setting when combining studies from all countries, type of control group aside from pill placebo and active controls (e.g., wait list, usual care, no treatment), depression inclusion criteria, intervention format (individual, group, or guided self-help), and number of sessions. The individual patient data meta-analyses of internet-based interventions examined a wide range of individual-level characteristics.204, 205 These reviews found only three characteristics that were associated with effect size: higher baseline symptom severity, older age, and being native-born to the country where the study took place were all associated with larger effects for guided (but not unguided) internet-based interventions.

There was indication of publication bias in this literature. One review contacted investigators of studies in an NIH grants database that had no published results and requested the unpublished results. They then compared the pooled effect with and without the results from the unpublished studies.199 The standardized mean difference (SMD) was reduced from −0.52 (95% CI, −0.68 to −0.37, k=20, N not reported) among published studies to −0.39 (95% CI, −0.70 to −0.08, k=26, N not reported) when the unpublished studies were included in the analysis. Additionally, a separate ESR used the Duval and Tweedie trim and fill procedure to estimate an effect adjusted for publication bias.197 This procedure fills in “missing” studies that are hypothesized to exist but be unpublished based on the funnel plot of the data. According to this analysis, the SMD adjusted for publication bias was estimated to be −0.50 (95% CI, −0.56 to −0.44), compared with the main analysis effect of −0.71 (95% CI, −0.77 to −0.66, k=332). Thus, psychological interventions appear to reduce depression symptom severity, even taking into account the probable presence of publication bias.

Pharmacologic Treatment of Depression

Study Characteristics

We included ten ESRs of pharmacologic treatment (Table 16, Appendix E Table 21), covering all antidepressants commonly used in the US.212, 221, 222, 224, 226–229, 231, 232 Our primary data source for general adult populations was an exhaustive systematic review with a network meta-analysis of antidepressants conducted by Cipriani and colleagues.224 This review included 522 trials, covering 814 different active treatment groups (N=116,477). We focused on placebo comparisons, although this review did not report the number of studies included in each specific placebo comparison. Therefore, we reported the total number of studies included in the review for each agent in our forest plots and tables. One review each covered older,227 perinatal,232 primary care,221 and socially or economically disadvantaged populations.212 Other reviews reported outcomes not addressed by the Cipriani review, including quality of life and social functioning in older adults,227 occupational functioning,228 and cognitive functioning as measured by the Digit Symbol Substitution Test.222 One review conducted individual patient-level analysis of the items of the Hamilton Depression Rating Scale to determine whether duloxetine has greater or lesser impact on specific symptoms.229 Finally, one review reported on the effect of combined pharmacologic and psychological treatment compared to placebo.226

We rated seven of the included reviews as good quality.212, 221, 222, 224, 226, 227, 232 The good-quality reviews were all published in 2015 or later; searched multiple databases with what appeared to be comprehensive search strategies; had explicit and relevant selection criteria; indicated some type of standard appraisal of included studies; and, if applicable, used valid meta-analytic methods. The ESRs rated as fair were downgraded because they did not describe conducting risk of bias assessment for the studies included in their reviews.228, 229, 231 We included these reviews, however, because they either had some risk of bias safeguard (e.g., requiring double-blind design),228 or conducted individual patient data meta-analysis, which we judged to be less affected by typical risk of bias threats in component studies.229, 231

Most of the reviews made efforts to search for unpublished data, typically by searching conference abstracts or requesting information from the regulatory agencies or pharmaceutical companies. For example, the Cipriani review reported manual searching of trial registries and websites of drug approval agencies for unpublished studies. In addition, they contacted all of the pharmaceutical companies marketing antidepressants to ask for supplemental unpublished information about both premarketing and post-marketing studies. Finally, they also contacted study authors and drug manufacturers to supplement incomplete reports of the original papers or provide data for unpublished studies.224

Results

Detailed results for all outcomes are reported in Appendix E Tables.

Depression Outcomes

The stated primary outcome of the Cipriani review was response to treatment, typically reported as a 50 percent reduction in symptom severity measures such as the HAM-D or the MADRS. Other depression outcomes examined were standardized mean differences of continuous symptom severity measures and remission. In broad analyses unrestricted by population, all antidepressant agents demonstrated statistically significantly greater improvements than placebo for all three depression outcomes (Figures 10–12; Appendix E Tables 22–24). At 8 weeks’ followup (or the closest available), SMDs ranged from −0.17 (95% CI, −0.26 to −0.08, 17 studies included in the ESR) to −0.50 (95% CI, −0.85 to −0.15; 1 RCT, n=63), consistent with small effects for symptom severity.224 The number of included trials ranged from one to an estimated 117, including non-placebo comparisons. The odds of remission were increased by 23 percent to 252 percent, and the odds of treatment response by 37 percent to 213 percent.

The agent with the largest body of evidence was fluoxetine, with 117 trials (N not reported).224 Fluoxetine was associated with an SMD of −0.23 (95% CI, −0.28 to −0.19) for depression symptom severity, a 46 percent increase in the odds of remission (OR, 1.46 [95% CI, 1.34 to 1.60]), and a 52 percent increase in the odds of treatment response (OR, 1.52 [95% CI, 1.40 to 1.66]); neither the number of studies and individuals included in each specific analysis nor I2 values were reported.224 A review addressing combination treatment (pharmacologic and psychological) also found that depression symptoms were reduced with combination treatment (SMD, −0.46 [95% CI, −0.70 to −0.21], 6 RCTs, N not reported; I2, 17%).226
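
The percent changes in odds quoted above follow arithmetically from the odds ratios: an OR of 1.46 corresponds to a 46 percent increase in odds. The short Python sketch below illustrates that conversion. The function name `percent_change_in_odds` is an illustrative assumption and is not taken from any of the included reviews.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Convert an odds ratio into the percent change in odds versus the comparator."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return (odds_ratio - 1.0) * 100.0


# Fluoxetine vs. placebo, using the odds ratios reported in the text above.
remission = percent_change_in_odds(1.46)
response = percent_change_in_odds(1.52)
print(f"remission: {remission:.0f}% increase in odds")
print(f"response: {response:.0f}% increase in odds")
```

A change in odds is not the same as a change in probability; the two diverge as the outcome becomes more common, so these percentages should not be read as changes in the proportion of patients who remit or respond.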

Among analyses limited to specific populations, findings were more variable and confidence intervals were generally wide, reflecting the small number of studies for most analyses. In a review of RCTs among primary care patients, SSRIs demonstrated a benefit for both symptom severity (SMD, −0.27 [95% CI, −0.38 to −0.16], number of studies, N, and I2 not reported) and remission (RR, 1.33 [95% CI, 1.20 to 1.48], 7 RCTs, N=1652; I2 not reported).221 In a review of trials in older adults, duloxetine had both the most evidence (4 RCTs, N=1,347) and the most consistent finding of benefit across depression outcomes, while fluoxetine was the least promising.227 In analyses among populations determined to have low socioeconomic status, one to three RCTs found greater improvements with paroxetine and with combination treatment compared to placebo.212

We found little information in the recent synthesized literature about longer-term effects. The review that was focused on interventions for depression among low socioeconomic status populations reported on long-term outcomes, which they defined as outcomes measured three or more months after the intervention was completed.212 This review found one such study reporting that paroxetine was associated with lower symptom severity than placebo at 6 months’ followup, 4 months after treatment had been completed (SMD, −0.39 [95% CI, −0.74 to −0.04]). This review also reported results of a meta-analysis of three studies of combination treatment. The long-term pooled effect was not statistically significant (SMD, −0.47 [95% CI, −0.97 to 0.03], I2=85%, N=482), although the short-term finding was statistically significant for this group of three studies (SMD, −0.68 [95% CI, −0.97 to −0.40], I2=56%, N=491). The review focused on older adults included one placebo-controlled trial of duloxetine that reported results longer than 12 weeks’ followup.227 This study reported greater symptom reduction at long-term followup (SMD, −0.39 [95% CI, −0.64 to −0.14]), but the remission benefit was no longer statistically significant (RR, 1.57 [95% CI, 0.95 to 2.59]).227
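
The significance statements above are consistent with the usual convention of reading significance from the 95% confidence interval: an effect is called statistically significant when the interval excludes the null value (0 for an SMD, 1 for a ratio measure such as an RR). The following Python sketch applies that rule to estimates quoted in the paragraph above. The helper name `excludes_null` is an illustrative assumption, not something defined by the included reviews.

```python
def excludes_null(ci_low: float, ci_high: float, null_value: float) -> bool:
    """Return True when a confidence interval lies entirely on one side of the null value."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    return ci_high < null_value or ci_low > null_value


# Combination treatment in low socioeconomic status populations (SMD, null value 0).
short_term = excludes_null(-0.97, -0.40, null_value=0.0)
long_term = excludes_null(-0.97, 0.03, null_value=0.0)

# Duloxetine in older adults, long-term remission (RR, null value 1).
remission = excludes_null(0.95, 2.59, null_value=1.0)

print(short_term, long_term, remission)
```

Only the short-term combination treatment interval excludes its null value, which matches the narrative description of these three results.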

Other Outcomes

Aside from suicide-related outcomes, which are discussed under KQ5 (harms of treatment), we found very limited information on other outcomes reported in the synthesized literature of antidepressants. One review found no improvement in cognitive function as measured by the Digit Symbol Substitution Test for citalopram, duloxetine, escitalopram, nortriptyline, or sertraline, but found a small benefit for vortioxetine relative to placebo (SMD, 0.34 [95% CI, 0.18 to 0.49], 3 RCTs, I2 and N not reported, Appendix E Table 25).222 No benefits were seen for quality of life in the review focused on older adults, but one RCT of bupropion reported improved social functioning (SMD, −0.26 [95% CI, −0.45 to −0.06]).227

Effect Modification and Findings in Specific Populations

Detailed results are reported in Appendix E Tables 26 and 27. The main review by Cipriani and colleagues examined some important potential effect modifiers.224 They found larger effects in studies with earlier publication dates for several antidepressants, and also larger effects in smaller studies. They also found an association between baseline symptom severity and effect size; however, this analysis was at high risk of ecological bias, and the question is better addressed using individual patient data. Finally, they found no association between effect size and either industry sponsorship or publication status (published vs. unpublished), although they reported having limited ability to detect the impact of these characteristics. An individual patient data meta-analysis examined effect modification for duloxetine.229 This review found a greater reduction in suicidality with duloxetine among adults age 25 and older compared to those age 18–24, relative to placebo; duloxetine demonstrated a statistically significant benefit only in adults age 25 and older. Additionally, this review found no association between degree of improvement in depression symptoms and either baseline symptom severity or severity of side effects. A separate individual patient data meta-analysis found no association between baseline symptom severity and effect size.231

KQ5. What Are the Harms of Treatment of Depression (Psychotherapy or Pharmacotherapy) in Adults, Including Pregnant and Postpartum Persons?

Summary

We included four ESRs addressing harms of psychological interventions (Table 18).233–236 We included one cohort study237 and 22 ESRs (in 29 publications) addressing harms of pharmacologic treatment for depression (Tables 19 and 20).224, 227, 232, 238–255 Psychological interventions did not increase the risk of harm, as measured by deterioration of depressive symptoms.

Table 18. Characteristics of ESRs Addressing Harms of Psychological Treatment of Depression (KQ5).

Table 19. Characteristics of ESRs in General Adult Populations Addressing Harms of Pharmacologic Treatment of Depression (KQ5).

Table 20. Characteristics of ESRs Limited to Perinatal or Older Adult Populations Addressing Harms of Pharmacologic Treatment of Depression (KQ5).

For pharmacologic treatment, there was clear evidence that those receiving antidepressants were at a higher risk of dropout because of adverse events (Figure 14),224 which likely reflects the increased risk of non-serious adverse events.243 There was also some evidence of an increased risk of serious adverse events with SSRI use (OR, 1.39 [95% CI, 1.12 to 1.72], 44 RCTs, N not reported, I2=0%, Figure 15).243 The absolute risk of serious adverse events appears to be relatively low, however, and evidence for specific serious adverse events other than suicide was very limited. There were too few suicide deaths to determine the association between antidepressant use and suicide death, but both RCT and observational evidence supported a small absolute increase in risk of suicide attempts with second-generation antidepressant use among adults up to age 65 (Figure 16). For example, a review of FDA regulatory data indicated a 53 percent increase in the odds of a suicide attempt at post-treatment with the use of second-generation antidepressants (OR, 1.53 [95% CI, 1.09 to 2.15]; N=41,861; 0.7% of antidepressant users had a suicide attempt vs 0.3% of placebo users).256 Evidence on other outcomes was limited and generally included only observational evidence.

Figure 14. Forest Plot of Group Differences in Dropout Due to Adverse Events With Pharmacological Treatment of Depression Compared to Placebo (KQ5). Abbreviations: CI = confidence interval; NR = not reported; RR = relative risk; SMD = standardized mean difference.

Figure 15. Forest Plot of Group Differences in Any Serious Adverse Events With Pharmacological Treatment of Depression Compared to Placebo (KQ5). Abbreviations: AE = adverse event; ES = effect size; CI = confidence interval; NR = not reported; OR = odds ratio; RR = relative risk.

Figure 16. Forest Plot of Group Differences in Suicide-Related Outcomes With Pharmacological Treatment of Depression Compared to Placebo (KQ5). *The following effects are ORs rather than RRs: suicide deaths with 2nd generation antidepressants; suicide attempts with (more...)

Psychological Treatment of Depression

Study Characteristics and Results

Four ESRs reported on adverse outcomes of psychological treatment of depression, including an estimated 63 RCTs (Tables 18 and 21, Appendix E Table 28).233–236 Three of the ESRs included studies that had reported deterioration rates with any psychological treatment,233 self-guided internet-based CBT,236 and guided internet-based interventions.234 Deterioration rates were either lower with psychological interventions or did not differ statistically from control groups. In the broadest analysis, which included RCTs of any type of psychological treatment that reported deterioration rates, participants in psychological interventions had a 61 percent lower likelihood of deterioration (RR, 0.39 [95% CI, 0.27 to 0.57]; 23 RCTs, N not reported; I2, 0%). A separate review of psychological interventions among older adults reported that none of the 14 included trials reported safety data.235

Table 21. Adverse Events Reported in ESRs of Psychological Treatment of Depression (KQ5).

Pharmacologic Treatment of Depression

Study Characteristics

We included 22 ESRs that addressed harms of antidepressant use (Tables 19 and 20, Appendix E Table 29).224, 227, 232, 238–255 We estimated that these reviews collectively included approximately 522 RCTs and 175 observational studies. Three of these reviews covered perinatal patients,232, 257 four focused on older adults,227, 247, 252, 254 and the remaining reviews included studies of adults of any age. Sixteen of the reviews were rated as good quality and six were rated fair, downgraded for lack of risk of bias assessment238, 240, 245, 251, 257 or for searching only one database.255 Eight of the reviews addressed the question of whether antidepressant use increased risk of suicide, primarily focused on SSRIs and other second-generation agents.237, 238, 241, 243, 247, 251, 256 We also included a large cohort study examining suicide risk that was published after the included ESR that examined observational evidence for suicide-related outcomes (Table 22).237

Table 22. Results From Observational Studies of Suicide Attempt Risk With Pharmacologic Treatment for Depression Published After Included ESRs (KQ5).

Results

Detailed results are shown in Appendix E Tables.

Any Adverse Events, Dropout, and Serious Adverse Events

Seventeen ESRs considered non-suicidal harms of pharmacologic treatment.224, 227, 232, 238, 239, 242–250, 252–254 A broad review examining RCTs of SSRI use compared to placebo did not report an overall estimate of the risk of any adverse event, but it examined a large number of specific non-serious events.243 The most commonly reported events with higher rates among SSRI users were abnormal ejaculation, tremor, anorexia, nausea, somnolence, sweating, asthenia, diarrhea, constipation, insomnia, dizziness, dry mouth, libido decreased, sexual dysfunction, appetite decreased, fatigue, vomiting or upset stomach, flu syndrome, drowsiness, blurred/abnormal vision or dry eyes, nervousness, back pain, headache, dyspepsia, and weight loss. These analyses included up to 78 studies per outcome (see Appendix E Table 30 for a narrative summary). Neither RCT nor observational cohort evidence indicated any clear difference in the composite outcome of any adverse event between antidepressant treatment and placebo in older adults (Figure 17, Appendix E Table 31).

Figure 17. Forest Plot of Group Differences in Any Adverse Events With Pharmacological Treatment of Depression Compared to Placebo (KQ5). Abbreviations: CI = confidence interval; NR = not reported; RR = relative risk; SNRI = serotonin and norepinephrine reuptake inhibitor.

RCT evidence indicated no pattern of increased dropout for any reason with antidepressants, compared to placebo (Figure 18, Appendix E Table 32).224 However, RCT evidence showed that, whether assessed as a class (SSRI or SNRI) or as a specific antidepressant, receiving antidepressant treatment increased the risk of dropout due to adverse events (Figure 14, Appendix E Table 33). Nearly every agent tested had a statistically significant increase in dropout due to adverse events among general adult populations, with ORs ranging from 1.64 (95% CI, 1.25 to 2.14, 15 RCTs, N and I2 not reported) for vortioxetine to 4.44 (95% CI, 3.07 to 6.50, 20 RCTs, N and I2 not reported) for clomipramine.224 For older adults, SSRIs as a class increased the risk of dropping out because of adverse events nearly three-fold (RR, 2.90 [95% CI, 1.16 to 5.06]; 3 RCTs, N=887, I2 not reported), and SNRIs increased the risk nearly two-fold (RR, 1.85 [95% CI, 1.05 to 3.27]; 3 RCTs, N=812, I2 not reported).252

Figure 18. Forest Plot of Group Differences in Dropout for Any Reason With Pharmacological Treatment of Depression Compared to Placebo (KQ5). Abbreviations: CI = confidence interval; NR = not reported; RR = relative risk.

The association of antidepressant use with any serious adverse events was less clear (Figure 15, Appendix E Table 34). The broadest review, covering RCTs in adults reporting serious adverse events of SSRI use compared to placebo, suggested a nearly 40 percent increase in odds with antidepressant use (OR, 1.39 [95% CI, 1.12 to 1.72]; k=44, N not reported, I2 not reported).243 Serious adverse events were relatively rare; 239/8242 SSRI participants (2.7%) had serious adverse events, compared to 106/4956 (2.1%) of placebo participants. The authors of this review rated the strength of this evidence as very low due to high risk of bias of the included studies, which they note is likely to overestimate benefits and underestimate harms. In a separate review addressing serious adverse events in older adults, only one to two studies reported serious adverse events for any specific agent (N=122 to 607), and findings were imprecise, with wide-ranging confidence intervals crossing the null. A third review examined the impact of pharmacologic interventions in perinatal patients.232 Five RCTs and 70 observational studies were included, reporting on 27 potential serious adverse events, including maternal, birth, and infant/child harms. The authors judged the certainty of evidence to be insufficient or low in all instances, including for congenital and cardiac anomalies (graded insufficient), primarily because of lack of control for confounding. Their findings indicated small absolute risk differences for all adverse events.

Suicide Death

Evidence for the impact of antidepressant use on suicide death was limited by the small number of events (Figure 16, Appendix E Table 35). The review with the most evidence involved an analysis of FDA regulatory data covering 14 antidepressants and 41 suicide deaths altogether.256 In this review, there was a statistically non-significant 74 percent increase in risk of suicide with antidepressants (RR, 1.74 [95% CI, 0.78 to 3.90]; 0.12% [37/31,781] died from suicide among those taking antidepressants, 0.04% [4/10,080] with placebo). Other reviews included only seven suicide deaths (three with SSRI use, four with placebo)243 and eight suicide deaths (seven among those taking second-generation antidepressants, one with placebo).238 A review of cohort studies focused on older adults found only two studies examining suicide deaths.247 One of the two cohort studies in this review was limited to people with depression, and showed a statistically non-significant effect in the direction of benefit (RR, 0.64 [95% CI, 0.38 to 1.07], n=3,325,567 prescriptions). The other cohort study in this review, among people taking SSRIs for any indication, found an increased risk of suicide death (RR, 4.87 [95% CI, 1.99 to 11.94], n=241,754 patients).
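
To make the absolute scale of these rare events concrete, the Python sketch below recomputes the crude percentages from the event counts quoted above for the FDA regulatory data (37 of 31,781 antidepressant participants and 4 of 10,080 placebo participants). The function name `risk_percent` is an illustrative assumption. These crude pooled proportions are shown only to illustrate magnitude; they are not a substitute for the review's reported RR of 1.74, which comes from its own analysis.

```python
def risk_percent(events: int, participants: int) -> float:
    """Absolute risk expressed as a percentage of participants."""
    if participants <= 0 or not 0 <= events <= participants:
        raise ValueError("need participants > 0 and 0 <= events <= participants")
    return 100.0 * events / participants


# Suicide deaths in the FDA regulatory data, as quoted in the text above.
antidepressant = risk_percent(37, 31781)
placebo = risk_percent(4, 10080)
difference = antidepressant - placebo  # in percentage points

print(f"antidepressant: {antidepressant:.2f}%")
print(f"placebo: {placebo:.2f}%")
print(f"difference: {difference:.2f} percentage points")
```

With so few events in the placebo arm, small changes in the count would shift these percentages noticeably, which is consistent with the wide confidence interval reported by the review.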

Suicide Attempts

Evidence suggested a very small increased risk of suicide attempts with antidepressant use (Figure 16, Appendix E Table 35).238, 243, 256 For example, a review of FDA regulatory data found a 53 percent increase in the odds of a suicide attempt at post-treatment with the use of second-generation antidepressants (OR, 1.53 [95% CI, 1.09 to 2.15], 206/31,781 [0.7%] of antidepressant users had a suicide attempt vs 28/10,080 [0.3%] of placebo users).256 However, given how rarely suicide attempts occur in clinical trials, this is still based on a very small number of events. Observational evidence supported the RCT-based findings. A review of cohort and case-control studies examining the impact of second-generation antidepressants found a statistically significant increase in the risk of the composite outcome of any suicide death or suicide attempt (RR, 1.29 [95% CI, 1.06 to 1.57]; k=27, N and I2 not reported).241 This finding held when limited to studies with low risk of bias, studies that adjusted for covariates, and studies that declared no financial conflict of interest (COI). The increased risk was also statistically significant when limited to people with MDD, when any indication was allowed, and among studies conducted outside of North America. However, there was a statistically significant reduction in risk among studies conducted in North America and no association found when limited to studies with a financial COI declared. In a cohort study (N=358,351) using claims data,237 there was no association between antidepressant dispensing and a suicide attempt leading to a medical encounter (Table 22). This study controlled for a wide range of patient-, physician-, and market-level variables. Effect sizes for SSRI, SNRI, and tricyclic antidepressant (TCA) dispensings had very wide confidence intervals but trended in the direction of benefit; however, the association was in the direction of increased risk of a suicide attempt for people who had dispensings of two or more different kinds of antidepressants.237

One individual patient data (IPD) meta-analysis of suicidal ideation as measured by the HAM-D suicide item found that, among adults age 25 and older, the reduction in mean suicidality ratings was larger in patients receiving an SSRI from week 1 onward, relative to placebo.251 In young adults (age 18–24 years), those given an SSRI were at higher risk for worsening of suicidal ideation (in the unadjusted analysis) or emergent suicidality during the late (weeks 3–6) but not the early phase (weeks 1–2) of treatment. A separate IPD meta-analysis confirmed a lack of harms related to suicidal ideation in general and older adult populations. Fluoxetine and venlafaxine decreased suicidal thoughts and behavior for adult and geriatric patients. The authors determined that the protective effect was mediated by decreases in depressive symptoms with treatment.240

Other Serious Adverse Events

ESRs also reported on specific serious adverse events, although the evidence was limited and the data were primarily from observational studies. For falls and fractures, the available evidence was insufficient to determine whether pharmacotherapy increased the risk of serious harm (Figure 19, Appendix E Table 36).246, 252 Most analyses included only one to three RCTs and few events. The largest analysis was among observational studies and found an increased risk of fracture with antidepressant use (RR, 1.67 [95% CI, 1.56 to 1.79], 23 studies, N not reported, I2=88.4%). Effect sizes were very similar in stratified analyses of studies that did and did not control for depression. These observational studies carry a risk of confounding by unmeasured variables, such as indication for treatment.

Figure 19. Forest Plot of Group Differences in Falls or Fractures With Pharmacological Treatment of Depression Compared to Placebo (KQ5). Abbreviations: CI = confidence interval; ES = effect size; HR = hazard ratio; MDD = major depressive disorder; NR = not reported.

For cardio- or cerebro-vascular disease, four ESRs provided data, which were primarily or entirely limited to observational studies (Figure 20, Appendix E Table 37).244, 248, 249, 253 While many of the findings for stroke, intracranial hemorrhage, and venous thromboembolism showed an increased risk with antidepressant use, all reviews had a risk of confounding by indication, rendering these data insufficient to determine whether pharmacotherapy increased the risk of these serious harms. Findings were also inconclusive for mortality, dementia, and bleeding risk because of the small numbers of studies and events and because most evidence was from observational studies (Appendix E Tables 38 and 39).

Figure 20. Forest Plot of Group Differences in Cardiovascular-Related Outcomes With Pharmacological Treatment of Depression Compared to Placebo (KQ5). Abbreviations: AD = antidepressant; CI = confidence interval; CVD = cardiovascular disease; ES = effect size; HR = hazard ratio.

Similarly, evidence related to harms of antidepressants during pregnancy was almost entirely limited to observational evidence. An IPD meta-analysis of cohort studies found a statistically significant association between SSRI use and higher probability of preterm birth among women with depressive symptoms (OR, 1.6 [95% CI, 1.0 to 2.5]; 140/1328 [10.5%] with SSRI use, 468/5652 [8.2%] without SSRI use, adjusted for race/ethnicity, parity, and smoking during pregnancy), but no association between either any antidepressant use or SSRI use and low birth weight, small for gestational age, or low 5-minute Apgar result (Appendix E Table 39). A review of 9 observational studies (n=1,287,539) examining the association between SSRI use and preeclampsia or gestational hypertension found an increased risk (OR, 1.43 [95% CI, 1.15 to 1.78]).255 The authors cautioned, however, that this evidence was limited by confounding and high heterogeneity, and most studies did not account for risk factors shared between mood disorders and hypertension or for underlying risk factors shared by depression and preeclampsia. Similarly, another broader review concluded that, although many studies report on adverse events, they could not rule out underlying disease severity as the cause of the association between exposures and adverse events.232 The authors of this review judged the certainty of evidence to be insufficient or low in all instances, including congenital and cardiac anomalies (graded insufficient), primarily because of lack of control for confounding.

Anxiety

KQ1. Do Anxiety Screening Programs in Primary Care or Comparable Settings Result in Improved Health Outcomes in Adults, Including Pregnant and Postpartum Persons?

KQ1a. Does Sending Anxiety Screening Test Results to Providers (With or Without Additional Care Management Supports) Result in Improved Health Outcomes?

Summary

We identified two RCTs (reported in 4 publications) of anxiety screening, both in general adult populations.165, 258–260 One of these also screened for depression and several other conditions along with anxiety165 (Table 23). Both trials found no reduction in anxiety symptoms or general psychological symptom severity compared with usual care at 13 to 22 weeks’ followup.

Table 23. Characteristics of Anxiety Screening Studies (KQ1).

Study Characteristics

Two studies examined the benefits of screening for anxiety (N=918), both conducted in the US (Tables 23–25, Appendix F Table 1).165, 259 A fair-quality study published in 1994 (n=618) screened adult primary care patients and enrolled those with elevated anxiety symptoms according to the Revised Symptom Checklist-90 (SCL-90-R) whose anxiety symptoms had not been recognized by their healthcare providers.259 Screening results for intervention participants were given to their primary care providers in the form of patient profiles showing anxiety symptoms and functional status. Primary care providers received one-on-one training in both the use of the study-provided profiles and anxiety treatment in general, and also had phone access to study physicians for questions. The average age of participants in this study was 42.6 years, 58.6 percent were women, and 80.4 percent were White. The race and ethnicity of the remaining participants was not reported. The second study (n=300, rated good quality), published in 2018, screened adult primary care patients for symptoms of anxiety, depression, sleep, pain, or fatigue and enrolled those who scored 4 or higher (out of 10) for any of these concerns.165 Primary care clinicians were given a visual display of participants’ symptom profile based on sections of the Patient-reported Outcome Measurement Information System (PROMIS). In this study, the average age of participants was 49.4 years, 71.7 percent were women, 49.3 percent were Black, and 45.0 percent were White. This study was also included above, under depression screening, since it screened for both of these conditions.

Table 24. Participant Characteristics of Anxiety Screening Studies (KQ1).

Table 25. Intervention Characteristics of Anxiety Screening Studies (KQ1).

Results

Both of the included studies reported that the screening programs did not improve anxiety outcomes over usual care (Table 26). The older study that only screened for anxiety found no differences between groups in anxiety symptom levels or in any of the SF-36 subscale scores at 5 months’ followup.259 The study that screened for anxiety along with depression, pain, sleep disturbance, and fatigue reported a difference in improvement of 0.83 points on a 16-point scale at 3 months’ followup (p=.47).165 Similarly, this study found almost identical absolute change in the General Severity Index (p=.74), a measure of mental health symptom severity. Across all outcomes reported, group differences in change ranged from −1.5 on a 16-point scale to 0.3 on a 40-point scale.

Table 26. Results of Anxiety Screening Studies (KQ1).

KQ2. Do Instruments to Screen for Anxiety Accurately Identify Adults, Including Pregnant and Postpartum Persons, With Anxiety in Primary Care or Comparable Settings?

Summary

We included ten primary studies (in 12 articles) that reported the test accuracy of screening for anxiety with the GAD, GAS, EPDS-anxiety subscale, or PHQ-panic disorder to detect generalized anxiety disorder, panic disorder, social anxiety disorder, or any anxiety disorder (Table 27, Figure 21).29, 261–271 The most commonly studied instruments were the GAD-2 and GAD-7. To detect generalized anxiety disorder, the GAD-2 at a cutoff of ≥3 accurately identified 69 to 83 percent of adults (including pregnant women) with generalized anxiety disorder and 88 to 91 percent without it. The GAD-2 needed a lower cutoff to obtain similar sensitivity for any anxiety disorder: a cutoff of ≥1 identified a similar proportion of those with any anxiety disorder (70% to 90%), but at the cost of correctly identifying fewer of those without any anxiety disorder (55% to 64%). At a cutoff of ≥2, the GAD-2 accurately detected 50 to 91 percent of adults with a panic disorder and 63 to 74 percent of those without a panic disorder. At the same cutoff, the GAD-2 identified 85 percent of those with social anxiety disorder and 62 percent of those without. In general, the GAD-7 performed as well as or better than the GAD-2.
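
The percentages above are sensitivities (the share of people with a disorder who screen positive) and specificities (the share of people without it who screen negative). The Python sketch below shows how both are computed from a two-by-two table. The counts are hypothetical, chosen only so that the results fall inside the ranges reported for the GAD-2 at a cutoff of ≥3; they are not data from any included study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of people with the disorder who screen positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of people without the disorder who screen negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical sample: 100 people with the disorder, 900 without it.
true_pos, false_neg = 76, 24    # among the 100 with the disorder
true_neg, false_pos = 810, 90   # among the 900 without the disorder

print(f"sensitivity: {sensitivity(true_pos, false_neg):.0%}")
print(f"specificity: {specificity(true_neg, false_pos):.0%}")
```

Lowering the cutoff moves people from the negative to the positive column in both groups, which raises sensitivity and lowers specificity; this is the trade-off described for the GAD-2 at a cutoff of ≥1.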

Figure 21. Summary of Test Accuracy of Screening Instruments to Detect Anxiety Disorders (KQ2). * Pooled results for fewer than 3 studies shown only for illustrative purposes. Note: The number of participants for Generalized Anxiety Disorder and Any Anxiety Disorder, (more...)

Table 27. Characteristics of Studies Examining the Test Accuracy of Instruments to Screen for Anxiety.

Study Characteristics

Ten primary studies (N=6,463) were included that provided test accuracy results for anxiety screening (Table 27).261–267, 269–271 Included studies primarily examined the GAD-2 and GAD-7; one study reported accuracy for the EPDS anxiety subscale, one study reported accuracy for the GAS, and one for the panic disorder module of the PHQ. Four studies were conducted in the US.263, 265, 269, 270 The others took place in South Korea, Finland, Australia, Canada, and the UK. Sample size ranged from 50 to 1,715; four of the studies analyzed a sample of 249 participants or fewer.

Two studies recruited older adults (65 years or older),263, 271 three studies recruited patients from prenatal care,262, 266, 267 one study recruited adults who were high utilizers of primary care,264 and the remaining four recruited adults from primary care or the community. Mean age ranged from 29 to 75 years (k=9) (Table 28). Women were represented in higher proportions than men: 57 to 100 percent of participants were women. Race and ethnicity were reported in six studies. One study, conducted in South Korea, recruited only participants of South Korean ethnicity.261 A US-based study recruited participants from an integrated community care clinic and reported that 76 percent of participants were Hispanic/Latino.265 One study—conducted among patients using inner-city maternity services in the UK—recruited 53 percent White and 32 percent Black participants. The remaining three studies reporting race or ethnicity recruited mainly White participants (79%, 80%, and 91%).263, 269, 270 SES was variably reported; mean years of education ranged from 14.6 to 17.3 (k=2) and those with 12 or more years of education ranged from 88 to 94 percent (k=5).

Table 28. Participant Characteristics for Studies of Test Accuracy of Anxiety Screening Instrument (KQ2).

All studies used a structured or semi-structured interview within two weeks after the screener to identify generalized anxiety disorder or any anxiety disorder. The most common interviews were the MINI (k=4) and the SCID (k=4). The proportion of participants who were diagnosed with generalized anxiety disorder ranged from 1.8 percent to 16 percent, the proportion diagnosed with any anxiety disorder ranged from 3.1 percent to 32 percent, and the proportion diagnosed with panic disorder in two studies was 6.7 and 6.8 percent. The one study reporting social anxiety disorder reported a prevalence of 6.2 percent.

Results

GAD-2

Four studies reported the accuracy of the GAD-2 to detect GAD,261, 264, 267, 270 one of which took place in the US among primary care patients (Table 27).270 Despite the GAD-2 being developed to detect generalized anxiety disorder, some of these studies also reported test accuracy of the GAD-2 to detect any anxiety disorder (k=4), panic disorder (k=2), and social anxiety disorder (k=1).

Generalized Anxiety Disorder

Three studies among general adult populations reported the test accuracy of the GAD-2 to detect GAD.261, 264, 270 At a cutoff of ≥2, the pooled sensitivity to detect GAD was 0.94 (95% CI, 0.90 to 0.98; I2=0%) and the pooled specificity was 0.68 (95% CI, 0.64 to 0.72; I2=94.5%). At a cutoff of ≥3, the pooled sensitivity was 0.81 (95% CI, 0.73 to 0.89; I2=28.8%) and the pooled specificity was 0.86 (95% CI, 0.83 to 0.90; I2=84.5%) (Figure 22, Appendix F Table 2).261, 264, 270
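To illustrate what these pooled estimates could mean in practice, the following minimal sketch converts sensitivity and specificity into positive and negative predictive values using Bayes' rule. It uses the pooled GAD-2 estimates at a cutoff of ≥3 from the paragraph above and the lowest and highest prevalence of generalized anxiety disorder reported across the included studies (1.8 and 16 percent). The review does not report predictive values; the function name `predictive_values` and the pairing of pooled estimates with these prevalence values are assumptions made only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled GAD-2 estimates at a cutoff of >=3, taken from the text above.
SENSITIVITY = 0.81
SPECIFICITY = 0.86

# Lowest and highest GAD prevalence reported across the included studies.
for prevalence in (0.018, 0.16):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.1%}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

Under these assumptions, the positive predictive value is roughly 0.10 at a prevalence of 1.8 percent and roughly 0.52 at a prevalence of 16 percent, which shows how strongly the yield of a positive screen depends on the population being screened.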

Figure 22. Test Accuracy of the GAD-2 to Detect Generalized Anxiety Disorder, by Cutoff (KQ2). Note: Pooled results for the three general adult studies are not shown.

For the study among pregnant women (n=9,750), at a cutoff of ≥1, the sensitivity of the GAD-2 to identify GAD was 1.0 (95% CI, 0.99 to 1.0) and the specificity was 0.60 (95% CI, 0.60 to 0.61).267 At a cutoff of ≥3, the sensitivity to detect GAD was 0.69 (95% CI, 0.64 to 0.73) and the specificity was 0.91 (95% CI, 0.90 to 0.91) (Figure 22, Appendix F Table 2).267

Any Anxiety Disorder

The same three studies among adults reported the test accuracy of the GAD-2 to detect any anxiety disorder.261, 264, 270 At a cutoff of ≥2, the pooled sensitivity to detect any anxiety disorder was 0.76 (95% CI, 0.65 to 0.87; I2=85.8%) and the pooled specificity was 0.73 (95% CI, 0.69 to 0.76; I2=67.7%). At a cutoff of ≥3, the pooled sensitivity was 0.53 (95% CI, 0.39 to 0.66; I2=86.8%) and the pooled specificity was 0.90 (95% CI, 0.88 to 0.92; I2=48.1%) (Figure 23, Appendix F Table 3).261, 264, 270

Figure 23. Test Accuracy of the GAD-2 to Detect Any Anxiety Disorder, by Cutoff (KQ2). Note: Pooled results for the three general adult studies are not shown.

For two studies among pregnant patients (n=528 [9,750 extrapolated] and n=954), at a cutoff of ≥1, the sensitivity of the GAD-2 to identify any anxiety disorder was 0.90 (95% CI, 0.74 to 0.97) and 0.70 (95% CI, 0.68 to 0.73) and the specificity was 0.63 (95% CI, 0.59 to 0.66) and 0.64 (95% CI, 0.63 to 0.65).262, 267 At a cutoff of ≥3, the sensitivity to detect any anxiety disorder was 0.30 (95% CI, 0.17 to 0.48) and 0.26 (95% CI, 0.24 to 0.29) and the specificity was 0.98 (95% CI, 0.96 to 0.98) and 0.91 (95% CI, 0.90 to 0.92)262, 267 (Figure 23, Appendix F Table 3).

Panic Disorder

Two studies reported the test accuracy of the GAD-2 to identify panic disorder among adults.264, 270 At a cutoff of ≥2, sensitivity ranged from 0.50 (95% CI, 0.19 to 0.81) among high utilizers of primary care to 0.91 (95% CI, 0.81 to 0.97) among primary care patients in the US. Specificity ranged from 0.74 (95% CI, 0.66 to 0.81) to 0.63 (95% CI, 0.60 to 0.66), respectively. At a cutoff of ≥3, sensitivity decreased (0.30 to 0.76) but specificity increased (0.81 to 0.89) (Figure 24, Appendix F Table 4).

Figure 24. Test Accuracy of the GAD-2 to Detect Panic Disorder, by Cutoff (KQ2). Abbreviations: CI = confidence interval; GAD = generalized anxiety disorder.

Social Anxiety Disorder

One study among primary care patients in the US reported the test accuracy of the GAD-2 to detect social anxiety disorder.270 At a cutoff of ≥2, the sensitivity was 0.85 (95% CI, 0.73 to 0.93) and the specificity was 0.62 (95% CI, 0.59 to 0.65). At a cutoff of ≥3, the sensitivity decreased to 0.70 (95% CI, 0.57 to 0.81) and the specificity increased to 0.81 (95% CI, 0.78 to 0.83) (Appendix F Table 5).270

GAD-7

Six studies reported test accuracy for the GAD-7 to detect GAD, PD, SAD, or any anxiety disorder (Table 27).261, 262, 264, 265, 270, 271 Four of the studies recruited adults from the community or primary care,261, 264, 265, 270 although one was among high utilizers of primary care.264 One study recruited community-dwelling older adults attending primary care271 and one recruited prenatal patients.262

Generalized Anxiety Disorder

To detect GAD, three studies reported test accuracy for the GAD-7 at a cutoff of ≥8, ≥9, and ≥10.261, 264, 270 At a cutoff of ≥10, the pooled sensitivity to detect GAD was 0.79 (95% CI, 0.65 to 0.94; I2=77.3%) and pooled specificity was 0.89 (95% CI, 0.83 to 0.94; I2=94.8%). Sensitivity among the three studies ranged from 0.67 to 0.89, and specificity ranged from 0.82 to 0.95. At lower cutoffs (≥8, ≥9), sensitivity increased and specificity decreased (Figure 25, Appendix F Table 2). At higher (≥10–21) cutoffs, only one to two studies reported test accuracy data at each cutoff to detect GAD. These studies followed the same trend with higher cutoffs yielding lower sensitivity and higher specificity and lower cutoffs yielding higher sensitivity and lower specificity.

Figure 25. Test Accuracy of the GAD-7 to Detect Generalized Anxiety Disorder, by Cutoff (KQ2). Abbreviations: CI = confidence interval; GAD = generalized anxiety disorder; n = number of participants.

Any Anxiety Disorder

To adequately detect any anxiety disorder, lower cutoffs of the GAD-7 were necessary. At a cutoff of ≥6, pooled sensitivity of the GAD-7 to detect any anxiety disorder from four studies conducted among adults was 0.67 (95% CI, 0.48 to 0.81; I2=90.5%; n=2,322) and pooled specificity was 0.81 (95% CI, 0.73 to 0.87; I2=91.0%) (pooled estimate not shown in a figure).261, 264, 265, 270 Sensitivity ranged from 0.38 to 0.85 and specificity ranged from 0.71 to 0.91 (Figure 26, Appendix F Table 3). At a cutoff of ≥5, the pooled sensitivity to detect any anxiety disorder among adults was 0.81 (95% CI, 0.68 to 0.95; I2=91.4%) and the pooled specificity was 0.72 (95% CI, 0.63 to 0.81; I2=96.1%) (pooled estimate not shown in a figure). At lower cutoffs, sensitivity increased and specificity decreased, but no more than two studies among a general adult population were represented at each lower cutoff. Similarly, at higher (≥10–21) cutoffs, only one to two studies reported test accuracy data at each cutoff to detect any anxiety disorder. These studies followed the same trend with higher cutoffs yielding lower sensitivity and higher specificity and lower cutoffs yielding higher sensitivity and lower specificity.

Figure 26. Test Accuracy of the GAD-7 to Detect Any Anxiety Disorder, by Cutoff (KQ2). Note: Pooled results for the four general adult studies are not shown.

The one study that examined the test accuracy of the GAD-7 to detect any anxiety disorder among older adults determined the optimal cutoff was ≥5.271 Sensitivity was 0.71 (95% CI, 0.65 to 0.76) and specificity was 0.57 (95% CI, 0.54 to 0.59) (Figure 26), with an AUC of 0.695 (Appendix F Table 3). While lower cutoffs yielded higher sensitivities (ranging from 0.80 to 0.92), the corresponding specificity was lowered to unacceptable levels (ranging from 0.25 to 0.46).271 Similarly, higher cutoffs lowered sensitivity and increased specificity (Appendix F Table 2).

For the one study that recruited pregnant women, to detect any anxiety disorder, four cutoffs of the GAD-7 were reported ranging from ≥4 to ≥7.262 Sensitivity ranged from a low of 0.43 (95% CI, 0.27 to 0.61) at a cutoff of ≥7 to a high of 0.80 (95% CI, 0.63 to 0.90) at a cutoff of ≥4. Corresponding specificity was 0.93 (95% CI, 0.91 to 0.94) and 0.71 (95% CI, 0.68 to 0.73), respectively.262

Panic Disorder

Two studies among adults (one among primary care patients in the US) reported the test accuracy of the GAD-7 to detect panic disorder.264, 270 At a cutoff of ≥6 (the cutoff required to adequately detect any anxiety disorder), sensitivity to detect panic disorder ranged from 0.70 (95% CI, 0.35 to 0.93) among high utilizers of primary care to 0.88 (95% CI, 0.78 to 0.95) among primary care patients in the US and specificity ranged from 0.64 (95% CI, 0.60 to 0.67) to 0.79 (95% CI, 0.72 to 0.86). At a cutoff of ≥10 (the cutoff needed to detect generalized anxiety disorder), sensitivity among high utilizers of primary care was only 0.40 (95% CI, 0.12 to 0.74) and the specificity was 0.95 (95% CI, 0.90 to 0.98). Among primary care patients in the US, a cutoff of ≥10 yielded a sensitivity of 0.74 (95% CI, 0.62 to 0.84) and specificity of 0.81 (95% CI, 0.78 to 0.83). Both studies showed an inverse relationship between sensitivity and specificity as the cutoff was adjusted, with lower cutoffs increasing sensitivity and decreasing specificity (Figure 27, Appendix F Table 4).264, 270
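The inverse relationship between sensitivity and specificity across cutoffs can be sketched with a small calculation. The example below is illustrative only: the scores and diagnoses are hypothetical values invented for this sketch (they are not data from any included study), and the helper `accuracy_at_cutoff` is an assumed name. A screen is treated as positive when the total score is at or above the cutoff.

```python
def accuracy_at_cutoff(scores, has_disorder, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as a positive screen."""
    pairs = list(zip(scores, has_disorder))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical GAD-7 totals (range 0-21); not data from the review.
case_scores = [5, 6, 7, 9, 10, 12, 15, 18]                 # reference standard positive
noncase_scores = [0, 1, 2, 2, 3, 3, 4, 4, 5, 6, 8, 11]     # reference standard negative

scores = case_scores + noncase_scores
has_disorder = [True] * len(case_scores) + [False] * len(noncase_scores)

for cutoff in (6, 8, 10):
    sens, spec = accuracy_at_cutoff(scores, has_disorder, cutoff)
    print(f"cutoff >={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this hypothetical sample, raising the cutoff from ≥6 to ≥10 lowers sensitivity and raises specificity, which is the same direction of change described for both studies.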

Figure 27. Test Accuracy of the GAD-7 to Detect Panic Disorder, by Cutoff (KQ2). Abbreviations: CI = confidence interval; GAD = generalized anxiety disorder.

Social Anxiety Disorder

One study among primary care patients (n=965) in the US reported the test accuracy of the GAD-7 to detect social anxiety disorder.270 Reported cutoffs ranged from ≥5 to ≥10. Sensitivity to detect social anxiety disorder ranged from 0.72 (95% CI, 0.59 to 0.83) at a cutoff of ≥10 to 0.88 (95% CI, 0.77 to 0.95) at a cutoff of ≥5. Specificity ranged from 0.55 (95% CI, 0.52 to 0.59) at a cutoff of ≥5 to 0.80 (95% CI, 0.77 to 0.83) at a cutoff of ≥10 (Appendix F Table 5).

Other Anxiety Screeners

One study reported test accuracy of the Geriatric Anxiety Scale (GAS) to identify any anxiety disorder among 110 older adults in the US.263 The study reported cutoffs ranging from >9 to >16, with a cutoff of >9 identified as yielding the optimal balance of sensitivity and specificity. At a cutoff of >9, sensitivity of the GAS to detect any anxiety disorder was 0.60 (95% CI, 0.31 to 0.83) and specificity was 0.75 (95% CI, 0.66 to 0.82). Sensitivity decreased and specificity increased with increasing cutoffs (Appendix F Table 3).263

Two studies262, 266 reported the accuracy of the EPDS anxiety subscale to identify any anxiety disorder among prenatal patients; one reported the sensitivity at a single cutoff only. At a cutoff of 5, sensitivity of the EPDS anxiety subscale to detect any anxiety disorder ranged from 0.54 (95% CI, 0.38 to 0.70)266 to 0.70 (95% CI, 0.52 to 0.83)262. Corresponding specificity for the single study that reported it was 0.84 (95% CI, 0.81 to 0.86).262 At a lower cutoff of 4, sensitivity improved slightly (0.73 [95% CI, 0.56 to 0.86]) but specificity was much lower (0.71 [95% CI, 0.68 to 0.74])262 (Appendix F Table 3).

One study reported the test accuracy of the panic disorder module of the PHQ to detect panic disorder among US adults in primary care.269 If all five items of the PHQ-PD were endorsed, the sensitivity of the PHQ-PD to detect panic disorder was 0.81 (95% CI, 0.69 to 0.93) and the specificity was 0.99 (95% CI, 0.98 to 1.0) (Appendix F Table 4).269

KQ3. What Are the Harms Associated With Screening for Anxiety in Primary Care or Comparable Settings in Adults, Including Pregnant and Postpartum Persons?

Neither of the two studies of anxiety screening reported on harms, and there was no pattern of effects indicating that screening might paradoxically increase anxiety or mental health symptoms.165, 259

KQ4. Does Treatment of Anxiety (Psychotherapy or Pharmacotherapy) Result in Improved Health Outcomes in Adults, Including Pregnant and Postpartum Persons?

Summary

We included 26 RCTs (reported in 36 publications) among primary care patients272–308 and 18 ESRs (not limited to primary care populations) addressing treatment for anxiety (Tables 29–32).211, 215, 220, 232, 309–322 Among the 24 included RCTs of psychological interventions, 14 were in mixed populations of people with anxiety or depression, and ten were limited to people with anxiety. Psychological interventions showed a relatively small but statistically significant reduction in anxiety symptom severity in primary care patients with anxiety (SMD, −0.41 [95% CI, −0.58 to −0.23]; 10 RCTs [n=2,075]; I2=40.2%; Table 33, Figure 28), but not among mixed populations of people with anxiety or depression (SMD, −0.18 [95% CI, −0.39 to 0.03]; 12 RCTs [n=1,868]; I2=66.7%). In the ESRs of psychological treatment, which included an estimated 144 RCTs and approximately 11,000 participants, treatment was associated with reduced anxiety symptoms; SMDs at post-treatment among broad adult populations were −0.80 and larger (e.g., among people with generalized anxiety disorder, SMD, −0.80 [95% CI, −0.93 to −0.67]; 31 RCTs, N and I2 not reported; Figure 29). Psychological treatment was also associated with improved depression symptom severity and quality of life. More limited evidence suggested a benefit in older and perinatal patients as well.

Figure 28. Forest Plot Showing the Difference Between Groups in Change From Baseline in Anxiety Symptoms, for Primary Studies of Psychological Intervention for Treatment of Anxiety in Primary Care Populations Reported in Primary RCTs (KQ4).

Figure 29. Forest Plot of Standardized Mean Differences Between Groups in Anxiety Symptom Severity for Psychological Treatment of Anxiety Compared to Controls Reported in ESRs (KQ4). Abbreviations: CBT = cognitive behavioral therapy; CI = confidence interval.

Table 29. Characteristics of Primary Research Studies of Psychological Treatment of Anxiety in Primary Care Patients (KQ4).

Table 30. Characteristics of RCTs of Pharmacologic Treatment of Anxiety in Primary Care Patients (KQs 4, 5).

Table 31. Characteristics of ESRs of Psychological Treatment of Anxiety (KQ4).

Table 32. Characteristics of ESRs of Pharmacologic Treatment of Anxiety (KQs 4, 5).

Table 33. Summary of Meta-Analysis Results for Anxiety Outcomes in Primary Research Studies of Psychological Treatment of Anxiety in Primary Care Patients (KQ4).

There were only two RCTs of pharmacotherapy in primary care patients, addressing venlafaxine and escitalopram, and both showed a benefit. Broad ESRs (i.e., not limited to primary care patients) reported improved anxiety and other outcomes for people taking antidepressants and benzodiazepines compared to placebo. For example, among patients with generalized anxiety disorder, the SMD for change in anxiety symptom severity with SSRIs was −0.66 (95% CI, −0.90 to −0.43, 31 studies, N and I2 not reported). For antidepressants, benefits were seen for a variety of anxiety outcomes among people with generalized anxiety disorder, social anxiety disorder, and panic disorder. Limited evidence suggested that antidepressants and benzodiazepines may improve anxiety symptoms in older adults, but evidence in perinatal patients was lacking. Improvements were also seen for depression and social functioning outcomes with pharmacotherapy.

Psychological Treatment of Anxiety

Primary Study Characteristics

We included 24 RCTs (N=5,307) that examined the benefits of psychological interventions to treat anxiety (Table 29),274, 275, 277, 278, 280, 283–286, 289, 290, 292–299, 302, 304–307 including ten trials in which all participants had anxiety disorders or symptoms278, 289, 290, 293, 294, 297, 298, 304, 305, 307 and 14 studies of participants with either anxiety or depression (i.e., some participants may not have had anxiety).274, 275, 277, 280, 283–286, 292, 295, 296, 299, 302, 307 All interventions were either specifically targeted at anxiety, or used flexible treatment approaches that are appropriate for anxiety (e.g., cognitive behavioral techniques, mindfulness, problem solving approaches). Most studies (k=16) were conducted in populations of general adults.274, 275, 277, 278, 280, 283, 284, 286, 290, 292–296, 299, 304, 305 The remaining studies were conducted in populations of older adults285, 297, 298, 302 or perinatal populations.274, 306, 307

Seven of the trials were conducted in the US,280, 286, 293, 294, 297, 298, 304 and the remaining were conducted in the UK,277, 283, 284, 292, 305, 306 the Netherlands,274, 295, 296 Canada,302, 307 Sweden,290, 299 Germany,278, 289 Hong Kong,285 and Spain275. Most trials (k=18) recruited participants from primary care clinics or other primary care relevant settings; however, two trials recruited from other clinical settings (e.g., multispecialty medical organization, university health center),298, 302 and two trials recruited from OB-GYN and midwifery practices.274, 306 Thirteen of the trials used screening to identify eligible participants, either entirely274, 275, 278, 285, 295, 296, 304, 305, 307 or for a subset of participants.286, 292, 294, 307 Only four of the trials limited to people with anxiety used screening for participant recruitment.278, 294, 305, 306

Seven trials were rated as good quality,280, 290, 293, 294, 298, 304, 306 and the remaining were rated as fair quality. Common reasons for downgrading included baseline differences between treatment groups that were not statistically controlled for in analyses, excessive or differential loss to followup between groups, or inadequate methods for handling missing data.

Sociodemographic information about the included RCTs is presented in Appendix F Table 6 and summarized in Table 34. Across all studies, the mean age was 45.4 years, and 74.5 percent of participants were women. Among the six trials conducted in the US and reporting on race and ethnicity,280, 286, 293, 294, 297, 298 the majority (68.5 percent) of participants were White, 16.3 percent were Hispanic/Latino, 15.3 percent were Black, 1.5 percent were Asian American or Pacific Islander and less than one percent were Native American or Alaska Native. In studies that reported race and ethnicity data, the percentage of White participants ranged from 56.6 to 81.8 percent. None of the studies appeared to target sub-populations with significant socioeconomic challenges (e.g., low income or homelessness).

Table 34. Summary of Participant Demographic Characteristics in Primary Studies of Psychological Treatment of Anxiety in Primary Care Patients (KQ4); Weighted Mean (Number of Studies Reporting), Unless Otherwise Indicated.

Intervention characteristics of the RCTs are summarized in Table 35 and detailed in Appendix F Table 7. The most commonly utilized intervention approach was cognitive behavioral therapy (CBT), with or without a support group, which was used in eighteen studies.274, 275, 277, 278, 280, 284, 289, 290, 292–295, 297, 298, 304–307 Common components of CBT-based interventions included psychoeducation, goal-setting, cognitive restructuring, behavioral activation, self-monitoring, and problem solving. Few studies involved primary care providers in the delivery of the intervention. However, one study intervention (Coordinated Anxiety Learning and Management, or CALM) allowed participants to choose CBT, medication, or both and was delivered by nonexpert care managers who also assisted primary care clinicians in promoting medication adherence.294 Another CBT intervention had the primary care provider delivering most or all of the intervention content, which included four individual sessions delivered in person, along with printed companion materials.278 The most intensive CBT intervention involved up to 14 weekly 90-minute in-person manualized CBT sessions followed by 3 monthly booster sessions.305 The least intensive CBT intervention was a 22-session app-based intervention totaling 50 minutes of therapist phone contact over an 8-week period. The intervention was delivered via a combination of web, email, text, and phone contacts.280 Less commonly utilized intervention approaches included problem-solving therapy (alone or with case management),283, 285, 286, 296 mindfulness-based approaches,299, 302 or non-directive therapy.284 Most studies used usual care as the control condition; however, some studies utilized waitlist, attention, or minimal treatment controls.

Table 35. Summary of Intervention Components in Primary Research Studies of Psychological Treatment of Anxiety in Primary Care Patients (KQ4).

ESR Characteristics

In addition to trial evidence, we included eight ESRs that addressed psychological treatment of anxiety (Table 31, Appendix F Table 8).211, 215, 220, 313–315, 317, 321 We focused on results reflecting the impact on health outcomes in general populations or in a priori populations of interest, with minimal examination of effect modification by study or intervention characteristics. Four of the reviews included studies in general adult populations,313, 314, 317, 321 while the other reviews limited their focus to older adults,315 general perinatal population,323 Black and Hispanic/Latino perinatal population,211 and rural populations.215 All reviews included studies that addressed generalized anxiety disorder, panic disorder, and social anxiety disorder. Some reviews covered additional anxiety disorders as well, but we did not include results that were specific to disorders outside of our scope (e.g., OCD, PTSD). The largest review included 144 studies, of which 90 were specifically targeted at generalized anxiety disorder, panic disorder, and social anxiety disorder.313

Results (Primary and ESR Evidence)

Detailed results for all outcomes are reported in Appendix F.

Anxiety Outcomes

Twenty-two of the RCTs among primary care patients reported on anxiety symptoms and could be included in the meta-analysis, ranging from 8 to 30 weeks’ followup.274, 275, 277, 278, 280, 285, 286, 289, 290, 292–299, 302, 304–307 The overall pooled effect size for all twenty-two studies was statistically significant, in favor of the intervention groups (SMD, −0.29 [95% CI, −0.44 to −0.15]; 22 RCTs [n=3,943]; I2=70.6%, Figure 28, Table 33). However, the pooled effect size for the twelve studies that included participants with or without anxiety was not statistically significant (SMD, −0.18 [95% CI, −0.39 to 0.03]; 12 RCTs [n=1,868]; I2=66.7%), whereas the pooled effect size for the ten studies which required participants to have anxiety was statistically significant (SMD, −0.41 [95% CI, −0.58 to −0.23]; 10 RCTs [n=2,075]; I2=40.2%). One RCT also reported on disorder-specific outcome measures for subgroups with specific anxiety disorder diagnoses.294 In all cases, disorder-specific outcomes showed statistically significant improvement at 6- and 12-month followup (Appendix F Table 9).
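For readers less familiar with how a pooled SMD, its confidence interval, and I2 relate to one another, the following is a minimal sketch of random-effects pooling. It uses the DerSimonian-Laird estimator, which is a common textbook choice and is an assumption here; this section does not state which estimator the review used, and results from other estimators can differ. The five trial-level SMDs and standard errors are hypothetical, and the function names are invented for this sketch. The final two lines apply a simple check to the confidence intervals reported above: a 95% CI that spans zero corresponds to a result that is not statistically significant at the 0.05 level.

```python
import math


def pool_random_effects(effects, std_errors):
    """Pool effect sizes with the DerSimonian-Laird random-effects estimator.

    Requires at least two studies.
    Returns (pooled_effect, ci_low, ci_high, i_squared_percent).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)

    # Re-weight each study by its within- plus between-study variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se_pooled = math.sqrt(1.0 / sum(re_weights))

    # I^2: share of total variability attributed to heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, i_squared


def excludes_zero(ci_low, ci_high):
    """True when a 95% CI lies entirely on one side of zero."""
    return ci_low > 0 or ci_high < 0


# Hypothetical SMDs and standard errors for five trials; not data from the review.
smds = [-0.60, -0.10, -0.45, -0.05, -0.30]
ses = [0.15, 0.12, 0.20, 0.10, 0.18]
pooled, low, high, i2 = pool_random_effects(smds, ses)
print(f"SMD {pooled:.2f} (95% CI, {low:.2f} to {high:.2f}); I2={i2:.1f}%")

# Applied to the confidence intervals reported in the text above.
print(excludes_zero(-0.58, -0.23))  # trials limited to people with anxiety
print(excludes_zero(-0.39, 0.03))   # trials in mixed anxiety or depression populations
```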

One of the included RCTs offered primary care patients with panic disorder, social anxiety disorder, generalized anxiety disorder, or PTSD the choice between medication, CBT, or both in comparison to usual care.294 First choice medications included SSRIs or SNRIs but could be augmented by another antidepressant or a benzodiazepine for non-refractory patients. While the intervention participants demonstrated greater improvements on a number of outcomes, the study did not report results separately for participants who chose medication (with or without CBT) as part of their treatment. Therefore, this study was unable to determine which specific components of the blended intervention contributed to the results.294

Other less commonly reported anxiety related outcomes in the RCTs included anxiety response278, 297, 298 and anxiety remission,278, 280 variously defined. Both studies addressing remission found greater likelihood of remission for at least one outcome among those in the intervention group, but most findings for treatment response did not demonstrate a benefit (Appendix F Table 10).278, 280

Among the ESRs, most effect sizes at the end of treatment were in the moderate to large range. For example, the broadest analyses showed clear benefits of CBT at the post-treatment assessment for generalized anxiety disorder (SMD, −0.80 [95% CI, −0.93 to −0.67]; 31 studies), social anxiety disorder (SMD, −0.88 [95% CI, −1.03 to −0.74]; 48 studies),313 and panic disorder (SMD, −0.81 [95% CI, −1.04 to −0.59]; 42 studies, N and I2 not reported, Figure 29, Appendix F Table 11). Similar benefits were seen for perinatal women. Effect sizes tended to be smaller and based on fewer studies at followup beyond the post-treatment assessment. For older adults, evidence was sparser; effect sizes had wide confidence intervals and were frequently not statistically significant, although SMDs were all −0.20 or larger, in the direction of benefit.

Other Mental Health Outcomes, Quality of Life, and Functioning

Twenty-two of the RCTs among primary care patients reported on depression symptoms ranging from 8 to 30 weeks’ followup.274, 275, 277, 278, 280, 284–286, 290, 292–299, 302, 304–307 The overall pooled effect size for all twenty-two studies was statistically significant (SMD, −0.32 [95% CI, −0.46 to −0.19]; 22 RCTs [n=3,970]; I2=66.4%, Figure 30, Table 33, Appendix F Table 12), in favor of the intervention groups. The pooled effect was statistically significant both in the studies limited to people with anxiety (SMD, −0.49 [95% CI, −0.74 to −0.25]; 9 RCTs [n=1,990]; I2=68.4%) and in mixed populations with anxiety or depression (SMD, −0.20 [95% CI, −0.34 to −0.06]; 13 RCTs [n=1,980]; I2=39.9%; p=0.01 for the difference in effect size between studies requiring anxiety vs. those in mixed populations).

Figure 30. Forest Plot Showing the Difference Between Groups in Change From Baseline in Depression Symptoms, for Primary Studies of Psychological Intervention for Treatment of Anxiety in Primary Care Populations (KQ4). Abbreviations: BDI = Beck Depression Inventory.

Only one RCT among primary care patients reported depression remission outcomes; that trial included people with anxiety or depression. Graham and colleagues (2020) defined treatment remission as PHQ-9 scores less than 5 or a 50 percent reduction from baseline.280 The rate of recovery from depression was 59.4 percent in the app-based CBT intervention group and 31.0 percent in the wait list control group. The odds of recovery for depression were 3.25 (95% CI, 1.54 to 6.86) times greater for intervention participants compared with the control group.280
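As a rough arithmetic check, an unadjusted odds ratio can be computed directly from the two recovery percentages given above. This is a sketch under stated assumptions: it uses the rounded percentages rather than the trial's participant counts, and the trial's reported estimate may come from a statistical model, so exact agreement is not expected. The function names are invented for this illustration.

```python
def odds(probability):
    """Convert a probability to odds."""
    return probability / (1 - probability)


def odds_ratio(p_intervention, p_control):
    """Unadjusted odds ratio comparing two event probabilities."""
    return odds(p_intervention) / odds(p_control)


# Recovery rates reported in the text: 59.4% (app-based CBT) vs. 31.0% (wait list).
crude_or = odds_ratio(0.594, 0.310)
print(f"crude odds ratio = {crude_or:.2f}")
```

The crude value is approximately 3.26, which is consistent with the reported odds ratio of 3.25.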

Ten RCTs among primary care patients reported on quality-of-life outcomes ranging from 8 to 30 weeks’ followup.285, 286, 290, 293–295, 297, 298, 302, 306 Few individual study findings were statistically significant, and the pooled effect sizes were small and not statistically significant for both the Mental Health Component scale of the SF-12 or SF-36 (SMD, 0.17 [95% CI, −0.03 to 0.38]; 7 RCTs [n=2,104]; I2=54.4%) and the Physical Component Scale (SMD, 0.03 [95% CI, −0.12 to 0.18]; 5 RCTs [n=1,656]; I2=54.4%; Figure 31). Other health outcomes reported included global mental health symptoms,277, 283, 284, 290 general functioning,283, 284, 292, 294 infant outcomes (e.g., birth weight, gestational age, and Apgar scores),274 emergency room visits and hospitalizations,293 and parenting adjustment.306, 307 Very few individual findings for any of these outcomes showed statistically significant group differences (Appendix F Tables 12–15).

This figure is a forest plot showing the difference between groups in change from baseline in quality of life measures, for primary studies of psychological intervention for treatment of anxiety in primary care populations.

Figure 31

Forest Plot Showing the Difference Between Groups in Change From Baseline in Quality of Life Measures, for Primary Studies of Psychological Intervention for Treatment of Anxiety in Primary Care Populations (KQ4). Abbreviations: CBT = cognitive behavioral (more...)

Among the included ESRs (which were not limited to primary care patients), one reported improvement in quality of life with CBT treatment for anxiety (SMD, −0.56 [95% CI, −0.80 to −0.32]; 21 RCTs; N and I2 not reported; Figure 32, Appendix F Table 16).317 Another review found that depression symptoms were improved with CBT among people with generalized anxiety disorder, panic disorder, and social anxiety disorder; these findings held up even when limited to studies rated as having a low risk of bias (Figure 32, Appendix F Table 16).314

This figure is a forest plot of standardized mean differences between groups in other outcomes for psychological treatment of anxiety compared to controls.

Figure 32

Forest Plot of Standardized Mean Differences Between Groups in Other Outcomes for Psychological Treatment of Anxiety Compared to Controls (KQ4). Abbreviations: CBT = cognitive behavioral therapy; GAD = generalized anxiety disorder; QoL = Quality of Life; (more...)

Effect Modification and Findings in Specific Populations

One of the primary RCTs by Rollman and colleagues (2018) reported subgroup analyses by age, gender, race (White vs. other race and ethnic groups), level of education, baseline GAD-7 and PHQ scores, and whether or not the participant lived alone.293 They reported better improvements in persons age 35–59 years relative to younger and older age groups on anxiety (p=.006), depression (p=.033), and global mental health (p=.01). Participants who were not White (88% of whom were Black) reported greater improvements in depression (p=.024) than White participants, and the effect was similar but not statistically significant for anxiety (p=.08). Persons who lived alone also showed greater improvements in depression (p=.008) and anxiety (p=.01). None of the other subgroup analyses resulted in statistically significant differences, although level of education approached significance (Appendix F Table 9).

We stratified forest plots of anxiety symptom severity from the primary RCTs among primary care patients by population (i.e., general adult, older adult, and perinatal), whether participants were recruited via screening, and several intervention characteristics (e.g., intervention type, modality, and total contact time) to determine other factors that may modify treatment effects. We combined all studies for these analyses, both those in which all participants had anxiety and those in mixed populations. None of these factors showed as strong an association with effect size as whether the population was limited to people with anxiety compared to mixed populations (Figure 33). However, given the limited number of studies and the many sources of variability, we have limited confidence in whether these analyses could clarify sources of effect modification.278, 304

This figure is a forest plot of stratified analyses examining effect modification for anxiety symptom severity in primary studies of anxiety treatment among primary care patients.

Figure 33

Stratified Analyses Examining Effect Modification for Anxiety Symptom Severity in Primary Studies of Anxiety Treatment Among Primary Care Patients (KQ4). Abbreviations: CBT = cognitive behavioral therapy; CI = confidence interval; NA = not applicable; (more...)

Pharmacologic Treatment of Anxiety

Primary Study Characteristics

Two RCTs (N=423) among primary care patients examined the benefits of pharmacological interventions to treat anxiety (Tables 30 and 36).287, 288 Both studies were rated as good quality. Mean age across the two studies was 57.5 years, and 60.0 percent of the participants were women. Only one study reported race or ethnicity data; its participants were 82.5 percent White.288 The first trial (N=244; UK) assessed the efficacy of venlafaxine XL (an SNRI) in participants with generalized anxiety disorder (with and without co-morbid depression) over a 24-week period.287 Participants were recruited from primary care settings, were over 18 years old, met DSM-IV criteria for generalized anxiety disorder, and had a score of 20 or more on the HAM-A and a score of 23 or less on the MADRS. Participants were randomized to receive 75 mg of venlafaxine or matched placebo. After 2 weeks, the dose could be doubled if initial response was poor. The second trial (N=179; US) assessed the efficacy of escitalopram (an SSRI) in older adults with generalized anxiety disorder over a 12-week period.288 Participants were recruited from primary care and specialty medical care (e.g., arthritis, geriatric medicine) clinics, were over 60 years old, and had a primary diagnosis of generalized anxiety disorder (defined as a score of 17 or more on the HAM-A). Participants were randomized to receive 10–20 mg of escitalopram or matched placebo.

Table 36. Participant Characteristics of Primary Research Studies of Pharmacologic Treatment of Anxiety in Primary Care Patients (KQ4).


ESR Characteristics

We also included ten ESRs of pharmacologic treatment of anxiety (Table 32, Appendix F Table 17), covering antidepressants, benzodiazepines, and buspirone.232, 309–312, 316, 318–320, 322 Two reviews focused on trials in older adults309, 316 and one focused on perinatal populations.232 Four of the reviews were not limited to a specific anxiety disorder,232, 309, 316, 319 two focused on generalized anxiety disorder,312, 320 three focused on panic disorder,310, 311, 318 and one focused on social anxiety disorder.322 We could not determine the total number of included studies across all included reviews, but estimate that at least 227 RCTs (N approximately 40,000) were included.

All but one319 of the included ESRs were rated good quality. The review rated as fair was downgraded because it lacked risk of bias assessment for included studies; however, the focus of this review was on publication bias, and we felt risk of bias assessment was not central to this analysis. In this review, which addressed second generation antidepressants, the reviewers downloaded packets from the FDA website and submitted freedom of information requests for medications without packets. FDA information was compared with published studies to examine reporting bias, which was classified as study publication bias, outcome reporting bias, or spin. Four additional ESRs reported at least some efforts to include unpublished evidence.232, 311, 318, 322

Results

Detailed results for all outcomes are reported in Appendix F.

Primary Study Results

Anxiety and general mental health outcomes. In the trial of venlafaxine among primary care patients, participants taking venlafaxine showed greater improvement in the primary outcome of anxiety symptoms at 24 weeks followup, compared to placebo (mean difference at followup, −2.1 [95% CI, −4.2 to 0]; p = 0.05, Appendix F Table 18).287 Similar findings were observed for secondary outcomes of global mental health symptom score and the Mental Health subscale of the SF-36. Group differences were not statistically significant for treatment response, remission, or depression symptoms, although all of these trended in the direction of benefit for venlafaxine.287

In the RCT of escitalopram, which was limited to older adults, more participants taking escitalopram met the criteria for a treatment response than those taking a placebo (OR, 1.87 [95% CI, 1.03 to 3.39]; 60% taking escitalopram compared to 45% taking a placebo, p = 0.05, Appendix F Table 18).288 Treatment response was defined as a clinician rating of improved or very much improved. Participants taking escitalopram also showed greater reduction in global mental health symptoms and anxiety symptoms, but the finding for anxiety symptoms was not statistically significant (p=.06).288

ESR Results

Anxiety outcomes. The continuous outcome of anxiety symptom improvement was reported for people with generalized anxiety disorder and panic disorder in general adult populations (Figure 34, Appendix F Table 19). For generalized anxiety disorder, SMDs in anxiety symptom scores ranged from −0.23 for serotonin modulators (95% CI, −0.53 to 0.06; 8 RCTs, N=1801; I2 not reported) to −1.84 for bupropion (95% CI, −3.05 to −0.62; 1 RCT, N=11; I2 not applicable).312 All but one of the seven effects were statistically significant, with most in the medium to large effect size range. The effect for SSRIs was in the medium range, with a confidence interval indicating a clearly statistically significant effect (SMD, −0.66 [95% CI, −0.90 to −0.43]; 23 RCTs, N=2142; I2 not reported).312
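One way to see why the SSRI effect is described as clearly statistically significant, while the serotonin modulator effect is not, is to recover an approximate standard error and z statistic from each reported confidence interval. The sketch below assumes a symmetric normal-approximation 95% interval (z = 1.96); the review does not state how the source meta-analysis computed its intervals, and the function names are hypothetical.

```python
Z_95 = 1.96  # assumed critical value for a two-sided 95% interval


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Approximate standard error implied by a symmetric CI."""
    return (upper - lower) / (2.0 * z)


def z_statistic(estimate: float, lower: float, upper: float) -> float:
    """Estimate divided by its CI-implied standard error."""
    return estimate / se_from_ci(lower, upper)


# SMDs and 95% CIs for generalized anxiety disorder as reported in the text.
z_ssri = z_statistic(-0.66, -0.90, -0.43)
z_serotonin_modulators = z_statistic(-0.23, -0.53, 0.06)

print(f"SSRIs: z = {z_ssri:.2f}")
print(f"Serotonin modulators: z = {z_serotonin_modulators:.2f}")
```

Under these assumptions the SSRI estimate sits several standard errors from zero, whereas the serotonin modulator estimate falls short of the 1.96 threshold, consistent with its interval crossing zero.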

This figure is a forest plot of standardized mean differences between groups in anxiety symptom severity for pharmacologic treatment of anxiety compared to controls.

Figure 34

Forest Plot of Standardized Mean Differences Between Groups in Anxiety Symptom Severity for Pharmacologic Treatment of Anxiety Compared to Controls (KQ4). Abbreviations: CI = confidence interval; GAD = generalized anxiety disorder; PD = panic disorder; (more...)

Improvements in anxiety symptoms were also reported in three reviews addressing panic disorder, with the use of antidepressants,310 buspirone,318 and benzodiazepines.311 Antidepressant use was associated with improved anxiety symptoms broadly, panic symptoms, number of panic attacks, and agoraphobia symptoms.310 SMDs ranged from −0.33 (95% CI, −0.47 to −0.20; 12 RCTs, N=2,477; I2, 57%) for mean change in anxiety symptoms broadly to −0.69 (95% CI, −0.99 to −0.39; 13 RCTs, N=2,987; I2, 91%) for endpoint agoraphobia scores. SSRIs showed a statistically significant benefit for all of these outcomes except for one agoraphobia outcome. TCAs showed a benefit for all but one agoraphobia and one broad anxiety symptom outcome.310 Benzodiazepines were associated with improvements in panic symptoms and agoraphobia (range of effects: SMD, −0.35 [95% CI, −0.50 to −0.20]; 13 RCTs, N=2,371; I2, 58% to SMD, −0.92 [95% CI, −1.22 to −0.61]; 7 RCTs, N=1,489; I2, 77%).311 However, buspirone had no impact on symptoms of agoraphobia in one small RCT (SMD, −0.01 [95% CI, −0.56 to 0.53]; N=52).318

Two reviews reported on remission, one for antidepressants310 and one for benzodiazepines,311 both limited to studies among people with panic disorder (Appendix F Table 20). Both types of medication demonstrated a benefit at followup of up to 28 weeks. Antidepressants were associated with a 17 percent lower likelihood of failure to remit (RR, 0.83 [95% CI, 0.78 to 0.88]; 24 RCTs, N=6,164; I2=40%; 51% taking antidepressants vs. 60% taking placebo had not remitted at post-treatment).310 Benzodiazepines were associated with a 61 percent higher likelihood of remission (RR, 1.61 [95% CI, 1.38 to 1.88]; 15 RCTs, N=2,907; I2=62%; 63% taking benzodiazepines vs. 40% taking placebo were in remission at post-treatment).311 Remission was not reported for any other type of anxiety disorder.
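The percent changes quoted above follow directly from the pooled relative risks. The short sketch below shows that conversion and, for comparison, the crude ratios of the quoted group percentages. It is illustrative only: the helper name is hypothetical, and the crude ratios are not expected to match the pooled estimates exactly because meta-analytic pooling weights the individual trials.

```python
def percent_change_from_rr(rr: float) -> float:
    """Relative change in likelihood implied by a relative risk, in percent."""
    return (rr - 1.0) * 100.0


# Pooled relative risks reported in the text.
antidepressant_change = percent_change_from_rr(0.83)   # failure to remit
benzodiazepine_change = percent_change_from_rr(1.61)   # remission

# Crude ratios of the quoted percentages (not the pooled estimates).
crude_rr_failure_to_remit = 0.51 / 0.60
crude_rr_remission = 0.63 / 0.40

print(f"Antidepressants: {antidepressant_change:+.0f}% (failure to remit)")
print(f"Benzodiazepines: {benzodiazepine_change:+.0f}% (remission)")
print(f"Crude ratios: {crude_rr_failure_to_remit:.2f}, {crude_rr_remission:.2f}")
```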

Three reviews reported on response to treatment, for people with social anxiety disorder322 and panic disorder (Figure 35, Appendix F Table 21).310, 311 The largest body of evidence for social anxiety disorder was for SSRIs, which were associated with a 65 percent increase in the likelihood of treatment response (RR, 1.65 [95% CI, 1.48 to 1.85]; 24 RCTs, N=4,984; I2=50%; 54% taking SSRIs vs. 32% taking placebo met study criteria for responding to treatment).322 For panic disorder, both antidepressants and benzodiazepines demonstrated an increased likelihood of response. Antidepressants were associated with a 28 percent reduced likelihood of failure to respond (RR, 0.72 [95% CI, 0.66 to 0.79]; 31 RCTs, N=6,500; I2=67%; 40% taking antidepressants vs. 56% taking placebo had not responded at post-treatment; not shown in the figure because it reported the inverse of all other reviews).310 Benzodiazepines were associated with a 65 percent increased likelihood of response (RR, 1.65 [95% CI, 1.39 to 1.96]; 16 RCTs, N=2,476; I2=67%; 65% taking benzodiazepines vs. 41% taking placebo were in remission at post-treatment).311 For benzodiazepines, effect sizes were of similar magnitude and statistically significant when the analyses separately excluded studies that (a) had attrition higher than 20 percent, (b) were limited to patients with comorbidities, (c) were industry funded, or (d) were not industry funded.311
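Because the antidepressant review reported failure to respond rather than response, its relative risk cannot be placed on the same scale as the others simply by taking a reciprocal. The sketch below uses only the quoted group percentages to illustrate the difference; the function name is hypothetical, and the result is a crude approximation rather than a pooled estimate from the review.

```python
def rr_of_complement(p_fail_treat: float, p_fail_control: float) -> float:
    """Relative risk of the complementary outcome (e.g., response vs. non-response)."""
    return (1.0 - p_fail_treat) / (1.0 - p_fail_control)


# Quoted in the text: 40% (antidepressants) vs. 56% (placebo) had not responded.
crude_rr_nonresponse = 0.40 / 0.56
crude_rr_response = rr_of_complement(0.40, 0.56)

# The reciprocal of a relative risk is NOT the relative risk of the complement.
naive_reciprocal = 1.0 / 0.72

print(f"Crude RR of non-response: {crude_rr_nonresponse:.2f}")
print(f"Crude RR of response:     {crude_rr_response:.2f}")
print(f"Reciprocal of pooled RR:  {naive_reciprocal:.2f}")
```

The crude non-response ratio is close to the pooled 0.72, while the response-scale ratio differs from the simple reciprocal, which is why the two framings are not interchangeable in a single figure.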

This figure is a forest plot of odds ratios for group differences in the odds of treatment response with pharmacological treatment of anxiety compared to placebo.

Figure 35

Forest Plot of Odds Ratios for Group Differences in the Odds of Treatment Response With Pharmacological Treatment of Anxiety Compared to Placebo (KQ4). Abbreviations: CI = confidence interval; GAD = generalized anxiety disorder; MDD = major depressive (more...)

Other outcomes. Reviews of RCTs among people with panic disorder and social anxiety disorder found improvements in other important outcomes (Figure 36, Appendix F Table 22). Reviews among people with panic disorder found statistically significant improvements in depression and social functioning with antidepressant310 and benzodiazepine311 use, but the effect was small and not statistically significant for quality of life with antidepressant use.310 For example, the standardized effect size for endpoint depression symptom score was −0.41 for antidepressants after 8 to 28 weeks (95% CI, −0.57 to −0.25; 12 RCTs, N=1,794; I2, 43%)310 and −0.70 for benzodiazepines after 3 to 15 weeks (95% CI, −1.08 to −0.32; 8 RCTs, N=968; I2, 78%).311 One RCT of buspirone did not demonstrate an impact on depression for people with panic disorder.318 For social anxiety disorder, SSRIs showed a benefit for depression, social functioning, family functioning, and work functioning, and benzodiazepines improved social and work functioning.322

This figure is a forest plot of groups in other outcomes for pharmacologic treatment of anxiety compared to placebo.

Figure 36

Forest Plot of Groups in Other Outcomes for Pharmacologic Treatment of Anxiety Compared to Placebo (KQ4). Abbreviations: CG = control group; CI = confidence interval; ES = effect size; HAMD = Hamilton Rating Scale for Depression; IG = intervention group; (more...)

Effect modification and findings in specific populations. In addition to effect modification findings described above for specific outcomes, one review examined publication and reporting bias for second generation antidepressants, addressing any anxiety disorder.319 Among the 57 trials identified, the FDA interpreted 41 of the 57 trials (72%) to have positive results. However, 43 of the 45 published article conclusions (96%) were positive (P < .001). Trials that the FDA determined to be positive were five times more likely to be published compared with trials that were not positive (risk ratio, 5.20; 95% CI, 1.87 to 14.45; P < .001). The reviewers found evidence for study publication bias (P < .001), outcome reporting bias (P = .02), and spin (P = .02). The pooled effect size based on the published literature (Hedges’ g, 0.38; 95% CI, 0.33 to 0.42; P < .001) was 15% higher than the effect size based on the FDA data (Hedges’ g, 0.33; 95% CI, 0.29 to 0.38; P < .001), but this difference was not statistically significant (β = 0.04; 95% CI, −0.02 to 0.10; P = .18); the effect size adjusted for publication bias was statistically significant (Appendix F Table 23).
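The percentages in this paragraph can be reproduced from the counts and effect sizes given in the text. The following sketch is a simple arithmetic check with hypothetical variable names; it does not reproduce the review's statistical tests.

```python
def percent(part: float, whole: float) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


# Counts and effect sizes as reported in the text.
fda_positive_pct = percent(41, 57)          # trials the FDA judged positive
published_positive_pct = percent(43, 45)    # published conclusions that were positive
g_published, g_fda = 0.38, 0.33             # Hedges' g, published vs. FDA data
relative_inflation_pct = 100.0 * (g_published / g_fda - 1.0)

print(f"FDA-positive trials: {fda_positive_pct:.0f}%")
print(f"Positive published conclusions: {published_positive_pct:.0f}%")
print(f"Published effect size relative to FDA data: +{relative_inflation_pct:.0f}%")
```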

Two narrative systematic reviews focused on trials of older adults, and found more limited evidence that antidepressants and benzodiazepines improved anxiety symptoms among older adults (Appendix F Table 23).309, 316 One review found seven placebo or waitlist-controlled RCTs, most limited to patients with generalized anxiety disorder, and reported that antidepressants were associated with reduced anxiety symptoms after 8 to 15 weeks of treatment.309 Similarly, in three of four placebo-controlled trials limited to older adults with generalized anxiety disorder, panic disorder, or any anxiety disorder, benzodiazepines were associated with decreased anxiety during the 4- to 8-week study period (p<.05).316 Another review that addressed pharmacologic treatment of mental health disorders in perinatal patients found no studies of pharmacologic treatment (benzodiazepines or other anxiolytics) for anxiety among perinatal patients (Appendix F Table 23).232

KQ5. What Are the Harms of Treatment of Anxiety (Psychotherapy or Pharmacotherapy) in Adults, Including Pregnant and Postpartum Persons?

Summary

None of the RCTs or ESRs of psychological treatment reported on adverse events, but there was no pattern of effects indicating an elevated risk of harm. For the harms of pharmacologic treatment, we included three RCTs (Table 30)287, 288, 324 and eight ESRs addressing medications other than antidepressants, which were addressed above under depression (Table 32).232, 309–311, 316, 318, 320, 322 Evidence indicated an increase in non-serious harms as measured by a higher percent of participants experiencing any adverse events or withdrawals due to adverse events if they were taking medication (vs. placebo). Serious adverse events were rare, and data were insufficient to determine whether the risk of serious harms was increased. Case-control studies found an association between benzodiazepine use and suicide death325 and spontaneous abortion.326 However, the inability to fully match cases and controls on severity of mental health symptoms and other health behaviors such as substance use limited our confidence in the causal nature of these associations.

Psychological Treatment of Anxiety

None of the included RCTs or ESRs of psychological treatment of anxiety reported on harms.

Pharmacologic Treatment of Anxiety

Three primary RCTs of medication use among primary care patients reported on adverse events (n=669, Tables 29 and 37). These included both RCTs described under KQ4 of venlafaxine287 and escitalopram288 as well as an RCT of buspirone that was not included for KQ4 because it had only 4 weeks of followup.324 All three medications were associated with statistically non-significant increases in the experience of any adverse effects (Table 37, Appendix F Table 24). Serious adverse effects were rare. In the trial of venlafaxine, four participants (3.3%) taking venlafaxine experienced serious adverse events compared with five (4.1%) who were taking placebo (RR, 0.79 [95% CI, 0.21 to 3.03], n=244).287 No participants experienced serious adverse events in the RCTs of either buspirone after 4 weeks or escitalopram after 12 weeks.288 Escitalopram had the greatest between-group difference in experiencing any adverse events (RR, 1.82 [95% CI, 0.94 to 3.51]; N=177; 76% taking escitalopram vs. 64% taking placebo). Non-serious harms that were increased with escitalopram use included fatigue or somnolence (p<.001, 41% vs. 11%) and urinary symptoms (p=.002, 9% vs. 0%), but aches were more common in the placebo group (p=.05, 15% vs. 6%).

Table 37. Adverse Outcomes Reported in Primary Research Studies of Pharmacologic Treatment of Anxiety in Primary Care Patients (KQ5).


Eight ESRs reported on harms or dropout for any reason (Table 38).232, 309–311, 316, 318, 320, 322 Detailed results for all outcomes are shown in Appendix F Table 25. Dropout due to adverse events was increased with the use of antidepressants (for panic disorder),310 SSRIs and SNRIs (for social anxiety disorder),322 and benzodiazepines (for panic disorder)311 (Figure 37). In addition, persons with panic disorder were slightly more likely to experience any adverse events when taking antidepressants, compared to placebo (RR, 1.11 [95% CI, 1.07 to 1.15]; 16 RCTs, N=4,246; I2, 0%).310 The most common non-serious harms reported by older patients with anxiety included gastrointestinal complaints, feelings of fatigue or sedation, and sleep concerns.309 The findings for dropout for any reason ranged from favoring pharmacotherapy to favoring placebo (Figure 38). Seven reviews addressing antidepressant use for any indication (including anxiety) were also included (Table 38); however, we refer the reader to the results above under Depression (KQ5) for an examination of risks associated with antidepressant use.244, 246, 248–250, 253, 254

This figure is a forest plot of group differences in dropout due to adverse events in ESRs with pharmacological treatment of anxiety compared to placebo.

Figure 37

Forest Plot of Group Differences in Dropout Due to Adverse Events in ESRs With Pharmacological Treatment of Anxiety Compared to Placebo (KQ5). Abbreviations: AE = adverse event; CI = confidence interval; PD = panic disorder; RR = relative risk; SaND = (more...)

This figure is a forest plot of group differences in dropout for any reason in ESRs with pharmacological treatment of anxiety compared to placebo.

Figure 38

Forest Plot of Group Differences in Dropout for Any Reason in ESRs With Pharmacological Treatment of Anxiety Compared to Placebo (KQ5). Abbreviations: CI = confidence interval; GAD = generalized anxiety disorder; PD = panic disorder; RR = relative risk; (more...)

Table 38. Characteristics of ESRs Addressing Harms of Pharmacologic Treatment of Anxiety (KQ5).


For benzodiazepine use, an extensive review of pharmacologic treatment of mental health conditions during the perinatal period concluded that the strength of evidence was low for an association with spontaneous abortion and NICU admissions (Appendix F Table 26).232 The review also concluded that evidence was insufficient for preeclampsia, perinatal death, birthweight, Apgar score, and infant respiratory distress. They found no evidence on the association of benzodiazepine use with 19 other serious outcomes included in their review. Among older adults, a review of five studies of benzodiazepine treatment for anxiety found that mild adverse effects such as drowsiness, faintness, and light-headedness were more common with benzodiazepines than placebo.316 One study in this review reported a serious adverse event (severe gastralgia) in one participant taking a placebo (at 15 days) (Appendix F Table 26).

Additional harms of antidepressants are reported above under the harms of depression treatment; many of those reviews included trials of antidepressant use for any indication (including anxiety disorders). Even findings in reviews specific to people with depression likely also apply to people with anxiety, given the high level of comorbidity between these two conditions.

We identified two additional case-control studies published in our search window (Table 39) examining the association between benzodiazepine use and spontaneous abortion (n=262,070)326 or suicide risk (n=308),325 outcomes that were not addressed in the ESRs. The good-quality study of spontaneous abortion was based on a cohort of 442,066 pregnancies in the Quebec Pregnancies Cohort, a cohort drawn from the Quebec Public Prescription Drug Insurance Plan.326 The final sample included 26,789 patients with spontaneous abortions between gestation weeks 6 and 20, and 134,305 matched controls with pregnancies in the same calendar year and gestational age. Confounding variables pulled from medication dispensing databases, other medical records, and demographic databases included: antidepressant use, antipsychotic use, maternal age, welfare recipient status, urban dweller status, past 12 months’ healthcare utilization (inpatient, general practice, psychiatric, other specialty), past 12 months’ mental health diagnoses (mood and anxiety disorders, insomnia), folic acid exposure, and medical comorbidities (hypertension, diabetes, asthma, thyroid disorders, tobacco, alcohol or other drug dependence). This study found that benzodiazepines were associated with 85 percent higher odds of spontaneous abortion (OR, 1.85 [95% CI, 1.61 to 2.12]; 1.4% of cases had benzodiazepines dispensed vs. 0.6% of controls). The investigators also found elevated risk for both long- and short-acting agents and for all specific agents, as well as a dose-response effect (all p<.05). This was a well-executed study; however, the investigators could not directly measure symptom severity or other health behaviors that may be associated with mental health symptoms, such as substance use, which could be independently related to spontaneous abortion.326

Table 39. Observational Studies of Harms of Pharmacologic Treatment of Anxiety, Excluding Antidepressant Treatment (KQ5).


The fair-quality case-control study of suicide risk used Sweden’s national cause of death records to identify people who had died by suicide, and matched them 1-to-1 with people with mental health service use in the same timeframe by age, sex, and primary mental health diagnosis.325 Medication exposure was determined by a prescription database. Other potential confounders controlled for included: prescriptions for antidepressants, anticonvulsants, lithium, psychostimulants, antipsychotics and sedatives; previous suicide attempt; previous psychiatric inpatient stay; previous non-psychiatric inpatient stay; age; sex; and diagnostic group (mental and behavioral disorder due to substance use, schizophrenia and related conditions, bipolar disorder, depressive disorder, anxiety disorder, disorders of adult personality and behavior, Asperger’s/ADHD, and substance use). This study found that benzodiazepines were associated with 83 percent higher odds of suicide death (OR, 1.83 [95% CI, 1.06 to 3.14]; 42% of cases had benzodiazepines prescribed vs. 28% of controls). As with the other case-control study, this was a well-executed study, but it could not directly measure symptom severity or other health behaviors associated with mental health symptoms, which may be important confounders. In addition, this study relied on prescriptions rather than dispensing as the measure of benzodiazepine exposure, which is even further removed from medication actually taken.325
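In a case-control design, the odds ratio compares the odds of exposure among cases with the odds of exposure among controls. The sketch below computes crude exposure odds ratios from the rounded exposure percentages quoted for the two benzodiazepine studies. It is an illustration under simplifying assumptions: it ignores the matching and covariate adjustment used in both studies, the function name is hypothetical, and the crude values are therefore not expected to equal the published estimates (1.85 and 1.83).

```python
def exposure_odds_ratio(p_exposed_cases: float, p_exposed_controls: float) -> float:
    """Crude odds ratio of exposure, cases vs. controls (no matching or adjustment)."""
    odds_cases = p_exposed_cases / (1.0 - p_exposed_cases)
    odds_controls = p_exposed_controls / (1.0 - p_exposed_controls)
    return odds_cases / odds_controls


# Exposure proportions quoted in the text (rounded percentages).
crude_or_spontaneous_abortion = exposure_odds_ratio(0.014, 0.006)
crude_or_suicide_death = exposure_odds_ratio(0.42, 0.28)

print(f"Spontaneous abortion, crude OR: {crude_or_spontaneous_abortion:.2f} (reported: 1.85)")
print(f"Suicide death, crude OR:        {crude_or_suicide_death:.2f} (reported: 1.83)")
```

Any gap between a crude and a reported value may reflect the adjustments described in the text as well as rounding of the quoted percentages, so the comparison should be read as orientation only.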

Suicide Risk

KQ1. Do Suicide Risk Screening Programs in Primary Care or Comparable Settings Result in Improved Health Outcomes in Adults, Including Pregnant and Postpartum Persons?

KQ1a. Does Sending Suicide Risk Screening Test Results to Providers (With or Without Additional Care Management Supports) Result in Improved Health Outcomes?

Summary

We found one short-term RCT (n=443) that examined screening for suicide risk, which was limited to primary care patients who had screened positive for depression (Table 40).327 This trial reported no statistically significant group differences in suicidal ideation at 2 weeks’ followup, and only a single suicide attempt among study participants.

Table 40. Characteristics of Suicide Risk Screening Studies (KQ1).


Study Characteristics

One short-term RCT (n=443), which was also included in the previous review, addressed the benefits of suicide screening (Tables 40–42, Appendix G Table 1).327 This trial included adult primary care patients who had screened positive for depression in general practices in the UK. Patients were randomized to suicide screening or to answer health and lifestyle questions, with the primary aim of determining whether suicide screening increased the likelihood of suicidal ideation. Participants who screened positive for suicide risk were given information about helplines and other sources of help and were encouraged to use those resources. The mean age was 48 years (range, 18 to 92 years) and 70 percent were women. Retention was 81 percent at the 2-week followup.

Table 41. Participant Characteristics of Suicide Risk Screening Studies (KQ1).


Table 42. Intervention Characteristics of Suicide Risk Screening Studies (KQ1).


Results

At 2 weeks’ followup, one control group participant had attempted suicide and there were no suicide attempts in the screening group (Table 43).327 There were no statistically significant differences between groups in the proportion feeling that life was not worth living (28% in the screening group vs. 24% in the control group; OR, 1.23 [95% CI, 0.76 to 1.98]), wishing they were dead (23% in both groups; OR, 1.01 [95% CI, 0.61 to 1.66]), or reporting thoughts of taking their own life (15% in the screening group vs. 11% in the control group; OR, 1.36 [95% CI, 0.72 to 2.54]).327 Thus, although some outcomes trended in the direction of harm, confidence intervals were wide, making it inadvisable to draw conclusions about the short-term impact of suicide screening.
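The reasoning in this paragraph, that none of the group differences is statistically significant and that the intervals are wide, can be made concrete with a small check of whether each 95% confidence interval excludes the null value of 1. The sketch below uses only the odds ratios and intervals quoted above; the dictionary keys and function name are hypothetical labels for illustration.

```python
# Outcome label -> (odds ratio, lower 95% bound, upper 95% bound), from the text.
results = {
    "life not worth living": (1.23, 0.76, 1.98),
    "wishing they were dead": (1.01, 0.61, 1.66),
    "thoughts of taking own life": (1.36, 0.72, 2.54),
}


def ci_excludes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """True when the interval lies entirely above or below the null value."""
    return lower > null or upper < null


for outcome, (or_value, lower, upper) in results.items():
    significant = ci_excludes_null(lower, upper)
    width_ratio = upper / lower  # how many-fold the interval spans
    print(f"{outcome}: OR {or_value}, significant={significant}, "
          f"upper/lower={width_ratio:.1f}")
```

Each interval spans 1 and covers well over a two-fold range, which is why the data are compatible with both modest harm and modest benefit.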

Table 43. Results From Suicide Risk Screening Studies (KQ1).


KQ2. Do Instruments to Screen for High Suicide Risk Accurately Identify Adults, Including Pregnant and Postpartum Persons, With High Suicide Risk in Primary Care or Comparable Settings?

Summary of Results

We included three studies that screened for suicidal ideation (Table 44).328–330 Most screening instruments reported sensitivity and specificity above 0.80 for at least one reported cutoff (Figure 39). However, there was no replication of any instrument and two of the three studies included only 3 and 12 individuals, respectively, with suicidal ideation or at very high risk according to the reference standards.328, 330 The study with the most events was limited to older adults.329

This figure shows a summary of the test accuracy of screening tools to detect high risk of suicide.

Figure 39

Summary of Test Accuracy of Screening Tools to Detect High Risk of Suicide (KQ2). Abbreviations: CI = Confidence interval; GDS-SI = Geriatric Depression Scale – Suicide Ideation; NA = Not applicable; SDDS-PC = Symptom Driven Diagnostic System (more...)

Table 44. Characteristics of Studies Examining Test Accuracy of Suicide Risk Screening Instruments to Identify People at Increased Risk of Suicide (KQ2).


Study Characteristics

Three studies screening for suicidal ideation were included;328–330 two were included in the previous review (Table 44).329, 330 Each study examined a different screening test, including two versions of the Geriatric Depression Scale (GDS), three separate questions about suicide from the Symptom Driven Diagnostic System for Primary Care (SDDS-PC) (feeling suicidal, thoughts of death, wishing you were dead), and an unnamed suicide risk assessment tool. All three studies were conducted in the US. Two recruited participants from primary care and the third recruited participants from the ED for any chief complaint (i.e., not limited to patients with mental health concerns). Sample sizes ranged from 124 to 1,001. Two studies recruited adults 18 years and older while one study recruited older adults (≥65 years) (Table 44). Mean age ranged from 47 to 75 years (Table 45). Women were represented in higher proportions than men: 52 to 63 percent of participants were women. Race and ethnicity were reported in only one study;329 93 percent were White. SES was reported in one study with a mean of 14 years of education.329, 330

Table 45. Participant Characteristics for Studies of Test Accuracy of Suicide Risk Screening Instruments (KQ2).

Two studies used the SCID (one along with the HAM-D) to determine suicidal ideation, administered within a maximum of 4 days.329, 330 The third used an unstructured interview from a psychiatrist administered on the day of the screening test.328 The proportion of participants who were identified through interviews as being at risk of suicide ranged from 1.2 percent to 11 percent.

Results

GDS-15

One study reported test accuracy for the GDS-15 to identify suicidal ideation in older adults.329 The authors determined a GDS-15 cutoff of ≥4 would maximize sensitivity and specificity, but the optimal cutoff for women alone was lower (≥3) and for men it was higher (≥5). At a GDS-15 cutoff of ≥4, sensitivity to detect suicidal ideation was 0.75 (95% CI, 0.64 to 0.84) and specificity was 0.82 (95% CI, 0.78 to 0.85) (Appendix G Table 2). At higher cutoffs (≥5, ≥6), sensitivity decreased and specificity increased; at lower cutoffs (≥2, ≥3) sensitivity increased and specificity decreased.329

GDS-SI

One study reported test accuracy for the GDS-SI. The GDS-SI is a 5-item subset of the GDS that addresses suicidal ideation (GDS items 3, 7, 11, 12, and 14).329 The authors identified a GDS-SI cutoff of ≥1 as optimal to screen for suicidal ideation, with a sensitivity of 0.80 (95% CI, 0.69 to 0.88) and a specificity of 0.80 (95% CI, 0.77 to 0.84) (Appendix G Table 2). Stratified results showed that, at a GDS-SI cutoff of ≥1, test performance was similar between men and women. At higher cutoffs (≥2, ≥3), sensitivity decreased and specificity increased.329

SDDS-PC

One study (n=1,001) reported the test accuracy of three questions from the SDDS-PC to screen for suicidal ideation in primary care.330 The sensitivity of the “feeling suicidal” symptom to identify suicidal ideation was 0.83 (95% CI, 0.62 to 1.0) and the specificity was 0.98 (95% CI, 0.97 to 0.99). The “thoughts of death” symptom resulted in a sensitivity of 1.0 (95% CI, 0.76 to 1.0) and a specificity of 0.81 (95% CI, 0.78 to 0.84). The last symptom—“wishing you were dead”—yielded a sensitivity of 0.92 (95% CI, 0.76 to 1.0) and a specificity of 0.93 (95% CI, 0.92 to 0.95) (Appendix G Table 2).330

Suicide Risk Assessment Tool

One newly identified study examined the accuracy of a new risk assessment tool.328 The aim of the tool was to predict the risk of committing suicide within 72 hours and to replicate a psychiatrist-recommended intervention. The risk assessment tool was replicated with a sequentially recruited ED population (n=124). Compared with an interview from a psychiatrist, the sensitivity of the tool to identify moderate or high suicide risk was 0.42 (95% CI, 0.19 to 0.68) and the specificity was 0.98 (95% CI, 0.94 to 1.0). Only 12 participants were identified as at moderate or high risk of suicide (Appendix G Table 2).328

KQ3. What Are the Harms Associated With Screening for Suicide Risk in Primary Care or Comparable Settings in Adults, Including Pregnant and Postpartum Persons?

The same short-term study (n=443) that was included for KQ1 was the only evidence included for assessing the harms of suicide screening (Table 40).327 This study was designed to determine whether screening for suicide among people with symptoms of depression increased the risk of suicidal ideation. As described above under KQ1, two of three suicidal ideation items indicated a possible higher risk with screening; however, the findings were inconclusive due to the lack of statistical significance and very wide confidence intervals (Table 43).

KQ4. Does Treatment of High Suicide Risk (Psychotherapy or Pharmacotherapy) Result in Improved Health Outcomes in Adults, Including Pregnant and Postpartum Persons?

Summary of Results

We included 23 RCTs (reported in 36 articles, N=22,632) of suicide prevention among people at increased risk of suicide (Table 46).331–366 The impact of psychological interventions for suicide prevention on suicide deaths could not be determined due to the small number of events; however, enough events were available to address suicide attempts. One large (n=18,882) good-quality multi-site trial conducted in US integrated care settings tested two suicide prevention interventions among adults with an elevated risk for suicide based on item 9 of the PHQ-9.365 This study found that, compared to usual care, a care management intervention had no impact on the rate of suicide attempts (HR, 1.07 [97.5% CI, 0.84 to 1.37]; p=.52) and a low-intensity online skills training intervention was associated with an increased risk of suicide attempts (HR, 1.29 [97.5% CI, 1.02 to 1.64]; p=.015). Most other studies reported five or fewer suicide attempts per study group and the pooled effect was not statistically significant (OR, 0.94 [95% CI, 0.73 to 1.22]; 12 RCTs [n=14,573]; I²=11.2%, including only the care management arm of the large trial; Figure 40, Table 47). Although there was a small statistically significant benefit for depression symptom severity, there was no clear improvement over usual care for suicidal ideation, self-harm, other mental health outcomes, or emergency or inpatient healthcare utilization (Table 47). Usual mental health care was the most common control group, and was in some cases enhanced or optimized, so most of the included studies could be considered comparative effectiveness studies. The study with the most favorable findings (n=598) used individually tailored depression care management for older adults who had screened positive for depression.337 This study reported improvements in depression outcomes for up to one year and suicidal ideation for up to eight months, but only five suicide attempts and one suicide death over two years.
One study examined the impact of a pharmacologic intervention and found no differences between those taking placebo or 600 mg/day of lithium for up to one year in any suicide-related outcome, although medication adherence was low in this study.

Figure 40

Forest Plot of Proportion With a Suicide Attempt From the Suicide Prevention Trials (KQ4). Abbreviations: BPD = bipolar disorder; CAMS = Collaborative Assessment and Management of Suicidality; CBT = cognitive behavioral therapy; CG = control group; CI (more...)

Table 46. Characteristics of Suicide Prevention Studies (KQ4).

Table 47. Summary of Meta-Analysis Results for Suicide Prevention Studies (KQ4).

Study Characteristics

Twenty-three RCTs (N=22,632) examined the benefits of interventions to prevent suicide among those at increased risk (Table 46),336–338, 340, 342, 343, 345, 347–351, 353–355, 358, 359, 361–366 including one that aimed to both reduce depression and prevent suicide among older adults with a depressive disorder.337 Two studies were restricted to older adults,337, 359 one was limited to young adults (ages 18–25 years),366 and none were limited to perinatal women. Many of the studies were restricted to specific populations, however, including persons meeting the criteria for borderline personality disorder,336, 340, 342, 349, 350, 355 veterans,338, 345, 353, 362, 364 active duty members of the US Army,347 and college students.348, 354, 363

Fifteen of the trials were conducted in the US,337, 338, 343, 345, 347–349, 353, 354, 359, 361–365 and the remainder were in Australia,340, 366 Canada,350 The Netherlands,358 Denmark,351 and the UK.336, 342, 355 Studies used a wide range of recruitment strategies. The most common approaches were referral from medical or mental health practitioners; however, three recruited through screening in primary care clinics337, 353, 359 and one identified patients through examination of electronic medical records for PHQ-9 results, which was routinely administered at mental health visits and primary care visits for depression treatment in the participating health systems.365 Three studies of a mobile app recruited patients from online forums, including some that focused on mental health or suicide prevention topics.343, 351, 358 We excluded studies that recruited patients from emergency or inpatient settings who were in the midst of an acute suicidal crisis, due to limited applicability of the findings to patients who would be identified through screening in primary care settings.

Sociodemographic information about the included samples is presented in Appendix G Table 3 and summarized in Table 48. Across all studies, the mean age was 33.8 years, and 66.3 percent of participants were women. Among the twelve trials conducted in the US and reporting on race or ethnicity, the percent of participants who were Black ranged from 18 to 31.9, the percent Hispanic/Latino ranged from 3.6 to 45.1, and the percent White ranged from 14.3 to 92. The highest proportions of Asian American or Pacific Islander, and Native American participants in any study were 16.1 percent and 4.8 percent, respectively. Only two trials included a sample in which less than half of participants were White: a study of veterans aged 18 to 55 years and one of college students.345, 363 Two studies appeared to be composed primarily of people with significant socioeconomic challenges.336, 361 One of these had a high proportion (54%) of participants who had experienced homelessness and 43 percent with an annual income below $10,000.361 In the other study, 47.7 percent were permanently disabled and only 11.4 percent were employed.336

Table 48. Summary of Participant Demographic Characteristics Among Studies of Suicide Prevention (KQ4); Weighted Mean (Number of Studies Reporting), Unless Otherwise Indicated.

One study examined the impact of a pharmacologic intervention (lithium)362 and the remaining examined behavioral interventions, along with usual mental health care. The most common intervention approach was dialectical behavior therapy (DBT) or programs based on DBT principles, used in seven studies (Table 49, Appendix G Table 4).340, 345, 349, 350, 354, 355, 361, 365, 366 The DBT studies were wide ranging in intensity and fidelity to original DBT approaches. They included some lower-intensity approaches such as a self-guided smartphone app,366 a brief on-line skills development program with brief messages from an interventionist,365 and a single 45- to 60-minute session.361 Higher contact interventions included weekly individual and group sessions for 6 months345 to 1 year.349, 354, 355 DBT includes cognitive behavioral elements and directly addresses suicidal thinking and behavior. Common elements included mindfulness, emotional regulation, distress tolerance, interpersonal effectiveness, and dialectics (i.e., understanding and tolerating two simultaneous yet opposing truths, such as acceptance of a current state or skill level and a desire to improve). Three other interventions used CBT approaches: one offered up to 30 CBT counseling sessions,342 one tested an app-based intervention338 and the third used a CBT program to improve sleep and was limited to people with suicidal ideation and insomnia.353 Other traditional clinical approaches included a 60-minute crisis planning meeting,336 depression care management,338, 365 and the Collaborative Assessment and Management of Suicidality (CAMS) approach.347 Two other novel approaches that have not been widely used were an app-based intervention designed to increase aversion to self-injurious thoughts and behaviors through pairing of words and images,343 and a series of expressive writing exercises.348

Table 49. Intervention Characteristics of Suicide Prevention Studies (KQ4).

Control groups involved usual care. For most studies, this meant usual specialty mental health care (i.e., active treatment), due to the potential serious consequences of suicidal ideation. For some studies, usual care was enhanced in some way, such as by providing training to control providers, matching the amount of contact between the control and intervention groups, or limiting the control providers to those deemed to be expert clinicians in the community.

Five studies were rated as good quality336, 338, 342, 353, 365 and the remainder were rated as fair quality. The most common reasons for downgrading studies from good to fair included attrition greater than 10 percent, lack of information about allocation concealment and randomization procedures, and questions about the baseline comparability of the groups (often secondary to small sample sizes).

Results

Detailed results for all outcomes are reported in Appendix G.

Suicide-Related Outcomes

Two trials reported on suicide deaths by treatment group, both at 2 years’ followup.337, 349 One study was limited to older adults and reported one death by suicide.337 The other study was among patients with borderline personality disorder and reported no suicide deaths (Appendix G Table 5).349

Twelve trials reported suicide attempts and indicated no reduction in suicide attempts for the studied interventions.337, 342, 345, 347, 349, 351, 354, 358, 361, 362, 364, 365 The interventions studied included DBT, CBT, CAMS, lithium, and care management. The best evidence on suicide attempts comes from a large (n=18,882) good-quality multi-site trial conducted in US integrated care settings.365 This study tested two suicide prevention interventions among adults with an elevated risk for suicide based on item 9 of the PHQ-9. This study found that, compared to usual care, a care management intervention had no impact on the rate of suicide attempts (HR, 1.07 [97.5% CI, 0.84 to 1.37]; p=.52) and an online DBT-based skills training intervention was associated with an increased risk of suicide attempts (HR, 1.29 [97.5% CI, 1.02 to 1.64]; p=.015). Estimated event rates of the primary outcome of fatal or nonfatal self-harm were 3.3% for those offered care management, 3.9% for those offered skills training, and 3.1% for those receiving usual care. The skills training intervention involved minimal contact with skills coaches, who did not provide psychotherapy but sent messages through the electronic health record portal to reinforce each visit to the online program, encourage practice of specific skills, and reach out to participants without recent visits. Frequency of outreach depended on each participant’s level of involvement but was at least monthly during the initial 6 months. The results for this trial held across extensive sensitivity analyses.

Most of the remaining studies had only one to five suicide attempts in each group; only two other trials had more than ten suicide attempts in either group.342, 349 Both of these trials were limited to people diagnosed with borderline personality disorder, and they used CBT342 and DBT interventions.349 Of these two, a very high-intensity DBT intervention trial was the only study to find a statistically significant reduction in suicide attempts (OR, 0.34 [95% CI, 0.14 to 0.80], n=101).349 The intervention for this trial involved a median of 42 individual and 39 group DBT sessions. The overall pooled effect combining all twelve trials reporting this outcome (and including only the care management arm of the very large trial) was not statistically significant, with followup ranging from 3 months to 2 years (OR, 0.94 [95% CI, 0.73 to 1.22]; 12 RCTs [n=14,573]; I²=11.2%, Figure 40, Table 47).
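One way to read the ratio estimates above is to recover the approximate test statistic from the reported interval. The sketch below is illustrative only: it assumes the interval is symmetric on the log scale and based on a normal approximation, which this section does not confirm (the pooled analysis may have used a different method), and the function name is hypothetical. Because the published values are rounded, the recovered p-values are approximate.

```python
from math import log
from statistics import NormalDist


def z_and_p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate z statistic and two-sided p-value for a ratio measure
    (OR or HR) from its confidence interval.

    Assumes a normal approximation on the log scale (an assumption,
    not a method stated in the report).
    """
    if not (0 < lower < estimate < upper):
        raise ValueError("expected 0 < lower < estimate < upper")
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.24 for 97.5%
    se = (log(upper) - log(lower)) / (2 * z_crit)  # standard error of log ratio
    z = log(estimate) / se
    p = 2 * (1 - normal.cdf(abs(z)))
    return z, p


# Values as reported in the text above.
reported = {
    "care management HR": (1.07, 0.84, 1.37, 0.975),
    "skills training HR": (1.29, 1.02, 1.64, 0.975),
    "pooled OR": (0.94, 0.73, 1.22, 0.95),
}
for label, (est, lo, hi, level) in reported.items():
    z, p = z_and_p_from_ci(est, lo, hi, level)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions, the recovered values should land close to the reported p=.52 and p=.015 for the two hazard ratios, and well above .05 for the pooled odds ratio, whose interval includes 1.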

Twelve trials reported on change in a continuous measure of suicidal ideation severity or number of days with suicidal ideation.338, 343, 345, 348, 349, 351, 353, 354, 361, 363, 366 The pooled analysis indicated no impact of the interventions on suicidal ideation beyond usual care (SMD, −0.14 [95% CI, −0.31 to 0.02]; I²=54.8%, 12 RCTs, N=1,734, Figure 41, Table 47, Appendix G Table 6). Point estimates ranged in both directions, and only two of the individual trials reported a statistically significant improvement at any timepoint on a continuous measure of suicidal ideation.354, 366 The trial of older adults who screened positive for depression reported a greater reduction in the percent of participants with suicidal ideation in the care management group (29.4% at baseline to 16.5% at followup) compared to usual care (20.1% to 17.1%, p=.01 for the difference between groups).337

Figure 41

Forest Plot of Standardized Mean Difference in Change From Baseline of Continuous Suicidal Ideation Measures From the Suicide Prevention Trials (KQ4). Note: “IG/CG Mn(SD)” show change from baseline in the native units of the measures reported. (more...)

Other Mental Health Outcomes, Quality of Life, and Functioning

The other mental health outcomes reported most widely included depression-related outcomes (remission, response,337 and symptom severity336, 337, 342, 345, 348–350, 353, 354, 361, 366), self-harm (non-suicidal intent, or a mix of suicidal and non-suicidal intent),336, 340, 350, 354, 355, 361 global mental health symptom severity,336, 342, 347, 350, 355 and anxiety symptom severity (Appendix G Tables 7 and 8).336, 342, 345, 361, 366

We conducted meta-analysis for depression symptom severity scores and found that suicide prevention treatment in high-risk individuals was associated with a small, statistically significant reduction in depression symptoms (SMD, −0.22 [95% CI, −0.33 to −0.10]; 11 RCTs [n=2,177]; I²=0%, Figure 42, Table 47). Three trials reported statistically significant reductions in depression symptoms, including the trial of a care management intervention among older adults who screened positive for depression; the intervention in this study also targeted depression.337, 353, 354 This care management study found a 3.5-point greater reduction in the HAM-D for participants in the intervention group at four months post-baseline (mean difference in change from baseline [MD], −3.5 [95% CI, −4.7 to −2.3]; n=598). The effect size diminished over time, and group differences were not statistically significant at the final followup after 18 months (p=.06). This study also reported an increased likelihood of depression remission at up to 8 months’ followup (OR, 2.1 [95% CI, 1.1 to 4.2]; n=487; 41.1% in the intervention group, 31.8% in the control group) and an increased likelihood of a clinically significant response at up to 1 year (OR, 2.0 [95% CI, 1.1 to 3.8]; n=405; 52.1% in the intervention group, 42.0% in the control group reduced their HAM-D score by 50% or more).337
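For readers less familiar with the SMD scale used in the pooled estimate above, the following sketch shows how a standardized mean difference can be computed for a single two-arm trial. It assumes Hedges' g with the usual small-sample correction; this section does not state which SMD estimator the review used. All input values are hypothetical and are not drawn from any included study.

```python
from math import sqrt


def hedges_g(mean_ig, sd_ig, n_ig, mean_cg, sd_cg, n_cg):
    """Hedges' g (bias-corrected standardized mean difference) with an
    approximate 95% confidence interval for one two-arm trial."""
    df = n_ig + n_cg - 2
    pooled_sd = sqrt(((n_ig - 1) * sd_ig**2 + (n_cg - 1) * sd_cg**2) / df)
    d = (mean_ig - mean_cg) / pooled_sd       # Cohen's d
    correction = 1 - 3 / (4 * df - 1)         # small-sample correction factor
    g = d * correction
    # Large-sample variance of d, scaled by the correction factor.
    var_d = (n_ig + n_cg) / (n_ig * n_cg) + d**2 / (2 * (n_ig + n_cg))
    se_g = correction * sqrt(var_d)
    return g, (g - 1.96 * se_g, g + 1.96 * se_g)


# Hypothetical change-from-baseline depression scores (negative = improvement).
g, (low, high) = hedges_g(mean_ig=-6.0, sd_ig=8.0, n_ig=150,
                          mean_cg=-4.2, sd_cg=8.0, n_cg=150)
print(f"g = {g:.2f} (95% CI, {low:.2f} to {high:.2f})")
```

Expressing each trial's result in standard deviation units is what allows studies that used different depression scales to be combined in one pooled estimate.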

Figure 42

Forest Plot of Depression Symptom Severity Scores From the Suicide Prevention Trials (KQ4). Note: “IG/CG Mn(SD)” show change from baseline in the native units of the measures reported. Abbreviations: BDI = Beck Depression Inventory; BPD (more...)

Five trials reported self-harm; all were conducted among patients with a diagnosis or symptoms of borderline personality disorder.336, 340, 350, 354, 355 The findings were mixed and inconclusive (Appendix G Table 6). Four of these trials reported the proportion of participants with episodes of self-harm but the results were inconclusive; the pooled effect had wide confidence intervals (OR, 1.21 [95% CI, 0.71 to 2.07]; 7 RCTs [n=1,009]; I²=27.1%). On the other hand, two trials reported reductions in the number of self-harm episodes among those with any self-harm episodes at baseline.350, 355 One of these reported a reduced number of suicidal and self-injurious episodes at the final, 32-week followup (1.4 in the intervention group, 2.6 in the control group over the previous 12 weeks, p<.04).350 The other trial reported a reduced number of days with self-harm in the previous 2 months (IRR, 0.91 [95% CI NR], p<.001).355 These two studies and a third that showed a reduced proportion with self-harm were high-contact trials of DBT among patients diagnosed with borderline personality disorder.349

Global mental health symptom severity measures generally showed very small, statistically non-significant differences in improvement favoring the intervention groups, with most group differences being one point or less on a wide variety of scales (Appendix G Table 8). Similarly, anxiety symptom severity, mental health-related quality of life, global quality of life, and social function were each reported in one to four studies, with null or mixed results.

Other Health Outcomes

The very large trial found no group differences in inpatient admissions with a mental health diagnosis.365 Two studies limited to patients with borderline personality disorder found no group differences in the proportion of patients with Accident and Emergency Department attendances or inpatient admissions (Appendix G Table 9).336, 340

Effect Modification and Findings in Specific Populations

None of the trials reported on effect modification by age, gender, race, or ethnicity, nor was there sufficient evidence to explore effect size variability by study or intervention characteristics through stratified analyses or meta-regression. The very large trial found that several demographic characteristics (sex, age distribution, race and ethnicity) and clinical characteristics (location of index visit, rates of prior mental health diagnoses) varied across levels of intervention uptake more than expected by chance. However, these comparisons did not show a consistent relationship between baseline indicators of risk and specific levels or types of intervention participation.365

KQ5. What Are the Harms of Treatment of High Suicide Risk (Psychotherapy or Pharmacotherapy) in Adults, Including Pregnant and Postpartum Persons?

Two of the included RCTs of suicide prevention that we examined reported on harms.336, 362 There were no differences between groups at followup on an instrument designed to assess the perceived level of coercion experienced by service users during hospital admission.336 There was no pattern of effect in the studies included for KQ4 to indicate paradoxical harms of treatment. The study of lithium found a higher rate of non-serious adverse events (75.7% with lithium, 69% with placebo, p-value not reported) and a slightly higher rate of serious adverse events (38.8% with lithium, 34.1% with placebo, p-value not reported), but no difference in withdrawals due to adverse events (1.2% with lithium, 1.5% with placebo, p-value not reported).362
