NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

O’Connor E, Henninger M, Perdue LA, et al. Screening for Depression, Anxiety, and Suicide Risk in Adults: A Systematic Evidence Review for the U.S. Preventive Services Task Force [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2023 Jun. (Evidence Synthesis, No. 223.)

Screening for Depression, Anxiety, and Suicide Risk in Adults: A Systematic Evidence Review for the U.S. Preventive Services Task Force [Internet].

Chapter 2. Methods

Scope and Purpose

This new topic incorporates and updates the evidence related to screening for and treatment of depression1 and suicide risk2 while adding evidence related to screening for and treatment of anxiety disorders and combination approaches that address more than one of these conditions. In general, this review focuses on screening adults (age ≥19 years), including pregnant and postpartum persons, in primary care for depressive disorders, anxiety disorders, or high risk of suicide. The evidence related to screening in child and adolescent populations is addressed by a separate topic and is not reviewed here.138 This review provides updated and new evidence regarding the accuracy of instruments used to screen for depression, anxiety, or suicide risk, in addition to the benefits and harms of screening and treatment for depression, anxiety, and the prevention of suicide. The USPSTF will use this review to update its 2016 recommendation on depression screening and its 2014 recommendation on screening for suicide risk in primary care in the US,133, 137 as well as to consider a separate recommendation on screening for anxiety.

We generally kept a consistent framework across all conditions but used existing systematic reviews (ESRs) for large, mature bodies of evidence and primary studies for smaller bodies of evidence.

Key Questions and Analytic Framework

With input from the USPSTF, we developed an Analytic Framework (Figure 1) and five Key Questions (KQs), using the USPSTF’s methods to guide the literature search, data abstraction, and data synthesis.

Figure 1 is the analytic framework that depicts the five Key Questions to be addressed in the systematic review. The figure illustrates how primary care screening and treatment for depression, anxiety, or suicide risk in adults age 19 years or older, including pregnant and postpartum individuals, may result in improved health outcomes (decreased depressive and/or anxiety symptomology, decreased suicide deaths, self-harm and ideation, improved functioning, improved quality of life, improved health status, and improved maternal/infant outcomes) (Key Questions 1 and 4). There is also a question related to the accuracy of screening instruments used to detect depression, anxiety, or suicide risk (Key Question 2). Lastly, the figure illustrates what harms may be associated with screening for or treatment of depression, anxiety, or suicide risk in adults age 19 years or older, including pregnant and postpartum individuals (Key Questions 3 and 5).

Figure 1. Analytic Framework.

  1. Do depression, anxiety, or suicide risk screening programs in primary care or comparable settings result in improved health outcomes in adults, including pregnant and postpartum persons?
    a. Does returning depression, anxiety, or suicide risk screening test results to providers (with or without additional care management supports) result in improved health outcomes?
  2. Do instruments to screen for depression, anxiety, or high suicide risk accurately identify adults, including pregnant and postpartum persons, with depression, anxiety, and high suicide risk in primary care or comparable settings?
  3. What are the harms associated with screening for depression, anxiety, or suicide risk in primary care or comparable settings in adults, including pregnant and postpartum persons?
  4. Does treatment (i.e., psychotherapy, pharmacotherapy, or both) of depression, anxiety, or high suicide risk result in improved health outcomes in adults, including pregnant and postpartum persons?
  5. What are the harms of treatment (i.e., psychotherapy, pharmacotherapy, or both) of depression, anxiety, or high suicide risk in adults, including pregnant and postpartum persons?

In addition, we delineated five contextual questions, which were addressed using abbreviated, not fully systematic methods and are therefore not shown on our analytic framework:

  1. What is the differential effect of screening for depression, anxiety, or suicide risk separately compared with screening for one or more of these conditions at the same time?
  2. Does screening improve process outcomes such as identification and appropriate diagnosis of persons with depression, anxiety, or risk of suicide; appropriate follow-up and referrals; and mental health treatment engagement and retention?
  3. What health care system supports (e.g., collaborative care) can help ensure appropriate diagnosis and followup, treatment engagement and retention, and improved outcomes?
  4. How well do suicide risk screening instruments predict future suicide attempts?
  5. What is known about the validity of the most commonly used or recommended instruments to screen for depression, anxiety, and suicide risk in U.S. racial or ethnic minority patients?

Data Sources and Searches

We worked with a research librarian to develop a search strategy designed to identify studies of screening for or treatment of depression, anxiety, or suicide risk, as well as studies investigating the accuracy of instruments used to screen for these conditions (Appendix A). The search was peer reviewed by a second research librarian and was executed on September 24, 2021, in the following databases, limited to English-language publications: Ovid MEDLINE, the Cochrane Central Register of Controlled Trials, and PsycINFO. We conducted ongoing surveillance through January 21, 2022.

Because of the expanded scope and the incorporation of evidence from previous USPSTF reviews, search start dates varied by condition and KQ (Appendix A Table 1). For KQs 1, 2, and 3 for depression and suicide risk, we bridged the searches from the previous reviews, starting in 2014 and 2012, respectively. For KQs 1 and 3 for anxiety, we set the search start year at 1990 because most selective serotonin reuptake inhibitors (SSRIs) were approved in the early 1990s. For test accuracy studies (KQ2) for anxiety, we started our search in 2014, bridging from previously identified ESRs. For KQs 4 and 5 for depression, we searched for ESRs of depression treatment published in or after 2015, but also searched for earlier Cochrane reviews if we identified an evidence gap in the literature published in or after 2015. For the benefits and harms of anxiety treatment (KQs 4 and 5), we searched for primary studies from 2015 onward, bridging from previously identified ESRs, and reviewed both primary studies and other ESRs for inclusion. For suicide risk (KQs 4 and 5), we bridged from the previous USPSTF review, using a search start date of 2012.
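The search start years described above can be summarized as a simple lookup table. The sketch below is a hypothetical transcription of this paragraph into Python, offered only as a quick reference; the dictionary and function names are illustrative, and Appendix A Table 1 remains the authoritative source.

```python
# Hypothetical lookup of search start years by (condition, KQ), transcribed
# from the prose; Appendix A Table 1 is the authoritative source.
SEARCH_START_YEAR = {
    # KQs 1-3 bridged from the previous USPSTF reviews.
    ("depression", 1): 2014, ("depression", 2): 2014, ("depression", 3): 2014,
    ("suicide risk", 1): 2012, ("suicide risk", 2): 2012, ("suicide risk", 3): 2012,
    # Anxiety screening benefits/harms start in 1990 (SSRI approval era);
    # anxiety test accuracy bridged from earlier ESRs starting in 2014.
    ("anxiety", 1): 1990, ("anxiety", 3): 1990, ("anxiety", 2): 2014,
    # Depression KQs 4-5: ESRs from 2015 (earlier Cochrane reviews were also
    # sought when an evidence gap was found).
    ("depression", 4): 2015, ("depression", 5): 2015,
    ("anxiety", 4): 2015, ("anxiety", 5): 2015,
    ("suicide risk", 4): 2012, ("suicide risk", 5): 2012,
}


def search_start_year(condition: str, kq: int) -> int:
    """Return the search start year for a condition/KQ pair."""
    return SEARCH_START_YEAR[(condition, kq)]
```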

In addition to the KQ search, we examined the reference lists of other previously published reviews, meta-analyses, and primary studies to identify additional potential publications for inclusion. We supplemented our searches with suggestions from experts and articles identified through news and table-of-contents alerts. We also searched ClinicalTrials.gov (https://ClinicalTrials.gov/) for ongoing trials that were listed as “recruiting,” “active,” “not recruiting,” “not yet recruiting,” “completed,” or “terminated” to identify relevant studies underway.

We imported the literature from these sources directly into EndNote® X7 (Thomson Reuters, New York, NY).

Study Selection

Two reviewers independently screened titles and abstracts of all references identified in the searches, using the inclusion and exclusion criteria as a guide to identify eligible studies. We developed criteria for inclusion and exclusion of primary studies and systematic reviews for each KQ (Appendix A Table 2). Potentially relevant studies included based on title and abstract were then independently assessed by two reviewers at full text using a standard form that outlined eligibility criteria. Any disagreements were reconciled through discussion or consultation with a third reviewer. Study assessment was conducted in DistillerSR (Evidence Partners, Ottawa, Canada), where detailed records were kept of all included and excluded studies.

For KQs 1 and 3 (benefits and harms of screening), we included RCTs of primary care (or comparable broad healthcare-based) adult populations (age ≥19), including pregnant people, investigating the benefits or harms of brief screening interventions for depression, anxiety, or suicide risk. For KQ1, we included studies in which the control group was also screened, but the screening results were not given to the participants’ primary care clinician (these were considered KQ1a studies). In addition, we included studies with additional components beyond screening, such as referral support, training in diagnosis or management, and patient materials.

For KQ2 (test accuracy), we limited inclusion to the most widely used or recommended screening tools for anxiety and depression but did not restrict inclusion to specific tools for suicide risk screening. For depression screening instruments, we included ESRs of the following tools: Patient Health Questionnaire (PHQ), any version; Center for Epidemiologic Studies Depression Scale (CES-D); and Edinburgh Postnatal Depression Scale (EPDS) for perinatal persons. We additionally included any primary studies of the Geriatric Depression Scale (GDS) for older adults. For anxiety, we included primary studies of the following screening instruments: Generalized Anxiety Disorder scale (GAD), in any form; PHQ Anxiety scale; EPDS-Anxiety subscale, for perinatal persons; and Geriatric Anxiety Inventory (GAI) and Geriatric Anxiety Scale (GAS) for older adults. For suicide risk screening, we included primary studies of any brief tool. Appendix A Tables 3–12 provide an overview of the included screening instruments for KQ2.
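The KQ2 inclusion rules above amount to a small configuration keyed by condition. The following sketch is a hypothetical encoding of this paragraph (the `InstrumentRule` type and `KQ2_INSTRUMENTS` name are illustrative, not part of the review); the instrument details themselves are given in Appendix A Tables 3–12.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentRule:
    """One eligible screening instrument (hypothetical record layout)."""
    instrument: str
    evidence: str                 # "ESR" or "primary"
    population: str = "adults"


KQ2_INSTRUMENTS = {
    "depression": [
        InstrumentRule("PHQ (any version)", "ESR"),
        InstrumentRule("CES-D", "ESR"),
        InstrumentRule("EPDS", "ESR", "perinatal persons"),
        InstrumentRule("GDS", "primary", "older adults"),
    ],
    "anxiety": [
        InstrumentRule("GAD (any form)", "primary"),
        InstrumentRule("PHQ Anxiety scale", "primary"),
        InstrumentRule("EPDS-Anxiety subscale", "primary", "perinatal persons"),
        InstrumentRule("GAI", "primary", "older adults"),
        InstrumentRule("GAS", "primary", "older adults"),
    ],
    # No restriction on specific tools: any brief suicide risk screening tool.
    "suicide risk": [InstrumentRule("any brief tool", "primary")],
}
```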

For KQs 4 and 5 (benefits and harms of treatment) of anxiety and suicide risk, we included RCTs of psychological, pharmacological, or combination interventions compared with control conditions (e.g., placebo, usual care, wait list, or attention control). For anxiety and suicide risk, we planned to initially limit inclusion to RCTs in which participants were recruited from primary care or comparable health care settings. If evidence was insufficient when limited to primary care-based recruitment, we planned to expand the scope to include recruitment from other non-acute settings for suicide prevention treatment (e.g., recruitment from mental health settings) and to include ESRs of treatment for anxiety. In both cases, the primary care-based evidence was limited, so we expanded our scope as planned. For the benefits and harms of anxiety treatment, we included only ESR results from broad analyses (i.e., analyses not broken down or limited by intervention type or format, specific measures, or type of control group) and limited the examination of effect modification to publication bias and study quality/risk of bias. For the benefits and harms of suicide prevention treatment, we excluded studies that recruited patients in the midst of an acute suicidal crisis from emergency or inpatient settings, because their findings have limited applicability to patients who would be identified through screening in primary care settings. For all conditions, we excluded studies limited to people with comorbid medical or mental health conditions such as cancer, cardiovascular disease, substance use disorders, and serious mental illnesses.

We used ESRs to address the benefits (KQ4) and harms (KQ5) of psychological, pharmacological, and combined treatment of depression because of the extremely large volume of literature and the maturity of the evidence base. Given the large number of reviews that met our eligibility criteria for these KQs, we adapted the decision tool developed by Pollack and colleagues139 to identify the most current and comprehensive evidence. Following Pollack and colleagues’ methods, we first focused on Cochrane reviews and then reviewed non-overlapping, non-Cochrane reviews.139 Our adaptation was that, for ESRs of psychological treatment, we focused first on ESRs using a comprehensive database of studies of the psychological treatment of depression developed and maintained by Cuijpers and colleagues,140 rather than on Cochrane reviews. The Cuijpers database used a comprehensive search strategy and transparent, standardized methods for data extraction and coding, risk of bias assessment, and effect size calculation,140 and it incorporated more contemporary trials than Cochrane reviews for this body of literature. This database is updated annually. Among the reviews based on the Cuijpers database, we used only the most recently reported effect size for any outcome or analysis. Outside of Cochrane and Cuijpers ESRs, we included only the most comprehensive or recent ESR when multiple relevant reviews covered the same outcome for the same body of literature. For analyses examining effects in specific populations, we focused on analyses of groups based on age, sex or gender, race or ethnicity, sexual orientation, and socioeconomic status.
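The rule of keeping only the most recently reported effect size among Cuijpers-based reviews can be expressed as a small deduplication step. This is a hypothetical sketch: the `ReportedEffect` record and its fields are illustrative assumptions, and ties in publication year are broken by keeping the first record seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedEffect:
    """One effect size reported by a review (hypothetical record layout)."""
    review: str
    year: int        # publication year of the reporting review
    outcome: str
    analysis: str    # e.g., an overall analysis or a population subgroup
    effect: float


def most_recent_effects(effects):
    """Keep only the most recently reported effect per (outcome, analysis)."""
    latest = {}
    for e in effects:
        key = (e.outcome, e.analysis)
        # Replace only when strictly newer, so the first record wins ties.
        if key not in latest or e.year > latest[key].year:
            latest[key] = e
    return latest
```

Keying on (outcome, analysis) mirrors the phrase "for any outcome or analysis"; a real workflow would also need to decide what counts as the same analysis across reviews, which this sketch leaves to the caller.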

Finally, for harms of pharmacologic treatment (KQ5) of anxiety and depression, we also included large observational studies published after the end of the search window of the ESRs that included observational studies. We included only observational studies addressing serious harms, including death, suicide attempts, and events likely to require medical treatment.

Quality Assessment and Data Abstraction

We used several tools to assess and rate the credibility of both primary studies and ESRs under consideration for inclusion (Appendix A Table 13).

We used study quality rating standards from the USPSTF manual.141 For primary research, two reviewers independently rated the studies’ methodological quality using USPSTF design-specific criteria (Appendix A Table 13).141 Studies were rated as “good,” “fair,” or “poor,” and discrepancies between raters were resolved by discussion or consultation with the larger review team. Good-quality studies were those that met nearly all of the specified quality criteria (e.g., comparable groups were assembled initially and maintained throughout the study, and followup was approximately 90% or higher). Because mental health outcomes are assessed through patient self-report, good-quality studies used either blinded, structured interviews or questionnaires completed without an interviewer’s assistance. Fair-quality studies did not meet these criteria but did not have serious threats to their internal validity related to their design, execution, or reporting. Poor-quality studies typically had several important limitations, including at least one of the following risks of bias: very high attrition (generally >40%); differential attrition between intervention arms (generally >20%); substantial lack of baseline comparability between groups without adjustment; or issues in trial conduct, analysis, or reporting of results (e.g., possible selective reporting, inappropriate exclusion of participants from analyses, questionable validity of randomization and allocation concealment procedures, or data for relevant outcomes not collected systematically). Studies rated as poor quality were excluded from the review.
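The attrition thresholds above can be checked mechanically, although the actual ratings were reviewer judgments that weighed all criteria together. The sketch below is a hypothetical helper, assuming that overall attrition is measured against the number randomized, that "differential attrition" means the percentage-point gap between the arms with the highest and lowest attrition, and that "approximately 90%" followup is approximated as 90% or more.

```python
def attrition_flags(randomized, analyzed, overall_limit=0.40,
                    differential_limit=0.20, good_followup=0.90):
    """Hypothetical check of a trial's attrition against the rough thresholds.

    randomized, analyzed: participant counts per arm, in the same order.
    The flags inform, but do not replace, the overall quality judgment.
    """
    per_arm = [1 - a / r for r, a in zip(randomized, analyzed)]
    overall = 1 - sum(analyzed) / sum(randomized)
    differential = max(per_arm) - min(per_arm)  # percentage-point gap (assumption)
    return {
        "overall_attrition": overall,
        "differential_attrition": differential,
        "very_high_attrition": overall > overall_limit,
        "high_differential_attrition": differential > differential_limit,
        "followup_about_90pct_or_more": 1 - overall >= good_followup,
    }
```

For example, a two-arm trial randomizing 100 per arm and analyzing 95 and 70 has 17.5% overall attrition but a 25-point differential, which would flag differential attrition even though overall attrition is below 40%.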

ESRs of benefits and harms of treatment were rated as “good” if they were recent, relevant reviews with comprehensive sources and search strategies; had explicit and relevant selection criteria; reported a standard appraisal of included studies; and had valid conclusions. We rated them as “fair” if they were not clearly biased but lacked comprehensive sources or search strategies or did not report a standard appraisal of included studies, provided these limitations seemed unlikely to introduce bias for the aim of the specific review. For example, some individual patient data meta-analyses relied on sources such as studies in a registry or submitted to the U.S. Food and Drug Administration (FDA), with the goal of examining effect modification (rather than searching multiple databases, as would typically be expected). Also, individual patient data meta-analyses generally did not report a standard appraisal of the included studies, but we considered them likely unbiased for their purpose of examining effect modification. Similarly, ESRs using a cohort of studies based on an FDA database to examine publication bias were included even if they did not report a standard appraisal of the included studies. We rated as “poor” and excluded ESRs that were outdated, irrelevant, or biased, or that lacked a comprehensive and systematic search for studies, explicit selection criteria, or (with the exceptions noted above) a standard appraisal of studies. For ESRs, a single reviewer conducted the quality assessment, and only ESRs rated as poor quality by the first reviewer were rated by a second reviewer. Discrepancies were resolved by discussion or consultation with the larger review team.

For instrument accuracy studies, we used ROBIS142 to evaluate the risk of bias for ESRs, and QUADAS-2143 to evaluate the risk of bias of primary diagnostic accuracy studies. We ultimately rated studies and ESRs as “good”, “fair”, or “poor” quality. Studies and ESRs were evaluated independently by two reviewers, and if deemed by both reviewers to have a high risk of bias, they were rated “poor” and excluded.

We abstracted data from each included review and primary study into detailed abstraction forms using DistillerSR (Evidence Partners, Ottawa, Canada). For all included evidence, one reviewer completed primary data abstraction, and a second reviewer checked all data for accuracy and completeness.

For ESRs, we abstracted the aim, inclusion criteria, and detailed results for the main findings of outcomes included in our Research Plan. We stratified results for specific populations listed in the Research Plan for the outcome of depression symptoms (i.e., pregnant and postpartum persons, older adults, and individuals identified through population-based screening in primary care or comparable community settings, and subgroups based on age, sex or gender, race or ethnicity, sexual orientation, and socioeconomic status). For other outcomes, stratified analyses were narratively summarized. Similarly, detailed results for effect modification analyses were abstracted only for the outcome of depression symptoms and were narratively summarized for other outcomes.

Data Synthesis and Analysis

We synthesized findings using text, tables, and figures; where possible we conducted quantitative syntheses with meta-analysis. We used Stata 16.1 (StataCorp LLC, College Station, TX). All significance testing was 2-sided, and results were considered statistically significant if the p-value was 0.05 or less.

For meta-analysis of primary research trials (KQ1, KQ4), we used random-effects models estimated by restricted maximum likelihood with the Knapp-Hartung correction for small numbers of studies.144, 145 When studies included multiple intervention groups, we used the single most intensive or comprehensive intervention group per study in the meta-analysis. For dichotomous outcomes, we used study-reported adjusted risk ratios (RRs) if available and calculated unadjusted RRs if adjusted results were not reported. For continuous measures, we used change from baseline in each group as the measure for analysis. We pooled between-group standardized mean differences (Hedges’ g) because studies used a variety of specific measures. Where there was evidence of effect modification, our primary analyses were stratified by study population.
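To make the pooling approach concrete, the following is a minimal pure-Python sketch (the review itself used Stata 16.1) of Hedges' g computed from change-from-baseline data and of a random-effects meta-analysis with a restricted maximum likelihood (REML) estimate of between-study variance and a Knapp-Hartung standard error. Function names are illustrative. The REML step uses a simple fixed-point iteration without the safeguards a production routine would add, and a confidence interval would use a t critical value with k - 1 degrees of freedom (e.g., from `scipy.stats.t.ppf`), which is not computed here.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g for two groups, e.g., mean change from baseline in each arm.

    Returns (g, variance of g).
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def reml_knapp_hartung(y, v, tol=1e-10, max_iter=1000):
    """Random-effects pooling with REML tau^2 and a Knapp-Hartung SE.

    y: study effect sizes; v: their within-study variances (k >= 2).
    Returns (pooled estimate, Knapp-Hartung SE, tau^2, degrees of freedom).
    """
    k = len(y)
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        # Fixed-point REML update for tau^2, truncated at zero.
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1 / sum(w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    w = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Knapp-Hartung: scale the variance by the weighted residual dispersion;
    # the CI then uses a t distribution with k - 1 degrees of freedom.
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    se_hk = math.sqrt(q / sum(w))
    return mu, se_hk, tau2, k - 1
```

For example, pooling three studies with effects 0.1, 0.5, and 0.9 and a common within-study variance of 0.01 gives a pooled estimate of 0.5, τ² = 0.15, and a Knapp-Hartung SE of about 0.231.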

For meta-analysis of KQ2, data from 2-by-2 contingency tables were analyzed using a bivariate model, which models sensitivity and specificity simultaneously. If there were not enough studies to use the bivariate model, we pooled sensitivity and specificity separately using random-effects models with the DerSimonian and Laird method.146 We did not quantitatively pool results when data were limited to fewer than three studies. When quantitative analyses were not possible, we used summary tables and forest plots to provide a graphical summary of results. For KQ2 studies that conducted reference standard interviews with only a subset of participants who screened negative, we applied the proportion of that subset meeting diagnostic criteria to all screen-negative participants to estimate sensitivity and specificity for the full sample.
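Two of the fallback steps in this paragraph can be sketched as follows: completing a 2-by-2 table when only some screen-negative participants received the reference standard, and pooling sensitivity (or specificity) across at least three studies with the DerSimonian and Laird random-effects method on the logit scale. This is a hypothetical illustration, assuming that all screen-positive participants were verified, that the verified screen-negatives are representative of all screen-negatives, and that a 0.5 continuity correction is applied to zero cells; the bivariate model used as the primary approach is not shown.

```python
import math


def extrapolate_partial_verification(tp, fp, n_neg, n_neg_verified,
                                     n_neg_verified_cases):
    """Complete a 2x2 table when only a subset of screen-negatives was verified.

    Assumes every screen-positive had the reference standard and that the
    verified screen-negatives are representative of all screen-negatives.
    Returns (tp, fp, fn, tn); fn and tn may be non-integer estimates.
    """
    fn = n_neg * n_neg_verified_cases / n_neg_verified
    return tp, fp, fn, n_neg - fn


def _expit(x):
    """Inverse logit."""
    return 1 / (1 + math.exp(-x))


def dersimonian_laird_logit(events, totals, z=1.96):
    """Pool proportions (e.g., TP / (TP + FN)) with DerSimonian-Laird on the logit scale.

    Returns (pooled proportion, lower 95% limit, upper 95% limit).
    """
    if len(events) < 3:
        raise ValueError("fewer than three studies: summarize without pooling")
    y, v = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:  # continuity correction for zero cells (assumption)
            e, n = e + 0.5, n + 1
        y.append(math.log(e / (n - e)))
        v.append(1 / e + 1 / (n - e))
    w = [1 / vi for vi in v]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # method-of-moments estimate
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return _expit(mu), _expit(mu - z * se), _expit(mu + z * se)
```

For instance, with 40 true positives, 60 false positives, and 900 screen-negatives of whom 90 were interviewed and 9 met criteria, the extrapolated table has 90 false negatives and 810 true negatives, giving a sensitivity of 40/130 (about 0.31).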

For all meta-analyses, we assessed statistical heterogeneity among the studies using the I² statistic. When analyses showed large statistical heterogeneity, we suggest interpreting the 95% CI or the range of estimates across individual studies rather than the pooled point estimate. However, the high statistical heterogeneity for specificity is partly due to the high precision of the estimates from individual studies.
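As a reminder of how the I² statistic behaves, the sketch below computes it from Cochran's Q with inverse-variance weights. It also illustrates the point about specificity: holding the spread of study estimates fixed, shrinking the within-study variances (that is, more precise studies) raises I². The function name is illustrative.

```python
def i_squared(y, v):
    """I^2 (percent) from Cochran's Q with inverse-variance weights.

    y: study estimates; v: their within-study variances.
    """
    w = [1 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    # Truncate at zero when Q does not exceed its degrees of freedom.
    return 0.0 if q <= df else 100 * (q - df) / q
```

With estimates of 0.1, 0.5, and 0.9, the function returns about 93.75 when each variance is 0.01 but about 99.375 when each variance is 0.001, even though the estimates themselves are unchanged.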

For evidence from ESRs, we display pooled results in forest plots as reported in the ESRs. We used placebo-controlled comparisons if available. We accepted only RCT evidence for benefits of treatment (KQ4), but both RCT and observational evidence were eligible for harms of pharmacotherapy (KQ5). For results derived from observational studies, a parenthetical note is included in the forest plot.

Grading the Strength of the Body of Evidence

We graded the strength of the overall body of evidence for each KQ within each condition. We adapted the Evidence-based Practice Center (EPC) approach,147 which is based on a system developed by the Grading of Recommendations Assessment, Development and Evaluation Working Group.148 Our method explicitly addresses four of the five EPC-required domains: consistency (similarity of effect direction and size), precision (degree of certainty around an estimate), reporting bias (potential for bias related to publication, selective outcome reporting, or selective analysis reporting), and study quality (i.e., study limitations). We did not address the fifth required domain—directness—as it is implied in the structure of the KQs (i.e., pertains to whether the evidence links the interventions directly to a health outcome).

Consistency was rated as reasonably consistent, inconsistent, or not applicable (e.g., single study). Precision was rated as reasonably precise, imprecise, or not applicable (e.g., no evidence). The body-of-evidence limitations reflect potential reporting bias, quality of the individual studies, and other important restrictions in answering the overall KQ (e.g., lack of replication of interventions, nonreporting of outcomes important to patients).

We graded the overall strength of evidence as high, moderate, or low. “High” indicates high confidence that the evidence reflects the true effect and that further research is very unlikely to change our confidence in the estimate of effects. “Moderate” indicates moderate confidence that the evidence reflects the true effect and that further research may change our confidence in the estimate of effect and may change the estimate. “Low” indicates low confidence that the evidence reflects the true effect and that further research is likely to change our confidence in the estimate of effect and is likely to change the estimate. A grade of “insufficient” indicates that evidence is either unavailable or does not permit estimation of an effect. We developed our overall strength-of-evidence grade based on consensus discussion involving at least two reviewers.
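For readers tracking these judgments across KQs and conditions, the rating categories above map naturally onto a small record type. This is a hypothetical sketch: the class and field names are illustrative, and the grade is stored as a consensus judgment rather than computed from the domain ratings, since no formula for combining them is described here.

```python
from dataclasses import dataclass
from enum import Enum


class Consistency(Enum):
    REASONABLY_CONSISTENT = "reasonably consistent"
    INCONSISTENT = "inconsistent"
    NOT_APPLICABLE = "not applicable"  # e.g., single study


class Precision(Enum):
    REASONABLY_PRECISE = "reasonably precise"
    IMPRECISE = "imprecise"
    NOT_APPLICABLE = "not applicable"  # e.g., no evidence


class Grade(Enum):
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"
    INSUFFICIENT = "insufficient"


@dataclass
class StrengthOfEvidence:
    """Hypothetical record of one strength-of-evidence judgment."""
    kq: int
    condition: str
    consistency: Consistency
    precision: Precision
    limitations: list[str]  # reporting bias, study quality, other restrictions
    grade: Grade            # assigned by reviewer consensus, not computed
```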

Expert Review and Public Comment

A draft Research Plan was posted on the USPSTF Website for public comment from May 7 to June 3, 2020. The USPSTF received comments regarding eligible populations, examination of subpopulations, outcomes, eligible settings, and requests for clarification of language or approach. Commenters requested the inclusion of studies limited to persons with disabilities, medical conditions, and mental health conditions other than depression, anxiety, and increased suicide risk. In response, the USPSTF included studies that enrolled participants with the conditions listed above; however, studies limited to participants with these conditions were not included because of their lack of broad applicability to primary care populations. Additionally, the USPSTF added a priori subpopulations of interest for detailed examination if data were available. Pregnancy outcomes, such as preterm birth, were added, and a contextual question was added to address intermediate process outcomes such as appropriate diagnosis, treatment initiation, and treatment engagement. Another change in response to comments was the inclusion of studies in emergency department settings if the screening was broadly applied (e.g., not limited to persons in the midst of a mental health crisis). Finally, selected text was edited for clarity. In addition, the draft evidence report was posted on the USPSTF Website for public comment from September 20 through October 18, 2022. In response to comments received, we corrected minor errors, adopted several suggested wording changes, provided some additional requested information or detail, and evaluated studies suggested for possible inclusion (but found that none met our inclusion criteria).

USPSTF Involvement

The authors worked with the USPSTF at key points throughout the review process to develop and refine the analytic framework and key questions and to resolve issues pertaining to scope for the final evidence synthesis. This research was funded by the Agency for Healthcare Research and Quality (AHRQ) under a contract to support the work of the USPSTF. AHRQ staff provided oversight for the project, reviewed the draft report, and facilitated external review of the draft evidence synthesis. However, the authors are solely responsible for the content.

