Most of the instruments included in this review have moderate to good operating characteristics for the detection of GAD and PD in unselected primary care settings. Only one study specifically addressed test performance in a sample presenting with somatic symptoms, while the others assessed test performance on all patients presenting to primary care settings. Most of the instruments included in this review appear feasible for use in primary care based on comprehension level. However, it is uncertain if these instruments are responsive to change in symptom status and the specific values associated with a clinically important response are unknown. There is preliminary evidence from one study that test performance can vary based on race. There is a current lack of evidence about differences in test performance based on gender and setting.


Our search identified a large number of anxiety measures, but few measures had been studied in primary care populations. These measures had moderate to good operating characteristics, but unlike instruments used in the detection of other common mental illnesses such as depression or dementia, the operating characteristics have not been replicated in multiple samples. Even for the SDDS-PC—the only instrument evaluated in multiple studies—the versions studied were different, which might change the test performance. Based on operating characteristics, we did not find sufficient evidence to recommend a single validated self-report measure in the screening of GAD or PD in primary care. Our results are congruent with the recent guidelines from the National Institute for Health and Clinical Excellence on GAD and PD.51 However, based on study quality, operating characteristics, precision of the estimates, potential for assessing response to change, and other feasibility issues, the most promising instruments are: the panic module of the PHQ, the GAD-7 and the SDDS-PC.

It is important to note that all of the instruments included in this review are for screening or case-finding purposes and do not by themselves establish a diagnosis of GAD or PD. A diagnosis must be established through further evaluation by a primary care physician or by a mental health professional to whom a patient is referred. All but one of the instruments were validated by screening unselected primary care patients. Therefore, it is possible that they might perform differently if used for case-finding in a high-risk population. For example, patients who present with unexplained GI symptoms might have a higher incidence or severity of anxiety disorders, which could change the performance of the instrument. We did not identify any studies that validated instruments in a case-finding situation except for the study of the unnamed 10-item instrument by Barsky and colleagues.36
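One way to see why a screen validated in unselected patients may behave differently in a high-risk group is to look at predictive values, which depend on how common the disorder is. The sketch below is illustrative only: the function name, the sensitivity and specificity of 0.80, and the two prevalence figures are hypothetical values chosen for the example and are not taken from any study in this review. It also assumes that sensitivity and specificity stay fixed across populations, which may not hold if disease severity differs.

```python
def _safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator > 0 else float("nan")


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed with Bayes' rule.

    All three arguments are proportions between 0 and 1.
    Assumption: sensitivity and specificity do not change with prevalence.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Expected proportion of the screened population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = _safe_ratio(true_pos, true_pos + false_pos)
    npv = _safe_ratio(true_neg, true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical operating characteristics and prevalences, for illustration only.
    for label, prevalence in (("unselected screening", 0.05),
                              ("high-risk case-finding", 0.20)):
        ppv, npv = predictive_values(0.80, 0.80, prevalence)
        print(f"{label}: prevalence={prevalence:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
```

Under these assumed inputs, the same instrument yields a noticeably higher positive predictive value in the higher-prevalence group, which is one reason results from unselected samples may not transfer directly to case-finding.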

While the studies included in the review provide information about the performance of various screening instruments, the impact of screening for GAD and PD on direct, patient-related outcomes is not known. In the absence of direct evidence on the impact of screening for anxiety, we considered a number of theoretical implications. Patients who receive a true negative test result might be reassured that they do not have a mental illness, and if they present with a physical symptom, medical causes for it might be investigated further. For those who receive a false positive test result, there might be distress from an incorrect diagnosis, a greater likelihood of receiving ineffective treatment, or a delay in diagnosis of the true cause of their symptoms. Patients who receive a false negative test result might remain anxious because of a delay in effective treatment and might undergo unnecessary diagnostic testing in a futile search for other medical causes of the anxiety symptoms.

Though many of the above considerations are inferred rather than proven in studies, there is good evidence that anxiety disorders are underrecognized and that there are effective treatments for anxiety disorders, and thus screening has the potential to improve patient outcomes. One review of antidepressant medications found that the number needed to treat for GAD was 5.15 and that antidepressants were significantly better than placebo.21 Similarly, various types of psychotherapy, including cognitive behavioral therapy and supportive therapy, have been shown to be better than treatment as usual for GAD.52 Both tricyclic antidepressants and selective serotonin reuptake inhibitors (SSRIs) have been shown to be superior to placebo in the treatment of PD.53 The mean effect size for acute treatment outcome for SSRIs relative to placebo was 0.55 (a moderate effect) in one analysis of twelve randomized controlled trials.54 Both psychotherapy and Internet-based therapy have been shown to be effective in treating PD.55,56 Although not a test of anxiety screening per se, at least two care management trials have shown that screening, coupled with effective primary care treatment, improves clinical outcomes for patients with a variety of anxiety disorders.16,20
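As a rough reading of the number needed to treat cited above, and assuming the conventional definition of NNT as the reciprocal of the absolute risk reduction (ARR), the reported value implies the following. The underlying response rates are not given here, so only the implied ARR is shown.

```latex
\[
\mathrm{NNT} = \frac{1}{\mathrm{ARR}}
\quad\Longrightarrow\quad
\mathrm{ARR} = \frac{1}{5.15} \approx 0.19
\]
```

Under that assumption, roughly 19 more patients per 100 would respond to an antidepressant than to placebo.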

When evaluating screening instruments for anxiety, it is important to consider that the criteria for GAD will change with the new DSM-V. Although the revision is not yet final, one proposed change is to reduce the required duration of GAD symptoms from 6 to 3 months.57 In the primary validation study for the GAD-7, 67 percent of patients with scores greater than 10 (the proposed cut point) had symptoms for more than 6 months, and 96 percent had symptoms for more than 1 month.35 With changes such as those mentioned above, the number of false positive results could decrease, but the number of false negatives would likely increase as the spectrum of disease shifts toward less chronic and potentially milder disease. Another proposed change is the addition of criterion D, which requires the presence of at least one of four avoidance behaviors that are not included in screening instruments based on DSM-III and DSM-IV editions. Thus, the implications for the performance characteristics of the current instruments are as yet unknown.

In summary, the U.S. Preventive Services Task Force has not issued a recommendation on screening for anxiety disorders. Our review shows several promising self-report instruments that could be used in VA for case-finding or evaluated in studies to determine the impact of systematic screening. Currently, the best clinical use of these measures in primary care would be for case-finding in patients with somatic symptoms or other factors that heighten the suspicion of an anxiety disorder. The reorganization of VA primary care services into Patient Aligned Care Teams (PACTs) that better integrate mental health services may present an opportunity to test the utility of anxiety screening measures that are coupled with high-quality diagnostic evaluation and treatment.


An important limitation of the current review is the lack of studies reporting on patient outcomes and societal impact. This has been recognized as a challenge in systematic reviews of diagnostic tests.58 We were unable to assess for publication bias because there is currently no reliable way to make this assessment for studies of diagnostic tests. Unlike studies of interventions, studies of diagnostic accuracy do not have a registry database in which one can identify studies that were started but not published or that are ongoing, which makes the assessment of publication bias challenging.

The studies included in this report were heterogeneous and thus prevented the statistical pooling of data. Further, the number of studies identified was small and therefore limited our ability to do subgroup analyses or meta-regressions to further explain observed heterogeneity.

Our eligibility criteria were designed to exclude poor-quality studies (e.g., studies in which the same person conducted both the screening and the criterion standard assessment were excluded). As a result, all studies were of at least moderate quality. This means that some poor-quality studies that could provide low-level evidence on the topic might have been excluded from the review. However, we think that the solution to this would be to encourage high-quality validation studies.

We also excluded studies of instruments in languages other than English and Spanish as well as articles that were published in languages other than English. While this might have excluded studies that were otherwise rigorous in methodology, we thought that these studies would not be directly applicable to the U.S. Veteran population and therefore not relevant for this report.59

Despite these limitations, this report was a highly structured and systematic review of the extant evidence. Our evidence synthesis was guided by a carefully designed standardized protocol, including a systematic search of research databases and relevant bibliographies, double data abstraction, and use of validated criteria to assess the quality of identified studies. Our multidisciplinary team included expertise in internal medicine, primary care, psychiatry, and psychology.


Replication of the results for instruments with promising characteristics is needed to verify initial findings. Specifically, rigorous validation of these instruments in the VHA is needed to ascertain applicability to the Veteran population. Given the limited number of measures evaluated in primary care populations, some investigators might reasonably develop and test novel anxiety instruments. An important consideration for such an undertaking is whether to develop general screens for significant anxiety symptoms, or more specific measures that are disease specific. Disease-specific measures that screen for a range of disorders would likely require more items and take longer to complete than general measures but could facilitate diagnostic evaluation and better treatment matching. General and disease-specific measures may also differ in responsiveness to change. Whether general or disease specific, these instruments would be tested ideally in unselected primary care populations, in patients presenting with symptoms commonly associated with anxiety disorders (e.g., chest pain or insomnia), and in patients with common mental illnesses (e.g., depressive disorder) as a screen for a co-occurring anxiety disorder. Further, given the preliminary data suggesting a possible difference by race, these studies should be powered for subgroup differences. The presence of a higher proportion of older adults in the Veteran population is another reason for validation of these instruments in the VHA since detection of anxiety disorders among older adults is known to be especially challenging.60

To evaluate the effects of screening for anxiety disorders, RCTs would be needed that include important patient outcomes such as effects on symptom status and patient functioning. Given the lack of benefit from screening for depression as a single intervention, and the high likelihood of similar findings for anxiety disorders, these trials would need to include a structured treatment component. There are no current recommendations for routine screening of primary care patients for anxiety disorders, though prevalence is high and there are demonstrable impacts on functioning. Establishing patient outcomes, such as the percentage of patients who go on to receive treatment, their quality of life, and treatment side effects, would be important in determining whether screening for anxiety in primary care has an overall positive impact on the individual patient and on the health system and society at large. Alternatively, if RCTs are not considered practical, a formal process (such as one suggested by the U.S. Preventive Services Task Force that evaluates and links evidence from screening and treatment studies) could be used to evaluate the potential benefits of systematic screening.

Another consideration for future studies is the inclusion of questions in the study design that assess the feasibility of the instrument in an average clinic. Specifically, patient receptiveness to completing screening, time taken, incompletion rates, and validity of instruments when administered by telephone, handheld device, or Web compared with in-person screening are all critical to the comprehensive assessment of the effectiveness of an instrument. Responsiveness to change is another desirable characteristic and has been a key feature in the adoption of the PHQ-9 for depression. A similar evaluation of this property for anxiety instruments could promote uptake of these instruments into routine practice.

Summary of Recommendations

  • Though none of the included scales has sufficient evidence to be recommended as the single best option, the panic module of the PHQ, the GAD-7, and the SDDS-PC are the most promising based on performance and applicability. Future research should focus on replicating early findings.
  • These scales should be considered for incorporation into the VA Mental Health Assistant to facilitate use by providers.
  • The Primary Care–Mental Health Integration Program and PACT should consider anxiety measures that have the most evidence based on this review.
  • VA Research and Development should consider supporting studies to evaluate the performance of these instruments in the Veteran population since performance may differ in older adults with high rates of medical and psychiatric comorbidities.
  • The instruments should be evaluated for sensitivity to change to enable monitoring of illness and response to treatment.
  • Studies should include assessments of the feasibility and validity of different modes of administration and should be powered to detect differences in the performance of instruments based on age, race, setting or ethnicity.


In summary, there are several promising case-finding instruments with good performance for GAD and PD in primary care populations. However, there has been little replication of initial validation studies. There is also a lack of evidence about the feasibility of these instruments for telephone administration and their sensitivity to change. Though there is preliminary evidence that test performance can vary by race, this has not been addressed by any of the major validation studies, and there have been no follow-up studies on this question. Studies are needed that replicate initial findings and systematically study feasibility and variations in performance based on race, gender, and setting.

