We identified fifteen articles that met our inclusion criteria. These studies evaluated twelve tools intended to be used as screening instruments to detect PTSD.


We screened 1998 titles (1302 from MEDLINE and 696 unique abstracts identified in the PILOTS database) and rejected 1844 because they did not meet eligibility criteria. We performed a more detailed review on 154 articles. From these, we excluded an additional 140 articles and added 1 article from hand searching for a total of 15 studies that addressed one of the key questions. We grouped the studies by screening instrument. Figure 1 details the process.

Figure 1. Literature Flow.


KEY QUESTION #1. What tools are used to screen for PTSD in primary care settings, and what are their characteristics (i.e., length, format/administration, response scale)?

We identified twelve screening tools that were evaluated using a gold standard structured diagnostic interview in a primary care setting. All were self-administered paper and pencil screening tests, and ranged from one to twenty-seven items. We provide a brief description of each of the screens evaluated by the included studies. The focus, length, response scale and test-retest reliability of each screen are provided in Table 3. Although the focus of this systematic review is an evaluation of the literature of screening instruments for PTSD used in primary care, we included screens for multiple psychiatric disorders or multiple anxiety disorders if there was a study that investigated the ability of the screen to identify PTSD in a primary care setting.

Table 3. Screens Used to Identify PTSD in Primary Care Clinics.


Breslau's Short Screening Scale

The Breslau scale was developed as a short self-report version of the National Institute of Mental Health DIS and the World Health Organization CIDI, version 2.1.43 The initial development used data from the Detroit Area Survey of Trauma, a random-digit-dialing survey of 2,181 individuals, 18 to 45 years old, living in the Detroit primary metropolitan statistical area. Respondents were asked to identify all traumatic events they had experienced, and PTSD was assessed in connection with one randomly selected event from those identified. Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criterion symptoms of PTSD were evaluated in a structured interview, and further analyses were limited to 1,830 individuals with complete data on all 17 symptoms, 142 of whom had PTSD. Items evaluating duration and impairment were not included. A model with seven symptoms (five from the “avoidance and numbing” group and two from the “arousal” group) was found to have higher positive predictive value and specificity than models with six or eight symptoms. In addition, a score of four or more was selected as the best overall cut point, taking into consideration sensitivity (80%) and specificity (97%). In this initial derivation study, results were similar for men and women.

Primary Care PTSD Screen (PC-PTSD)

The PC-PTSD is a four item screen designed for use in primary care clinics.9 Items are scored dichotomously as either 0 or 1 (no = 0; yes = 1). Each item was developed to map onto one of four empirically supported symptom factors proposed to underlie the construct of PTSD including 1) re-experiencing a traumatic event, 2) emotional numbing, 3) avoidance, and 4) hyperarousal.9 Respondents are asked about symptoms experienced in the past month that were related to a traumatic event that occurred anytime in their lifetime. The instrument was designed to be appropriate for those with at least an eighth grade reading level (Flesch-Kincaid grade level = 7.7). It has a test-retest reliability of 0.83. The PC-PTSD is the instrument used within VA medical centers and community-based outpatient care clinics across the United States.

Single-item PTSD Screener (SIPS)

The Single-item PTSD Screener (SIPS) is a single item asking respondents to indicate to what degree they were recently bothered by a past traumatic experience.10 The SIPS was developed to improve the implementation of PTSD screening in primary care clinics by providing a single, easy to memorize item. Candidate questions for the single item were written, discussed with study investigators, and then reviewed with patients to ensure the item was intuitive, clear, and acceptable. Candidate items were then refined and reviewed again by investigators before one was selected for the screen. The final selected item for the screen was, “Were you recently bothered by a past experience that caused you to believe you would be injured or killed?”. Response options include not bothered at all, bothered a little, or bothered a lot.


SPAN

The SPAN is a screening tool initially developed in a psychiatry clinic for the purpose of detecting PTSD in psychiatric populations with PTSD prevalence rates around 50%.44 The SPAN was derived from the Davidson Trauma Scale (DTS), a 17-item scale that assesses the DSM-IV PTSD criteria.45 In order to create the SPAN, the authors split a sample of 243 participants into derivation and replication samples with the goal of identifying seven or fewer items that could be used to screen for PTSD. The result was the four-item screening tool that assesses Startle, Physiological arousal to reminders, Anger, and Numbness (SPAN). Estimates of PPV and NPV in a population with a 10% PTSD base rate suggested that the SPAN may also be useful in settings with a lower prevalence of PTSD, such as primary care clinics.44

PTSD Checklist (PCL), PCL-6, PCL-2

The PTSD Checklist (PCL) is a 17-item self-report measure designed to assess severity of PTSD symptoms.46 Participants rate the severity of each of the DSM-IV symptoms for PTSD using a 5-point Likert scale. The DSM-IV PTSD symptoms are grouped into clusters as follows: five re-experiencing symptoms, seven numbing/avoidance symptoms, and five hyper-arousal symptoms. The PCL instructs participants to rate items in relation to stressful experiences. The measure generally exhibits good internal consistency46 and has been demonstrated to be related to other self-report measures of PTSD as well as gold standard structured interviews for PTSD such as the CAPS. There are three versions of the PCL (original military version, PCL-M; civilian version, PCL-C; and specific version, PCL-S) differentiated by the specificity of the index traumatic event used while completing the questionnaire. The PCL-M asks respondents to rate each item based on a single (unspecified) stressful military experience.46 The PCL-C does not require the specification of a worst single traumatic event; consequently, individual symptom ratings may be based on different stressful life events. The PCL-S requires item ratings based upon a single specified traumatic experience. Several coding strategies have been applied to the PCL that can be used to indicate probable PTSD.47,48 These include using several different cutoff scores, using DSM-IV criteria (e.g., reporting the requisite number of symptoms within each cluster at the moderate or greater level, 3 or 4 on the Likert scale), or using a combination of these two approaches (e.g., the requisite DSM-IV criteria are endorsed and the total score is above a specified cut point). Two short versions of the PCL were also evaluated: PCL-6 and PCL-2.49 The two item version was created using the two items with the highest item-total correlations (i.e., items evaluating intrusive memories and distress with reminders of the trauma). The six item version was developed by using two items that correlated most highly with the total score of items within each of the three symptom clusters.
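The three PCL coding strategies described above (total-score cutoff, DSM-IV symptom pattern, and their combination) can be illustrated with a minimal sketch. Several details are assumptions made for illustration rather than specifications from the included studies: items are assumed to be rated 1 to 5 and ordered by cluster (items 1-5 re-experiencing, 6-12 numbing/avoidance, 13-17 hyper-arousal); the per-cluster minimums (1, 3, and 2) follow the DSM-IV diagnostic rule; a rating of 3 is assumed to be the "moderate" level; and the function names are hypothetical.

```python
from typing import Sequence

# Assumed layout: 0-based index ranges for the three DSM-IV clusters and the
# minimum number of endorsed symptoms per cluster under the DSM-IV rule.
CLUSTERS = {
    "re-experiencing": (range(0, 5), 1),
    "numbing/avoidance": (range(5, 12), 3),
    "hyper-arousal": (range(12, 17), 2),
}


def total_score(items: Sequence[int]) -> int:
    """Sum of the 17 item ratings (assumed range 1-5 per item)."""
    if len(items) != 17 or any(not 1 <= x <= 5 for x in items):
        raise ValueError("expected 17 ratings, each between 1 and 5")
    return sum(items)


def meets_cutoff(items: Sequence[int], cut_score: int) -> bool:
    """Strategy 1: total score at or above a chosen cut score."""
    return total_score(items) >= cut_score


def meets_dsm_iv_pattern(items: Sequence[int], symptom_threshold: int = 3) -> bool:
    """Strategy 2: enough symptoms per cluster rated at or above the threshold.

    symptom_threshold=3 ("moderate") is an assumption, not a study value.
    """
    total_score(items)  # reuse the validation
    return all(
        sum(items[i] >= symptom_threshold for i in idx) >= required
        for idx, required in CLUSTERS.values()
    )


def meets_combined(items: Sequence[int], cut_score: int,
                   symptom_threshold: int = 3) -> bool:
    """Strategy 3: both the DSM-IV pattern and the cut score are satisfied."""
    return (meets_dsm_iv_pattern(items, symptom_threshold)
            and meets_cutoff(items, cut_score))


if __name__ == "__main__":
    ratings = [3, 1, 1, 1, 1, 3, 3, 3, 1, 1, 1, 1, 3, 3, 1, 1, 1]
    print(total_score(ratings), meets_dsm_iv_pattern(ratings),
          meets_combined(ratings, cut_score=50))
```

In this hypothetical response pattern the DSM-IV symptom rule is met even though the total score is low, which shows why the three strategies can classify the same respondent differently.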

My Mood Monitor Checklist (M-3)

The My Mood Monitor checklist (M-3) is a one page, 27 item symptom screening tool for psychiatric disorders commonly found in and treated by primary care providers: PTSD, depression, bipolar disorder, and anxiety disorders (generalized anxiety disorder, panic disorder, obsessive compulsive disorder, and social anxiety disorder).12 In addition to disorder specific symptom items, the M-3 also includes four additional questions about functional impairment. Each of the 27 questions is rated on a frequency scale from not at all (0) to most of the time (4) for a period covering the previous two weeks. The scale was constructed at a sixth grade reading level. M-3 scoring is conducted in two steps. First, the respondent must screen positive on either: 1) the suicidality item or, 2) the impairment items (any of the four items rated as often or more than one as sometimes). If either “gateway” criterion is positive, then symptom modules are scored. The PTSD screening module consists of four questions assessing re-experiencing, avoidance, numbness and startle.
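The two-step "gateway" logic can be sketched as follows. This is an illustration only: the report gives the scale endpoints (0 = not at all, 4 = most of the time) but not the numeric codes for "sometimes" and "often", so the values 2 and 3 below are assumptions, as are the function names. How the suicidality item is judged positive is not described in this report, so it is passed in as a ready-made boolean.

```python
from typing import Sequence

SOMETIMES = 2  # assumed numeric code on the 0-4 frequency scale
OFTEN = 3      # assumed numeric code on the 0-4 frequency scale


def impairment_gateway(impairment_items: Sequence[int]) -> bool:
    """True if any impairment item is rated 'often' or higher,
    or more than one item is rated 'sometimes' or higher."""
    if len(impairment_items) != 4 or any(not 0 <= x <= 4 for x in impairment_items):
        raise ValueError("expected four ratings, each between 0 and 4")
    any_often = any(x >= OFTEN for x in impairment_items)
    several_sometimes = sum(x >= SOMETIMES for x in impairment_items) > 1
    return any_often or several_sometimes


def gateway_positive(suicidality_positive: bool,
                     impairment_items: Sequence[int]) -> bool:
    """Step 1 of scoring: symptom modules are scored only if this is True."""
    return suicidality_positive or impairment_gateway(impairment_items)


if __name__ == "__main__":
    print(gateway_positive(False, [0, 2, 0, 0]))  # one 'sometimes' only
    print(gateway_positive(False, [0, 2, 2, 0]))  # two 'sometimes'
```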

Provisional Diagnostic Interview – 4 Anxiety (PDI-4A)

The Provisional Diagnostic Interview – 4 Anxiety (PDI-4A) was developed to help primary care providers quickly screen for a range of mental health disorders commonly seen in primary care: attention-deficit/hyperactivity disorder, generalized anxiety disorder, major depression, mania, panic disorder, social phobia, PTSD, obsessive-compulsive disorder, and hypochondriasis.50 Twenty-three items from an original pool of 85 items are included in the screening tool. The instrument was developed in a sample of 343 participants recruited from 21 primary care centers. The participants were high utilizers of primary care services (65% of participants had 2-5 primary care visits in the past three months), and over half reported functional disability in the same time period. The PTSD portion of the screen consists of one re-experiencing item from the DSM-IV definition of PTSD (“Having disturbing memories or dreams related to previous life-threatening events or assaults?”). If the item is rated as present at least sometimes over the previous week, it is scored as positive. In order for the screen to be counted as positive, however, the patient must also endorse at least one symptom listed in the screening instrument (for any disorder) reflecting impairment in daily functioning with a frequency of at least once per week.

Anxiety and Depression Detector (ADD)

The Anxiety and Depression Detector (ADD) is a five item screening questionnaire developed to be used in primary care settings to assess possible panic disorder, PTSD, social phobia, generalized anxiety disorder, and depression.51 The questionnaire was developed as part of the Collaborative Care for Anxiety and Panic study, and validated in university-affiliated primary care clinics in two states. The performance of candidate items (two per disorder) was evaluated on a random selection of half of the sample, and then the better performing items were evaluated on the remaining half. For PTSD, the two initial items were combined into a single question (i.e., either flashbacks or nightmares). In the final screen, each disorder is assessed by a single item.

Generalized Anxiety Disorder Scales (GAD-7, GAD-2)

The GAD-7 was developed as part of the Patient Health Questionnaire (PHQ) anxiety study.52 The study enrolled patients from 15 primary care sites in 12 states. The scale was developed in 2,149 patients and data from 591 were used to determine test-retest reliability. The seven items are scored on a scale of 0 (not at all) to 3 (nearly every day), with a total score of 0 to 21. A score of five represents mild anxiety symptoms; scores of 10 and 15 represent moderate and severe anxiety, respectively. The GAD-2 consists of the first two items of the GAD-7. These items reflect core anxiety symptoms. The score on the GAD-2 ranges from 0 to 6. In terms of identification of specific anxiety disorders, the GAD-7 and the GAD-2 were conceptualized as “first-step” screening tools.
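The scoring just described can be written out as a brief sketch. Treating 5, 10, and 15 as the lower bounds of the mild, moderate, and severe bands is an interpretive assumption, and the label used for totals below 5 is a placeholder, since this report does not name that band. Function names are hypothetical.

```python
from typing import Sequence


def gad7_total(items: Sequence[int]) -> int:
    """Sum of seven items, each scored 0 (not at all) to 3 (nearly every day)."""
    if len(items) != 7 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("expected seven ratings, each between 0 and 3")
    return sum(items)


def gad2_total(items: Sequence[int]) -> int:
    """GAD-2 score: the first two GAD-7 items only (range 0-6)."""
    gad7_total(items)  # reuse the validation
    return items[0] + items[1]


def severity_band(total: int) -> str:
    """Map a GAD-7 total to a band, assuming 5/10/15 are lower bounds."""
    if not 0 <= total <= 21:
        raise ValueError("GAD-7 total must be between 0 and 21")
    if total >= 15:
        return "severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "below mild threshold"  # placeholder label, not from the report


if __name__ == "__main__":
    answers = [1, 2, 1, 0, 3, 2, 1]
    print(gad7_total(answers), gad2_total(answers),
          severity_band(gad7_total(answers)))
```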


There were twelve screening tools, ranging in length from one to twenty-seven items, which were used to identify possible PTSD within a primary care setting. Of those twelve, seven screen exclusively for PTSD and the remaining five screen for the psychiatric disorders commonly encountered and treated by primary care providers.

KEY QUESTION #2. What are the psychometric properties and utility of the screening tools (sensitivity, specificity, likelihood ratios, predictive values, area under curve, reliability)?

In this section, we review the studies that examined each of the screening tools and provide their psychometric properties and the Level of Evidence rating in tabular form by instrument. The studies included in the review, their focus, and their Level of Evidence ratings are included in Table 4. Rationale for the Level of Evidence ratings is presented in the final column (QUADAS ratings) of the study characteristics table in Appendix E. Detailed findings are presented in Table 5. After describing the studies, we will also specifically discuss the four studies in which a comparison was made between screening instruments.

Table 4. Characteristics of Included Studies with Level of Evidence Rating.


Table 5. Screen Performance Characteristics.


Breslau's Short Screening Scale

The utility of the Breslau scale in VA general medicine and women's health clinics was evaluated by Kimerling.13 Excluded were those with obvious cognitive impairment, a preferred language other than English, or an invalid telephone number. The Breslau scale was completed by 92% (n=237) of patients approached in the clinics. Of those, 57% (n=134) completed the Breslau a second time and underwent a clinician interview (CAPS) approximately one month later. The interview was completed in-person by trained psychologists blinded to the Breslau scale results. Test-retest reliability of the Breslau scale was 0.84 (p<0.001). The prevalence of PTSD in the interviewed sample was 25%. Likelihood ratios for cut scores of 4 or higher, 5 or higher, and 6 or higher are presented in Table 5. Area under the ROC curve was not reported. The authors recommended a cut score of 4. At that score, sensitivity was 85% and specificity was 84%. PPV was 64%, NPV was 94%, and positive and negative likelihood ratios were 5.31 and 0.18, respectively. The authors noted that the sensitivity and specificity were acceptable for cut scores of 4, 5, and 6. Based on likelihood ratios, the authors suggested that patients with scores of 6 or 7 should be targeted first for additional assessment.
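For readers who want to check how the reported indices relate to one another, the sketch below derives likelihood ratios and predictive values from sensitivity, specificity, and prevalence using the standard definitions. The function names are illustrative, and the inputs are the rounded values reported above for a Breslau cut score of 4, so results agree with the published figures only to within rounding.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test result."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1], specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)   # true positive rate / false positive rate
    lr_neg = (1.0 - sensitivity) / specificity   # false negative rate / true negative rate
    return lr_pos, lr_neg


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) via Bayes' rule for a given prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(0.85, 0.84)
    ppv, npv = predictive_values(0.85, 0.84, 0.25)
    print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, PPV {ppv:.0%}, NPV {npv:.0%}")
```

With these inputs the sketch reproduces the values reported by Kimerling (LR+ 5.31, LR- 0.18, PPV 64%, NPV 94%).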

Individuals from a family practice clinic were the focus of a second study.11 In addition to the Breslau scale, participants in this study completed the PCL-C, the PC-PTSD, and the SPAN. The study included a convenience sample of patients 18 years and older and excluded those who did not speak English, had gross cognitive impairment, or were medically unstable. Participants were asked to rate the presence of PTSD symptoms in the previous month. Screening and diagnostic interviews were completed on 411 patients (53% of those who originally consented or 79% of those contacted). The telephone interviews were conducted by experienced survey research personnel rather than mental health professionals. The order of administration of the four screening measures was randomized. The diagnostic interview (referred to as CAPS-modified because the interviewers were not mental health professionals) followed the four screening measures. Blinding was not reported but the screening and diagnostic interviews were completed in one telephone session. The gender and race distributions of study participants were significantly different from those in the clinic population. Specifically, there were fewer men, more whites, and fewer African American or other race individuals in the study population. The prevalence of PTSD in the sample was 32%. The overall diagnostic efficiency (area under the ROC curve) for the Breslau scale was 0.88. It was noted that diagnostic efficiency was significantly better for men than women. The optimal cut score was reported to be 4. At that cut score, sensitivity was 85%, specificity was 76%, PPV was 31%, NPV was 98%, positive likelihood ratio was 3.58 and negative likelihood ratio was 0.20.

Primary Care PTSD Screen (PC-PTSD)

Three studies that met criteria for the present review evaluated the PC-PTSD.9-11 In two of the studies, the gold standard was the CAPS;9,11 in the third study, the gold standard was the PSS-I.10

In the initial psychometric paper, Veterans attending general medical or women's health clinics were recruited.9 Those with gross cognitive impairment or a primary language other than English were excluded. Neither the proportion of eligible patients who participated nor the number of patients who were approached but declined to participate was reported. From a convenience sample of 335, 167 (50%) completed the PC-PTSD. The base rate of PTSD was 26% based on results from the CAPS administered by trained psychologists within one month of the screening tests and blinded to the screening results. At a cut score of 3, sensitivity was 77%, specificity was 85%, PPV was 63%, NPV was 91%, positive likelihood ratio was 5.13, and negative likelihood ratio was 0.27. Area under the ROC curve was not reported.

Freedy et al. observed a PTSD base rate of 32%.11 Details of this study are reported above. Area under the ROC curve for the PC-PTSD was 0.92. At a cut score of 3, sensitivity was 85%, specificity was 82%, PPV and NPV were 38% and 98%, respectively, and positive and negative likelihood ratios were 4.72 and 0.18, respectively.

The third study recruited participants from three military health system primary care clinics.10 Of 3,234 participants who completed the Single-item PTSD Screener (SIPS, see below) and agreed to further evaluation, a sampling procedure was developed to include a mix of individuals who responded “bothered a lot,” “not bothered,” and “bothered a little.” Of 229 who were invited and consented to a longer assessment involving the PC-PTSD and a structured diagnostic interview (PSS-I), interviews were completed by 213 (93% of those who consented; 6.6% of those who were initially screened). The PSS-I was administered by trained mental health professionals blinded to the SIPS results. Due to the sampling method, the base rate could not be determined but was estimated by the study authors to be 9%. Because the sampling was stratified and the sample size was moderate, the authors used sampling weights and propensity scores to extrapolate to the larger population. A cut score of 2 or more was considered optimal, yielding 91% sensitivity and 84% specificity. The PPV and NPV were 37% and 99%, respectively. The positive likelihood ratio was 2.89; the negative likelihood ratio was not reported. Area under the ROC curve was 0.89.

Single-item PTSD Screener (SIPS)

Only one study that evaluated the SIPS met our inclusion/exclusion criteria.10 This study, as described above, used sampling weights and propensity scores to extrapolate findings to the larger population. The area under the ROC curve for the SIPS was 0.77. The optimal cut point for the SIPS was “bothered a little,” yielding a sensitivity and specificity of 76% and 79%, respectively. The PPV and NPV were 26% and 97%, while the positive likelihood ratio was 2.28. The authors reported moderate to high correlations between the SIPS and other self-report instruments (Spearman correlations with PC-PTSD = 0.59; with PCL = 0.63). The SIPS also demonstrated adequate test-retest reliability (r = 0.63, n = 104; p < 0.001; median days between assessments = 13). The SIPS appeared to be less discriminating, with a significantly lower area under the curve (0.77, 95% CI = 0.77-0.84) than the PC-PTSD (0.89, 95% CI = 0.84-0.94). As noted by Gore and colleagues, the limited range of response options for the SIPS and the fact that it is only a single item most likely underlie its lower reliability and validity.10


SPAN

Meltzer-Brody and colleagues examined the performance of the SPAN in an outpatient OB/GYN clinic among English-speaking women presenting for routine annual exams.55 On an initial survey, patients were asked to indicate if they had experienced a severe trauma. The survey was completed by 76% (292/384) of patients approached. Those who reported a trauma history (n=88) were instructed to complete the SPAN and invited to participate in a diagnostic interview (the MINI). The interview was conducted in-person by a psychiatrist blinded to the SPAN results. Only 36% (32/88) of those invited to participate (11% of the 292 who completed the initial survey) completed the interview. PTSD was diagnosed in 25 of the 32 (78%). The SPAN performed well; the area under the ROC curve was 0.75. With the suggested cut point of 5 or greater, sensitivity was 72%, specificity was 71%, PPV was 90%, NPV was 42%, the positive likelihood ratio was 2.52, and the negative likelihood ratio was 0.39. However, the strength of evidence for this study is limited by the inclusion of the trauma screening question and selection of only those with a trauma history since the trauma screening question is part of the SPAN screen.
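The small interview sample makes it possible to illustrate how all six indices follow from a single 2x2 table. The cell counts below (18 true positives, 7 false negatives, 2 false positives, 5 true negatives) are not reported in this review; they are a reconstruction that is consistent with the 25 of 32 women diagnosed with PTSD and with the reported percentages, and should be read as an assumption. The class and attribute names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Screen result cross-classified against the reference diagnosis."""
    tp: int  # screen positive, PTSD present
    fn: int  # screen negative, PTSD present
    fp: int  # screen positive, PTSD absent
    tn: int  # screen negative, PTSD absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def lr_pos(self) -> float:
        # sensitivity divided by the false positive rate
        return self.sensitivity / (self.fp / (self.fp + self.tn))

    @property
    def lr_neg(self) -> float:
        # false negative rate divided by specificity
        return (self.fn / (self.tp + self.fn)) / self.specificity


if __name__ == "__main__":
    # Reconstructed (assumed) counts; see the note above.
    span = TwoByTwo(tp=18, fn=7, fp=2, tn=5)
    print(f"sens {span.sensitivity:.0%}, spec {span.specificity:.0%}, "
          f"PPV {span.ppv:.0%}, NPV {span.npv:.0%}, "
          f"LR+ {span.lr_pos:.2f}, LR- {span.lr_neg:.2f}")
```

Because only 7 interviewed women did not have PTSD, the specificity and NPV estimates rest on very few observations, which is consistent with the limited strength of evidence noted for this study.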

As noted above, Freedy et al. administered multiple screening tools in a family practice clinic.11 Participants rated the intensity of the four SPAN elements over the past month. The gold standard was the CAPS-modified. The author-identified optimal cut-score was 3 yielding a sensitivity of 76%, specificity of 72%, positive predictive value of 25%, and negative predictive value of 96%. Positive and negative likelihood ratios were 2.67 and 0.34, respectively. Although 3 was identified as the optimal cut-score for both men and women, sensitivity (89% vs. 74%), specificity (78% vs. 72%), and the percent correctly classified (79% vs. 72%) were all higher for men.

In a more carefully designed study, Yeager et al. examined the use of the SPAN in VA primary care clinics.8 The authors identified eligible participants from 229,780 Veterans who had made a primary care health care visit in a single year to one of four VA medical centers. Potential participants were randomly selected and invited to participate through a mailing. Female Veterans were oversampled by identifying all women in a single clinic and inviting participation. The gold standard evaluation was the CAPS with a focus on current and/or lifetime PTSD. The CAPS was administered over the telephone by trained clinicians within 2 months of the screening evaluation; the clinicians were blind to the SPAN results. Among the primary care patients, the screening response rate was 74% and 82% of those who completed the screening evaluation were interviewed. Among the female Veterans, the screening response rate was 69% and 68% of those were interviewed. Completers were older and more likely to be white. Preliminary examination found no differences between the 62 women recruited and the randomly selected primary care sample, so they were combined for the primary analyses. The base rate of PTSD was 11.3%. In this high quality study, the SPAN performed well. The area under the ROC curve was 0.84. At the recommended cut score of 5, sensitivity and specificity were 74% and 82%, respectively. PPV and NPV were 34% and 96%; positive and negative likelihood ratios were 4.09 and 0.32.

PTSD Checklist (PCL)

The PCL was the most widely studied instrument; eight studies investigated the validity of the PCL in a primary care setting. Five studies used the civilian version of the PCL (PCL-C),11,47,49,53,54 one used the PCL-S,9 and two did not specify the version used.8,56

Andrykowski et al. examined an interview version of the PCL-C (not paper and pencil) in 82 women at least six months (6-72 months) after treatment for breast cancer (all women were in remission).47 PCL-C performance was assessed against the non-patient version of the SCID I PTSD module administered by doctoral-level students at the same time as the PCL-C. Interviewers were not blind to patient screening status. Of 107 women eligible for the study, 84 consented (79%) and 82 were interviewed (77%). Current PTSD was diagnosed in 6% with lifetime PTSD diagnosed in another 4%. A cut score of 50 on the PCL-C yielded a sensitivity of 60% and specificity of 99% associated with a positive predictive value of 75% and a negative predictive value of 97%. Positive and negative likelihood ratios were 60.0 and 0.40, respectively. Area under the ROC curve was not reported.

Dobie et al. examined the PCL-C in women Veterans seen at a single VA medical center.53 It is unclear what proportion was being treated in outpatient medical clinics as opposed to mental health clinics. Of the 2,545 women Veterans invited to participate, only 11% were included in the study and there was no information available regarding clinical characteristics of non-participants. CAPS interviews were administered by a clinician blind to the PCL-C results. Prevalence of PTSD was 36% and the area under the ROC curve was 0.86. The authors identified a cut score of 38 as optimal and found that sensitivity and specificity were both 79%. PPV and NPV were 68% and 87%, respectively, while positive and negative likelihood ratios were 3.78 and 0.26.

In a similar study, Walker et al. selected a random sample of 1,963 female HMO members and mailed a questionnaire packet that included the PCL (version not specified).56 Using a traumatic experiences screening instrument, the Childhood Trauma Questionnaire (CTQ), women who screened positive for childhood sexual trauma were identified and solicited to participate in the study. Further, a random sample of women who fell below the screening threshold on the CTQ was also contacted and asked to participate in the study. Participants were interviewed using the CAPS within 2 months of completing the PCL. It was not reported whether the interviews were face to face or conducted by telephone or whether the interviewers were blind to the PCL results. Of the 1,225 who completed the questionnaires, 21.3% were interviewed. No information was provided regarding those who were interviewed and those who were not interviewed. Further, it is impossible to determine the base rate of PTSD in the sample since women were selected for interview based on screening positive for childhood sexual trauma; in the interview sample, 10.7% were diagnosed with PTSD. The area under the ROC curve was 0.84. At the author-identified optimal cut score of 30 on the PCL, sensitivity was 82%, specificity was 76%, PPV was 28%, and NPV was 97%. Positive and negative likelihood ratios were 3.40 and 0.24, respectively.

In addition to the PC-PTSD (as noted above), men and women recruited from general medical and women's health clinics at a VA Medical Center completed the PCL-S.9 Data were reported for a sub-sample of 167 participants. Both the PCL-S and the CAPS were completed in person with CAPS interviewers (trained psychologists) blind to the PCL-S score. The base rate of PTSD in the sample was 26%. Using a cut score of 48 on the PCL-S, sensitivity was determined to be 84% and specificity was 90%. PPV was determined to be 62% and NPV 94%. Positive and negative likelihood ratios were 8.40 and 0.18, respectively. Area under the ROC curve was not reported.

In the first Lang et al. study, the authors examined the full PCL-C in women Veterans seeking care at a VA outpatient primary care clinic.54 Fifty-six percent of those eligible agreed to participate; 87% of those agreed to a follow up telephone interview, and 25% were randomly chosen to be interviewed using the PTSD section of the CIDI 2.1. The interview took place less than a month after the PCL-C was administered and interviewers were blind to the PCL-C results. The randomly selected subgroup differed from the unselected group on age, race, and marital status. Moreover, the mean score of the PCL-C of those interviewed was significantly lower than the scores of those who were not interviewed. The prevalence of PTSD was 31%. The diagnostic efficiency of the PCL-C as indicated by the area under the ROC curve was 0.89. With an identified optimal cut score of 28, sensitivity and specificity were 94% and 68%, PPV and NPV were 58% and 96%, and positive and negative likelihood ratios were 2.94 and 0.09, respectively.

Abbreviated versions of the 17 item PCL-C (two items and six items) were examined by the same authors.49 The study population consisted of primary care patients from a VA clinic or a university-affiliated clinic. Those willing to participate completed the PCL-C and returned it by mail. Approximately 50% of those who agreed to participate were randomly selected for a diagnostic interview (CIDI 2.1). The interval between administration of the PCL-C and the CIDI 2.1 was not reported; interviewers were blind to the PCL-C score. It is unclear how representative the consented participants were of the population and how effective the randomization process was. The PCL-C was completed by 275 of 401 patients enrolled (65%); both the PCL-C and CIDI 2.1 were completed by 154 patients (37% of those enrolled). Using two items from the PCL, the diagnostic efficiency (area under the ROC curve) was 0.88, nearly identical to the AUC for the six item version (0.89). The sensitivity for the 2 item version using a cut score of 4 was slightly greater than for the 6 item version using a cut score of 14 (96% versus 92%); however, the specificity of the 2 item version was considerably poorer (58% versus 72%). Other values for the 2 item version were as follows: PPV 29%, NPV 99%, positive likelihood ratio 2.29, and negative likelihood ratio 0.07. Corresponding values for the 6 item version were 36%, 98%, 3.29, and 0.11.

The most methodologically sound study of the PCL in primary care settings was conducted by Yeager et al. in 2007.8 As described above, the authors enrolled Veterans who had made a primary care health care visit in a single year to one of four VA medical centers. As with the SPAN, the CAPS interviewers were blind to the PCL scores. The diagnostic efficiency of the PCL (version not specified) as determined by the area under the ROC curve was 0.88. At the author-identified optimal cut score of 31, the sensitivity was 81% with equal specificity. PPV and NPV were 35% and 97%, respectively, while positive and negative likelihood ratios were 4.31 and 0.23. As noted in Table 5, increasing the cut score to 50, another commonly recommended cut score, reduced false positive errors but slightly increased the false negative rate (by 3%), which is the more concerning error for an instrument used for screening purposes.

The PCL-C was another of the screening instruments examined by Freedy et al.11 As noted above, the study population included adults attending a family practice training clinic (11% of all patients approached or 53% of those consented participated in the study). The prevalence of PTSD was 32%. Overall diagnostic efficiency (area under ROC curve) was 0.93 for the PCL-C. At the author-identified optimal cut score of 43, sensitivity was 80%, specificity was 82%, PPV was 37%, NPV was 97%, positive likelihood ratio was 4.54, and negative likelihood ratio was 0.24.

In sum, with the exception of Yeager et al.,8 studies investigating the utility of the PCL as a screen for PTSD in medical settings using structured interviews as a gold standard are generally of limited quality.

My Mood Monitor Checklist (M-3)

The initial validation study of the M-3 was published in 2010.12 In the only, but well designed, study of the M-3, consecutive patients were approached in a university associated family medicine clinic. All were English speaking and mentally competent to consent to participate. Of those approached, 54% (n=723) agreed to participate. All who filled out the screening form were asked to be interviewed using the MINI by an experienced master's level interviewer blind to screening status. Within one month of screening, 647 were interviewed (89%). Optimal screening thresholds were determined on 80% of the initial cohort and then validated on the remaining 20%. The PTSD base rate was 6.3%. When compared with the MINI, the PTSD module (at the author-chosen cut score of 2) demonstrated a sensitivity of 88% and a specificity of 76%. The positive predictive value was 20% and the negative predictive value was 99%. Positive and negative likelihood ratios were 3.69 and 0.16, respectively. Area under the ROC curve was not reported.

Provisional Diagnostic Interview – 4 Anxiety (PDI-4A)

There was one study investigating the PDI-4A that met inclusion criteria.50 Participants were non-psychotic individuals at a primary care clinic for a routine visit. PDI-4A results were evaluated against the SCID. Data were reported for 343 patients who completed a self-report screen and an interview by a primary care provider. The diagnostic interview (SCID) was administered by “trained raters,” but whether the raters were blinded and the time interval between the PDI-4A and SCID were not reported. Only 17 (4.9%) participants within the sample met criteria for PTSD based on the SCID. Given this base rate, and with the PTSD and functioning items of the PDI-4A both rated at least “sometimes,” sensitivity was 71%, specificity was 72%, positive predictive value was 12%, and negative predictive value was 98%. The positive likelihood ratio was 2.54 and the negative likelihood ratio was 0.40. Area under the ROC curve was not reported. The authors also estimated the performance of the screening instrument in a clinic with an 8.6% prevalence; the estimated positive predictive value (PPV) was 18%, compared to the 12% PPV in the study sample. The strength of evidence for the screen is limited by the purposeful selection of participants based on likelihood of meeting diagnostic criteria for the disorder of interest, as well as the inclusion of a non-disordered control group (i.e., a non-independent sample).
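The dependence of predictive values on prevalence can be illustrated with Bayes' rule. The sketch below is ours, not the study authors' calculation: it takes the rounded sensitivity (71%) and specificity (72%) reported for the PDI-4A and recomputes PPV and NPV at the sample base rate (4.9%) and at the higher hypothetical clinic prevalence (8.6%). Because the inputs are rounded, the output may differ from the published estimates by a percentage point or so.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screen applied at a given prevalence (all proportions)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(PTSD | positive screen)
    npv = true_neg / (true_neg + false_neg)   # P(no PTSD | negative screen)
    return ppv, npv


# Rounded PDI-4A operating characteristics at two prevalence levels.
for prevalence in (0.049, 0.086):
    ppv, npv = predictive_values(0.71, 0.72, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases, which is why the same screen yields more true positives per positive result in a higher-prevalence clinic.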

Anxiety and Depression Detector (ADD)

One study that examined the ADD met our inclusion criteria.51 Of 12,724 patients approached at university-affiliated primary care clinics, 7,738 (61%) completed the screening questionnaire, and 1,494 of those 7,738 screened positive for panic disorder, social phobia, PTSD, generalized anxiety disorder, or depression. From those who screened negative, 1,107 patients were randomly selected. Diagnostic interviews using the CIDI, conducted by telephone, were completed by 569 (38%) of those who screened positive and 232 (21%) of the randomly selected negative screen patients (31% overall). The interviews were conducted by trained CIDI interviewers who were not blind to the ADD results. Of the 801 interviewed, 18.5% were diagnosed with PTSD, with a significantly higher rate among non-whites than among whites (24% vs. 16%, p<0.01), and 38% of the sample screened positive for more than one disorder. The sensitivity, specificity, PPV, NPV, positive likelihood ratio, and negative likelihood ratio for the one PTSD item on the questionnaire (yes/no scoring) indicating possible PTSD were 62%, 83%, 48%, 89%, 3.54, and 0.46, respectively. When a version of the questionnaire that included items related to panic disorder, social phobia, and PTSD was used to predict PTSD, the positive likelihood ratio decreased to 1.47 (Table 5). The authors noted that the PTSD screen had a higher sensitivity for whites than non-whites (86% vs. 76%, p<0.01) but found no differences based on gender or age. Diagnostic status (OR=5.41, 95% CI 3.4 to 8.6) and comorbid depression (OR=1.95, 95% CI 1.3 to 3.0) were significant predictors of screening status for PTSD.

Generalized Anxiety Disorder Scale (GAD-7, GAD-2)

There was one study designed to evaluate the ability of the GAD-7 and GAD-2 to detect anxiety disorders, including PTSD, in primary care.14 Participants were enrolled from a research network of 15 primary care facilities in 12 states. Of the 2,149 patients whose responses were used to develop and validate the GAD-7, 1,654 (77%) agreed to a telephone diagnostic interview and 965 of those (58%) were randomly selected. Interviews were conducted blind and within 1 week of completing the screen. The interviews included the generalized anxiety disorder, social anxiety disorder, and PTSD sections of the SCID. The interview sample had a slightly higher percentage of women (69% vs. 63%, p=0.003) and had a significantly higher GAD-7 score (5.7 vs. 5.1, p=0.01) than those who were not interviewed. Age, race, and education were similar in the two groups. PTSD was diagnosed in 83 patients (8.6%).

The sensitivity, specificity, and positive likelihood ratio values for the GAD-7 and GAD-2 were best for generalized anxiety disorder. However, similar values were observed for other anxiety disorders (panic disorder, social anxiety disorder, and PTSD). At a GAD-7 cut-point of 8 or greater, the sensitivity, specificity, PPV, NPV, positive likelihood ratio, and negative likelihood ratio for identifying PTSD were 76%, 75%, 22%, 97%, 3.1, and 0.32, respectively. Corresponding values for the GAD-2 were 59%, 81%, 23%, 95%, 3.1, and 0.51. Because the intent of screens such as the GAD-7 and the GAD-2 is to detect the presence of any anxiety disorder, the authors ascertained the sensitivity and specificity by comparing patients with specific anxiety diagnoses with those who had no diagnoses.14 Despite the fact that the GAD-7 and GAD-2 yield acceptable accuracy for the identification of multiple anxiety disorders, the false positive rate would be much higher if used to detect PTSD in a clinic setting.
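Likelihood ratios and predictive values are linked through the odds form of Bayes' rule: post-test odds = pre-test odds × likelihood ratio. As a hedged illustration (the function name is ours), the Python below starts from the 8.6% PTSD prevalence in the interview sample and applies the GAD-7 likelihood ratios reported above (3.1 and 0.32).

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via the odds form of Bayes' rule."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prevalence = 0.086  # PTSD prevalence in the GAD-7 interview sample

# Probability of PTSD after a positive GAD-7 screen (cut-point of 8 or greater).
p_after_positive = post_test_probability(prevalence, 3.1)
# Probability of PTSD after a negative GAD-7 screen.
p_after_negative = post_test_probability(prevalence, 0.32)

print(f"after positive screen: {p_after_positive:.1%}")
print(f"after negative screen: {p_after_negative:.1%}")
```

The results, roughly 23% after a positive screen and 3% after a negative screen, agree within rounding with the reported PPV of 22% and NPV of 97%.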

Comparative Studies

There were four studies that compared screening instruments for PTSD (Table 6).8-11 Two of the studies were given Level III evidence ratings.9,10 Although both Freedy et al.11 and Yeager et al.8 compared the SPAN and PCL within their studies, one study only reported SPAN statistics up to a cut score of 4,11 which was not the optimal score in the second study.8 Nonetheless, the SPAN performed similarly in those two studies at that cut score despite differences in the study designs. In both studies, the PCL slightly outperformed the other instruments as evidenced by higher AUC statistics; however, the difference was likely not clinically meaningful.

Table 6. Studies Comparing More than One Screening Instrument.

Table 6

Studies Comparing More than One Screening Instrument.

The PCL-S and PC-PTSD comparison in the Prins et al. study is limited by the use of a small convenience sample.9 The PCL-S outperformed the PC-PTSD, as might be expected of a longer screening tool; however, the PC-PTSD was found to have good clinical utility, with a positive likelihood ratio of 5.13 and a negative likelihood ratio of 0.27.

The focus of the Gore et al. study was to identify a single item screening question that could be used by primary care providers as the first stage in a multi-step assessment process.10 Although both the PC-PTSD and PCL-C were given to the interview sample, the PCL-C was used for validation purposes only and no statistical comparisons of the single item screen to the PCL-C were made. As would be expected, the performance of the four item PC-PTSD was better than that of the single item SIP.


Although there is limited information regarding the implementation of PTSD screens in primary care settings, what information does exist suggests that such screening can be done efficiently, can have a positive impact on the clinical process, and is acceptable to both patients and providers. Six of the studies employed samples of Veterans or military personnel, and all of these were evaluating PTSD-specific screening tools. Screening tools functioned in a clinically useful fashion per likelihood ratio statistics in both Veteran and community samples. However, no study included both sample types, so there is no information as to whether a particular screen is better able to detect PTSD in a Veteran or a community sample. Screens of up to at least 27 items can be readily administered in most clinical settings prior to patients' appointments.

As can be seen in Table 4, only three of the fifteen studies were methodologically rigorous enough to warrant Level I ratings (i.e., they had independent, blind comparison of sign or symptom results with a “gold standard” of anatomy, physiology, diagnosis, or prognosis among a large number of consecutive patients suspected of having the target condition). Two, from the same investigator, met criteria for a Level II rating. Four were rated Level IV due to lack of independence of diagnostic interviewers (i.e., not blind to screening status), selective sampling of interviewees or both. Studies with Level IV ratings often overestimate the performance of the screens they aim to evaluate, thereby limiting our confidence in the strength of their findings.

The very brief screening tools for PTSD (those of one or two items) had the least discriminative power and, because of their limited response options, showed steep trade-offs between sensitivity and specificity, resulting in no clear optimal cut score. Comparisons of the intermediate length screens were limited by the lack of comparative studies of sufficient rigor. Only one study directly compared intermediate length screens, and that study suffered from uncorrected sampling bias and interviews not blind to screen outcomes.

Of the intermediate screens, the PCL-6 item had the smallest evidence base, and was evaluated in only one study, which limits the generalizability of those findings. Across studies, there is weak evidence that the clinical utility of the Breslau scale and PC-PTSD are likely comparable, and some indication that both the Breslau scale and the PC-PTSD discriminate better than the SPAN. However, direct comparisons of these scales in a rigorously controlled study would be necessary to increase confidence in these findings.

The PCL is the most widely studied of the screens, although only one study had sufficient rigor to warrant a Level I evidence rating,8 and two studies by the same author were rated as Level II.49,54 Across studies, using 50 as a cut score was associated with negative likelihood ratios of 0.5 or greater, indicating that the post-screen odds of not having PTSD given a negative screen were no better than what might be assumed given population prevalence rates. That is, there would be a significant risk of false negatives, indicating that this often used cut-score is too high even for a Veteran population.

The optimal cut-score for the most commonly used screen, the PTSD Checklist (PCL), varied across clinical settings according to differences in PTSD prevalence rates and sample compositions. Because the 17 item screen has a more graded scoring distribution, optimal cut-scores could be more precisely determined for a given clinical setting. In contrast, across studies, the intermediate length screens all had sharp drop-offs just under the recommended cut-scores. This suggests that the optimal cut score of an intermediate length screen is less likely to vary across populations and settings, and can therefore be more easily adopted by different healthcare systems. However, it also means that there is a steeper trade-off between sensitivity and specificity in cut-scores that differ only by one point, which may have significant policy and resource implications.
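The one-point trade-off described above can be made concrete by tabulating sensitivity and specificity at adjacent cut scores. The sketch below uses hypothetical scores on a short 0 to 7 point screen; the data are invented purely for illustration and are not drawn from any of the reviewed studies, and the function name is ours.

```python
def sens_spec_at_cut(scores, has_ptsd, cut):
    """Return (sensitivity, specificity) when a score >= cut counts as a positive screen."""
    pairs = list(zip(scores, has_ptsd))
    true_pos = sum(1 for s, d in pairs if d and s >= cut)
    false_neg = sum(1 for s, d in pairs if d and s < cut)
    true_neg = sum(1 for s, d in pairs if not d and s < cut)
    false_pos = sum(1 for s, d in pairs if not d and s >= cut)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


# Hypothetical data (NOT study data): 12 patients without PTSD, then 8 with PTSD.
scores = [0, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6,
          2, 3, 4, 4, 5, 5, 6, 7]
has_ptsd = [False] * 12 + [True] * 8

for cut in (3, 4, 5):
    sens, spec = sens_spec_at_cut(scores, has_ptsd, cut)
    print(f"cut >= {cut}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

In this toy sample, moving the cut score from 4 to 5 drops sensitivity from 75% to 50% while specificity rises from 67% to 83%, the kind of steep one-point shift that a screen with few possible scores can produce.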

We also examined the performance of five scales that screened for multiple conditions, including PTSD (ADD, PDI-4A, M-3, GAD-7, GAD-2). None of these more general screens were assessed in Veteran samples. Of the general screens, only the M-3, GAD-7, and GAD-2 were evaluated with sufficient rigor to evaluate their potential utility to screen for PTSD. The GAD-7 was superior to the GAD-2 in terms of its accuracy in detecting PTSD among primary care patients.14 The likelihood ratios for the detection of PTSD, both positive and negative, for the M-3 indicated that the M-3 performed better than the GAD-7 at identifying probable cases of PTSD.12,14

KEY QUESTION #3. What information is there about the implementability (e.g., ease of administration, patient satisfaction) of PTSD screening tools in primary care clinics?

Although not all studies reported the time it took for patients to complete the screen that was the focus of the study, those that did indicated that briefer screens took no more than five minutes,13 and the longest screen (27 item M-3) was reported to have taken patients only five to ten minutes to complete.14 This suggests that none of the screens posed a significant time burden when used in a primary care setting.

Only one study conducted a process evaluation of screen implementation in their clinics.12 Both patients and providers were administered questionnaires following the post-screening medical appointment regarding: 1) the logistical aspects of screen administration/review and 2) whether there was any change in the patient-provider interaction in the appointment immediately following screening. In terms of screen administration, only 1% of patients reported that they had insufficient time to complete the 27 item screen prior to their appointment. Of the clinicians who reviewed the screen results, 83% reported that they were able to review the results in under one minute. Most patients (70%) talked to their providers about their feelings and symptoms and 63% felt that the screening process facilitated that discussion. Of patients who were eventually diagnosed with a mental health condition, 75% felt that the screening process facilitated discussion of mental health issues with their providers. Most primary care providers (80%) reported that reviewing screen results facilitated discussion of feelings and emotional symptoms with their patients, and none found it too cumbersome.


Only three studies evaluated logistical or experiential aspects of using a screening tool in clinical practice. There was no evidence regarding readability, speed of administration, ease of interpretability, or patient satisfaction for the remainder of the instruments and no comparative studies of these implementation issues.

KEY QUESTION #4. Do the psychometric properties and utility of each of the screening tools differ according to age, gender, race/ethnicity, substance abuse, or other comorbidities?

There were six studies that evaluated whether a screening tool demonstrated demographic-dependent variation in screen validity or utility, and only four of the six studies did so systematically. All four studies that evaluated potential differences systematically examined screen performance characteristics for men vs. women. Only two examined the potential modifying effect of age or race (Table 7);8,51 no studies examined the effect of specific psychiatric comorbidities.

Table 7. Evidence for a Moderating Effect of Demographic Factors on Screen Characteristics.

Table 7

Evidence for a Moderating Effect of Demographic Factors on Screen Characteristics.

Gore (2008) used demographic based propensity scores to compare the odds of PTSD diagnoses within response strata and found little evidence that demographic factors considered collectively affected screen performance; however, the authors note that, with a sample of fewer than 300 people, they were insufficiently powered to adequately assess differences in screen performance between subgroups.10 Consequently, this adds little to the evidence base regarding potential demographic differences in screen performance. Similarly, Kimerling (2006) reported that the operating characteristics (i.e., sensitivity and specificity) for men and women were “similar” for the Breslau scale, but since no comparative statistics were presented, this study was also not included in the review of evidence for Key Question #4.13

Of the studies that systematically examined gender differences in screen utility, findings were mixed. Freedy et al. reported that the PCL, Breslau, and PC-PTSD (but not the SPAN) were better able to detect PTSD in men than in women across all indices (Table 7).11 Although this did not impact the optimal cut scores recommended for the Breslau scale or the PC-PTSD, they found that the optimal PCL cut score for men was different than for women (46 vs. 43). Given how close these cut scores are, it is unlikely that the utility of the PCL would differ for men vs. women in a clinical setting. More importantly, the sample used in the study was significantly different from that of the patient population from which it was drawn in that patients who were female and those who were white were more likely to participate in the study. Since no adjustments were made for this selection bias, it is unclear how this might have affected the results. Prins (2003) similarly found that the PC-PTSD was better able to detect PTSD among men vs. women Veterans, but this study also suffered from significant methodological limitations and, additionally, was based on a fairly small convenience sample.9

In contrast, Means-Christensen (2006) reported that the performance of the ADD was comparable for men and women, but that it was less able to discriminate cases vs. non-cases among non-whites than among whites (76% vs. 86%).51 Unfortunately, no other statistical information was provided, and it is unclear whether this difference has meaningful clinical significance.

In the most methodologically rigorous study to examine demographic differences in screen performance, Yeager (2007) found that both the SPAN and PCL performed similarly for men as for women Veterans.8 Although there was no primary effect of race in performance of either screen in this study, the performance of the PCL (but not the SPAN) was significantly different for white vs. African American Veterans among the younger cohort. Specifically, they examined potential race differences in AUCs for the SPAN and PCL within three age strata and found significant race differences for the PCL in the youngest (≤ 49 yr) cohort, but not in the older groups (50-64 yr or 65 yr +). For those Veterans younger than 50, the PCL was a much better discriminator of PTSD among white Veterans (AUC=0.99), than among African American Veterans (AUC=0.81).
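The AUC values compared here have a direct interpretation: the AUC equals the probability that a randomly chosen patient with PTSD scores higher on the screen than a randomly chosen patient without PTSD, with ties counted as one half. The sketch below computes the AUC this way from hypothetical scores; the numbers are invented for illustration only and do not come from Yeager et al., and the function name is ours.

```python
def auc_from_scores(case_scores, noncase_scores):
    """AUC as the share of case/non-case pairs in which the case scores higher (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical PCL-style total scores (NOT study data).
cases = [55, 48, 62, 40]          # patients with PTSD on the diagnostic interview
noncases = [20, 35, 40, 25, 30]   # patients without PTSD

print(f"AUC = {auc_from_scores(cases, noncases):.3f}")  # AUC = 0.975
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so a gap such as 0.99 versus 0.81 means that cases and non-cases overlap far more on the screen in one subgroup than in the other.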

Although PTSD is associated with significant psychiatric comorbidity, there were no studies that examined whether the utility of a screening tool was affected by the presence of other mental health disorders.


There is very limited evidence regarding potential variation in the performance of screening tools by age, gender or race, and no information about how specific psychiatric comorbidities might affect the performance of the screening tools. Of the studies that were adequately powered to determine whether screen utility varied by demographic or clinical factors, only one was of high quality. Given this, our findings must be considered provisional.

There is weak evidence that the clinical utility of the PC-PTSD differs depending on patients' sex, and no information on whether it functions equally well among patients of different ages or racial, ethnic or socioeconomic backgrounds. For the Breslau scale the findings are similar, but were reported in only one study of limited quality. There is weak evidence that the performance of the SPAN does not vary by patient gender, age or race. The PCL appears to function comparably for men and women, but there is weak evidence that Veterans who are younger than 50 and African American may not be identified as having PTSD as accurately by the PCL as Veterans who are white and/or older.

There was no information regarding the impact of psychiatric comorbidity on the performance characteristics of any of the screening tools.