NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Committee on Psychological Testing, Including Validity Testing, for Social Security Administration Disability Determinations; Board on the Health of Select Populations; Institute of Medicine. Psychological Testing in the Service of Disability Determination. Washington (DC): National Academies Press (US); 2015 Jun 29.

Psychological Testing in the Service of Disability Determination.

5 Cognitive Tests and Performance Validity Tests

Disability determination is based in part on signs and symptoms of a disease, illness, or impairment. When physical symptoms are the presenting complaint, identification of signs and symptoms of illness is relatively concrete and easily accomplished through a general medical exam. However, documentation or concrete evidence of cognitive or functional impairments, as may be claimed by many applying for disability, is more difficult to obtain.

Psychological testing may help inform the evaluation of an individual's functional capacity, particularly within the domain of cognitive functioning. The term cognitive functioning encompasses a variety of skills and abilities, including intellectual capacity, attention and concentration, processing speed, language and communication, visual-spatial abilities, and memory. Sensorimotor and psychomotor functioning are often measured alongside neurocognitive functioning to clarify the brain basis of certain cognitive impairments and are therefore considered one of the domains that may be included within a neuropsychological or neurocognitive evaluation. These skills and abilities cannot be evaluated in any detail without formal standardized psychometric assessment.

This chapter examines cognitive testing, which relies on measures of task performance to assess cognitive functioning and establish the severity of cognitive impairments. As discussed in detail in Chapter 2, a determination of disability requires both a medically determinable impairment and evidence of functional limitations that affect an individual's ability to work. A medically determinable impairment must be substantiated by symptoms, signs, and laboratory findings (the so-called Paragraph A criteria), and the degree of functional limitations imposed by the impairment must be assessed in four broad areas: activities of daily living; social functioning; concentration, persistence, or pace; and episodes of decompensation (the so-called Paragraph B criteria). However, as discussed in Chapter 4, the U.S. Social Security Administration (SSA) is in the process of altering the functional domains through a Notice of Proposed Rulemaking published in 2010. The proposed functional domains (understand, remember, and apply information; interact with others; concentrate, persist, and maintain pace; and manage oneself) increase focus on the relation of functioning to the work setting. Because of SSA's move in this direction, the committee examines the relevance of psychological testing in terms of these proposed functional domains. As will be discussed below, cognitive testing may prove beneficial to the assessment of each of these requirements.


In contrast to testing that relies on self-report, as outlined in the preceding chapter, evaluating cognitive functioning relies on measures of task performance to establish the severity of cognitive impairments. Such tests are commonly used in clinical neuropsychological evaluations in which the goal is to identify a patient's pattern of strengths and weaknesses across a variety of cognitive domains. These performance-based measures are standardized instruments with population-based normative data that allow the examiner to compare an individual's performance with an appropriate comparison group (e.g., those of the same age group, sex, education level, and/or race/ethnicity).

Cognitive testing is the primary way to establish severity of cognitive impairment and is therefore a necessary component in a neuropsychological assessment. Clinical interviews alone are not sufficient to establish the severity of cognitive impairments, for two reasons: (1) patients are known to be poor reporters of their own cognitive functioning (Edmonds et al., 2014; Farias et al., 2005; Moritz et al., 2004; Schacter, 1990) and (2) clinicians relying solely on clinical interviews in the absence of neuropsychological test results are known to be poor judges of patients' cognitive functioning (Moritz et al., 2004). There is a long history of neuropsychological research linking specific cognitive impairments with specific brain lesion locations, and before the advent of neuroimaging, neuropsychological evaluation was the primary way to localize brain lesions; even today, neuropsychological evaluation is critical for identifying brain-related impairments that neuroimaging cannot identify (Lezak et al., 2012). In the context of the SSA disability determination process, cognitive testing for claimants alleging cognitive impairments could be helpful in establishing a medically determinable impairment, functional limitations, and/or residual functional capacity.

The use of standardized psychological and neuropsychological measures to assess residual cognitive functioning in individuals applying for disability will increase the credibility, reliability, and validity of determinations on the basis of these claims. A typical psychological or neuropsychological evaluation is multifaceted and may include cognitive and non-cognitive assessment tools. Evaluations typically consist of (1) a clinical interview, (2) administration of standardized cognitive or non-cognitive psychological tests, and (3) professional time for interpretation and integration of data. Some neuropsychological tests are computer administered, but the majority of tests in use today are paper-and-pencil tests.

The length of an evaluation will vary depending on the purpose of the evaluation and, more specifically, the type or degree of psychological and/or cognitive impairments that need to be evaluated. A national professional survey of 1,658 neuropsychologists from the membership of the American Academy of Clinical Neuropsychology (AACN), Division 40 of the American Psychological Association (APA), and the National Academy of Neuropsychology (NAN) indicated that a typical neuropsychological evaluation takes approximately 6 hours, with a range from 0.5 to 25 hours (Sweet et al., 2011). The survey also identified a number of reasons why the duration of an evaluation varies, including the reason for referral, the type or degree of psychological and/or cognitive impairments, and factors specific to the individual.

The most important aspect of administration of cognitive and neuropsychological tests is selection of the appropriate tests to be administered. That is, selection of measures is dependent on examination of the normative data collected with each measure and consideration of the population on which the test was normed. Normative data are typically gathered on generally healthy individuals who are free from significant cognitive impairments, developmental disorders, or neurological illnesses that could compromise cognitive skills. Data are generally gathered on samples that reflect the broad demographic characteristics of the United States including factors such as age, gender, and educational status. There are some measures that also provide specific comparison data on the basis of race and ethnicity.

As discussed in detail in Chapter 3, as part of the development of any psychometrically sound measure, explicit methods and procedures by which tasks should be administered are determined and clearly spelled out. All examiners use such methods and procedures during the process of collecting the normative data, and such procedures normally should be used in any other administration. Typical standardized administration procedures or expectations include (1) a quiet, relatively distraction-free environment; (2) precise reading of scripted instructions; and (3) provision of necessary tools or stimuli. Use of standardized administration procedures enables application of normative data to the individual being evaluated (Lezak et al., 2012). Without standardized administration, the individual's performance may not accurately reflect his or her ability. An individual's abilities may be overestimated if the examiner provides more information or guidance than is outlined in the test administration manual. Conversely, a claimant's abilities may be underestimated if appropriate instructions, examples, or prompts are not presented.

Cognitive Testing in Disability Evaluation

To receive benefits, claimants must have a medically determinable physical or mental impairment, which SSA defines as

an impairment that results from anatomical, physiological, or psychological abnormalities which can be shown by medically acceptable clinical and laboratory diagnostic techniques … [and] must be established by medical evidence consisting of signs, symptoms, and laboratory findings—not only by the individual's statement of symptoms. (SSA, n.d.-b)

To qualify at Step 3 in the disability evaluation process (as discussed in Chapter 2), there must be medical evidence that substantiates the existence of an impairment and associated functional limitations that meet or equal the medical criteria codified in SSA's Listings of Impairments. If an adult applicant's impairments do not meet or equal the medical listing, residual functional capacity—the most a claimant can still do despite his or her limitations—is assessed; this includes whether the applicant has the capacity for past work (Step 4) or any work in the national economy (Step 5). For child applicants, once there has been identification of a medical impairment, documentation of a “marked and severe functional limitation relative to typically developing peers” is required. Cognitive testing is valuable in both child and adult assessments in determining the existence of a medically determinable impairment and evaluating associated functional impairments and residual functional capacity.

Cognitive impairments may be the result of intrinsic factors (e.g., neurodevelopmental disorders, genetic factors) or be acquired through injury or illness (e.g., traumatic brain injury, stroke, neurological conditions) and may occur at any stage of life. Functional limitations in cognitive domains may also result from other mental or physical disorders, such as bipolar disorder, depression, schizophrenia, psychosis, or multiple sclerosis (Etkin et al., 2013; Rao, 1986).

Cognitive Domains Relevant to SSA

SSA currently assesses mental residual functional capacity by evaluating 20 abilities in four general areas: understanding and memory, sustained concentration and persistence, social interaction, and adaptation (see Form SSA-4734-F4-SUP: Mental Residual Functional Capacity [MRFC] Assessment). Through this assessment, a claimant's ability to sustain activities that require such abilities over a normal workday or workweek is determined.

In 2009, SSA's Occupational Information Development Advisory Panel (OIDAP) created its Mental Cognitive Subcommittee “to review mental abilities that can be impaired by illness or injury, and thereby impede a person's ability to do work” (OIDAP, 2009, p. C-3). In their report, the subcommittee recommended that the conceptual model of psychological abilities required for work, as currently used by SSA through the MRFC assessment, be revised to redress shortcomings and be based on scientific evidence. The subcommittee identified four major categories of psychological functioning essential to work: neurocognitive functioning, initiative and persistence, interpersonal functioning, and self-management, recommending that “SSA adopt 15 abilities that represent specific aspects of the[se] four general categories.” Within neurocognitive functioning, the testing of which is the primary focus of the current chapter, the subcommittee identified six relevant domains: general cognitive/intellectual ability, language and communication, memory acquisition, attention and distractibility, processing speed, and executive functioning; “each of the constituent abilities has been found to predict either the ability to work or level of occupational attainment among persons with various mental disorders and/or healthy adults” (OIDAP, 2009, p. C-22). Building on the subcommittee's report, the current Institute of Medicine (IOM) committee has adopted these six domains of cognitive functioning for its examination of cognitive testing in disability determinations.

Each of these functional domains would also be relevant areas of assessment in children applying for disability support. As indicated below, there are standardized measures that have been well normed and validated for pediatric populations. Interpretation of test results in children is more challenging, as it must take into account the likelihood of developmental progress and response to any interventions. Thus, the permanency of cognitive impairments identified in childhood is more difficult to ascertain in a single evaluation.

There are numerous performance-based tests that can be used to assess an individual's level of functioning within each domain identified below for both adults and children. It was beyond the scope of this committee and report to identify and describe each available standardized measure; thus, only a few commonly used tests are provided as examples for each domain. The choice of examples should not be seen as an attempt by the committee to identify or prescribe tests that should be used to assess these domains within the context of disability determinations. Rather, the committee believed that it was more appropriate to identify the most relevant domains of cognitive functioning and that it remains in the purview of the appropriately qualified psychological/neuropsychological evaluator to select the most appropriate measure for use in specific evaluations. For a fuller list and review of cognitive tests, readers are referred to the textbooks Neuropsychological Assessment (Lezak et al., 2012) and A Compendium of Neuropsychological Tests (Strauss et al., 2006).

General Cognitive/Intellectual Ability

General cognitive/intellectual ability encompasses reasoning, problem solving, and meeting cognitive demands of varying complexity. It has been identified as “the most robust predictor of occupational attainment, and corresponds more closely to job complexity than any other ability” (OIDAP, 2009, p. C-21). Intellectual disability affects functioning in three domains: conceptual (e.g., memory, language, reading, writing, math, knowledge acquisition); social (e.g., empathy, social judgment, interpersonal skills, friendship abilities); and practical (e.g., self-management in areas such as personal care, job responsibilities, money management, recreation, organizing school and work tasks) (American Psychiatric Association, 2013, p. 37). Tests of cognitive/intellectual functioning, commonly referred to as intelligence tests, are widely accepted and used in a variety of fields, including education and neuropsychology. Prominent examples include the Wechsler Adult Intelligence Scale, fourth edition (WAIS-IV; Wechsler, 2008) and the Wechsler Intelligence Scale for Children, fourth edition (WISC-IV; Wechsler, 2003).

Language and Communication

The domain of language and communication focuses on receptive and expressive language abilities, including the ability to understand spoken or written language, communicate thoughts, and follow directions (American Psychiatric Association, 2013; OIDAP, 2009). The International Classification of Functioning, Disability and Health (WHO, 2001) distinguishes the two, describing language in terms of mental functioning while describing communication in terms of activities (the execution of tasks) and participation (involvement in a life situation). The mental functions of language include reception of language (i.e., decoding messages to obtain their meaning), expression of language (i.e., production of meaningful messages), and integrative language functions (i.e., organization of semantic and symbolic meaning, grammatical structure, and ideas for the production of messages). Abilities related to communication include receiving and producing messages (spoken, nonverbal, written, or formal sign language), carrying on a conversation (starting, sustaining, and ending a conversation with one or many people) or discussion (starting, sustaining, and ending an examination of a matter, with arguments for or against, with one or more people), and use of communication devices and techniques (telecommunications devices, writing machines) (WHO, 2001). In a survey of historical governmental and scholarly data, Ruben (1999) found that communication disorders were generally associated with higher rates of unemployment, lower social class, and lower income.

A wide variety of tests are available to assess language abilities; some prominent examples include the Boston Naming Test (Kaplan et al., 2001), Controlled Oral Word Association (Benton et al., 1994a; Spreen and Strauss, 1991), the Boston Diagnostic Aphasia Examination (Goodglass and Kaplan, 1983), and for children, the Clinical Evaluation of Language Fundamentals-4 (Semel et al., 2003) or Comprehensive Assessment of Spoken Language (Carrow-Woolfolk, 1999). There are fewer formal measures of communication per se, although there are some educational measures that do assess an individual's ability to produce written language samples, for example, the Test of Written Language (Hammill and Larsen, 2009).

Learning and Memory

This domain refers to abilities to register and store new information (e.g., words, instructions, procedures) and retrieve information as needed (OIDAP, 2009; WHO, 2001). Functions of memory include “short-term and long-term memory; immediate, recent and remote memory; memory span; retrieval of memory; remembering; [and] functions used in recalling and learning” (WHO, 2001, p. 53). However, it is important to note that semantic, autobiographical, and implicit memory are generally preserved in all but the most severe forms of neurocognitive dysfunction (American Psychiatric Association, 2013; OIDAP, 2009). Impaired memory functioning can arise from a variety of internal or external factors, such as depression, stress, stroke, dementia, or traumatic brain injury (TBI), and may affect an individual's ability to sustain work, due to a lessened ability to learn and remember instructions or work-relevant material. Examples of tests for learning and memory deficits include the Wechsler Memory Scale (Wechsler, 2009), Wide Range Assessment of Memory and Learning (Sheslow and Adams, 2003), California Verbal Learning Test (Delis, 1994; Delis et al., 2000), Hopkins Verbal Learning Test-Revised (Benedict et al., 1998; Brandt and Benedict, 2001), Brief Visuospatial Memory Test-Revised (Benedict, 1997), and the Rey-Osterrieth Complex Figure Test (Rey, 1941).

Attention and Vigilance

Attention and vigilance refers to the ability to sustain focus of attention in an environment with ordinary distractions (OIDAP, 2009). Normal functioning in this domain includes the ability to sustain, shift, divide, and share attention (WHO, 2001). Persons with impairments in this domain may have difficulty attending to complex input, holding new information in mind, and performing mental calculations. They may also exhibit increased difficulty attending in the presence of multiple stimuli, be easily distracted by external stimuli, need more time than previously to complete normal tasks, and tend to be more error prone (American Psychiatric Association, 2013). Tests for deficits in attention and vigilance include a variety of continuous performance tests (e.g., Conners Continuous Performance Test, Test of Variables of Attention), the WAIS-IV working memory index, Digit Vigilance (Lewis, 1990), and the Paced Auditory Serial Addition Test (Gronwall, 1977).

Processing Speed

Processing speed refers to the amount of time it takes to respond to questions and process information, and “has been found to account for variability in how well people perform many everyday activities, including untimed tasks” (OIDAP, 2009, p. C-23). This domain reflects mental efficiency and is central to many cognitive functions (NIH, n.d.). Tests for deficits in processing speed include the WAIS-IV processing speed index and the Trail Making Test Part A (Reitan, 1992).

Executive Functioning

Executive functioning is generally used as an overarching term encompassing many complex cognitive processes such as planning, prioritizing, organizing, decision making, task switching, responding to feedback and error correction, overriding habits and inhibition, and mental flexibility (American Psychiatric Association, 2013; Elliott, 2003; OIDAP, 2009). It has been described as “a product of the coordinated operation of various processes to accomplish a particular goal in a flexible manner” (Funahashi, 2001, p. 147). Impairments in executive functioning can lead to disjointed and disinhibited behavior; impaired judgment, organization, planning, and decision making; and difficulty focusing on more than one task at a time (Elliott, 2003). Patients with such impairments will often have difficulty completing complex, multistage projects or resuming a task that has been interrupted (American Psychiatric Association, 2013). Because executive functioning refers to a variety of processes, it is difficult or impossible to assess executive functioning with a single measure. However, it is an important domain to consider, given the impact that impaired executive functioning can have on an individual's ability to work (OIDAP, 2009). Some tests that may assist in assessing executive functioning include the Trail Making Test Part B (Reitan, 1992), the Wisconsin Card Sorting Test (Heaton, 1993), and the Delis-Kaplan Executive Function System (Delis et al., 2001).


Once a test has been administered, assuming this was done according to standardized protocol, the test-taker's performance can be scored. In most instances, an individual's raw score (that is, the number of items to which he or she responded correctly) is translated into a standard score based on the normative data for the specific measure. In this manner, an individual's performance can be characterized by its position on the distribution curve of normal performances.
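
As a rough illustration of the raw-to-standard conversion described above, the following Python sketch turns a raw score into a z-score, a standard score on a mean-100, SD-15 scale (the scale used for Wechsler index scores), and an approximate percentile. The normative mean and standard deviation are hypothetical and not taken from any published test, and the sketch assumes normally distributed scores; in practice, examiners use the lookup tables supplied in each test's manual.

```python
from statistics import NormalDist


def standardize(raw_score, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to (z-score, standard score, percentile).

    norm_mean and norm_sd describe the normative sample; the values used
    in the example below are hypothetical. Assumes that raw scores in the
    normative sample are approximately normally distributed.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw_score - norm_mean) / norm_sd        # distance from the mean in SD units
    standard = scale_mean + scale_sd * z         # rescale to mean 100, SD 15
    percentile = NormalDist().cdf(z) * 100.0     # share of the norm group scoring lower
    return z, standard, percentile


# Hypothetical example: raw score of 38 on a test normed at mean 50, SD 8.
z, standard, percentile = standardize(raw_score=38, norm_mean=50, norm_sd=8)
print(f"z = {z:.2f}, standard score = {standard:.1f}, percentile = {percentile:.1f}")
```

With these made-up norms, the raw score sits 1.5 standard deviations below the normative mean, which corresponds to a standard score of 77.5 and roughly the 7th percentile.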

The majority of cognitive tests have normative data from groups of people who mirror the broad demographic characteristics of the population of the United States based on census data. As a result, the normative data for most measures reflect the racial, ethnic, socioeconomic, and educational characteristics of the population majority. Unfortunately, that means that there are some individuals for whom these normative data are not clearly and specifically applicable. This does not mean that testing should not be done with these individuals, but rather that normative limitations should be carefully considered when interpreting results.

Selection of appropriate measures and assessment of applicability of normative data vary depending on the purpose of the evaluation. Cognitive tests can be used to identify acquired or developmental cognitive impairment, to determine the level of functioning of an individual relative to typically functioning same-aged peers, or to assess an individual's functional capacity for everyday tasks (Freedman and Manly, 2015). Clearly, each of these purposes could be relevant for SSA disability determinations. However, each of these instances requires different interpretation and application of normative data.

When attempting to identify a change in functioning secondary to neurological injury or illness, it is most appropriate to compare an individual's postinjury performance to his or her premorbid level of functioning. Unfortunately, it is rare that an individual has a formal assessment of his or her premorbid cognitive functioning. Thus, comparison of the postinjury performance to demographically matched normative data provides the best comparison to assess a change in functioning (Freedman and Manly, 2015; Heaton et al., 2001; Manly and Echemendia, 2007). For example, assessment of a change in language functioning in a Spanish-speaking individual from Mexico who has sustained a stroke will be more accurate if the individual's performance is compared to norms collected from other Spanish-speaking individuals from Mexico rather than English speakers from the United States or even Spanish-speaking individuals from Puerto Rico. In many instances, this type of data is provided in alternative normative data sets rather than the published population-based norms provided by the test publisher.

In contrast, the population-based norms are more appropriate when the purpose of the evaluation is to describe an individual's level of functioning relative to same-aged peers (Busch, 2006; Freedman and Manly, 2015). A typical example of this would be in instances when the purpose of the evaluation is to determine an individual's overall level of intellectual (i.e., IQ) or even academic functioning. In this situation, it is more relevant to compare that individual's performance to that of the broader population in which he or she is expected to function in order to quantify his or her functional capabilities. Thus, for determination of functional disability, demographically or ethnically corrected normative data are inappropriate and may actually underestimate an individual's degree of disability (Freedman and Manly, 2015). In this situation, use of otherwise appropriate standardized and psychometrically sound performance-based or cognitive tests is appropriate.
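
The effect of the choice of norms can be sketched numerically. The short Python example below scores one raw result against two norm sets for a made-up test. Every number is hypothetical and chosen only to show the direction of the effect described above; none comes from a published normative table.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from a normative mean, in SD units."""
    return (raw - mean) / sd


# Hypothetical (mean, SD) of raw scores on one made-up test.
NORMS = {
    "population": (50.0, 10.0),                # broad census-matched sample
    "demographically_adjusted": (44.0, 10.0),  # subgroup with a lower mean
}

raw = 35.0
for name, (mean, sd) in NORMS.items():
    print(f"{name}: z = {z_score(raw, mean, sd):+.2f}")
```

Under these assumed values, the same raw score falls 1.5 standard deviations below the population mean but only 0.9 below the adjusted mean, so the adjusted norms make the shortfall relative to the general population look smaller.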

Determination of an individual's everyday functioning or vocational capacity is perhaps the evaluation goal most relevant to the SSA disability determination process. To make this determination, the most appropriate comparison group for any individual would be other individuals who are currently completing the expected vocational tasks without limitations or disability (Freedman and Manly, 2015). Unfortunately, there are few standardized measures of skills necessary to complete specific vocational tasks and, therefore, also no vocational-specific normative data at this time. This type of functional capacity is best measured by evaluation techniques that recreate specific vocational settings and monitor an individual's completion of related tasks.

Until such specific vocational functioning measures exist and are readily available for use in disability determinations, objective assessment of cognitive skills that are presumed to underlie specific functions will be necessary to quantify an individual's functional limitations. Despite limitations in normative data as outlined in Freedman and Manly (2015), formal psychometric assessment can be completed with individuals of various ethnic, racial, gender, educational, and functional backgrounds. However, the authors note that “limited research suggests that demographic adjustments reduce the power of cognitive test scores to predict every-day abilities” (e.g., Barrash et al., 2010; Higginson et al., 2013; Silverberg and Millis, 2009). In fact, they go on to state “the normative standard for daily functioning should not include adjustments for age, education, sex, ethnicity, or other demographic variables” (p. 9). Use of appropriate standardized measures by appropriately qualified evaluators as outlined in the following sections further mitigates the impact of normative limitations.


Interpretation of results is more than simply reporting the raw scores an individual achieves. Interpretation requires assigning some meaning to the standardized score within the individual context of the specific test-taker. There are several methods or levels of interpretation that can be used, and a combination of all is necessary to fully consider and understand the results of any evaluation (Lezak et al., 2012). This section is meant to provide a brief overview; a full discussion of all approaches and nuances of interpretation is beyond the scope of this report, and interested readers are referred to various textbooks (e.g., Groth-Marnat, 2009; Lezak et al., 2012).

Interindividual Differences

The most basic level of interpretation is simply to compare an individual's testing results with the normative data collected in the development of the measures administered. This level of interpretation allows the examiner to determine how typical or atypical an individual's performance is in comparison to same-aged individuals within the general population. Normative data may or may not be further specialized on the basis of race/ethnicity, gender, and educational status. Schools of thought differ on how a score should be interpreted based on its deviation from the normative mean, and not all of them can be described in this text. One example of an interpretative approach would be that a performance within one standard deviation of the mean would be considered broadly average. Performances one to two standard deviations below the mean are considered mildly impaired, and those two or more standard deviations below the mean typically are interpreted as being at least moderately impaired.
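
The example bands just described can be written as a small Python function. This is a sketch of that one interpretive convention only, not a clinical rule. The handling of the exact boundaries (a z-score of exactly -1 or -2) and the label for scores more than one standard deviation above the mean are assumptions made here, since the text does not specify them.

```python
def classify(z):
    """Label a z-score using the example interpretive bands in the text.

    Boundary handling is an assumption: exactly -1 is treated as
    "within one SD", and exactly -2 as "two or more SDs below".
    """
    if z <= -2.0:
        return "at least moderately impaired"
    if z < -1.0:
        return "mildly impaired"
    if z <= 1.0:
        return "broadly average"
    # Not addressed by the example in the text; label chosen here.
    return "above the broadly average range"


for z in (0.3, -1.5, -2.4):
    print(f"z = {z:+.1f}: {classify(z)}")
```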

Intraindividual Differences

In addition to comparing an individual's performances to those of the normative group, it also is important to compare an individual's pattern of performances across measures. This type of comparison allows for identification of a pattern of strengths and weaknesses. For example, an individual's level of intellectual functioning can be considered a benchmark to which functioning within some other domains can be compared. If all performances fall within the mildly to moderately impaired range, an interpretation of some degree of intellectual disability may be appropriate, depending on an individual's level of adaptive functioning. It is important to note that any interpretation of an individual's performance on a battery of tests must take into account that variability in performance across tasks is a normal occurrence (Binder et al., 2009), especially as the number of tests administered increases (Schretlen et al., 2008). However, if there is significant variability in performances across domains, then a specific pattern of impairment may be indicated.
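
A back-of-the-envelope calculation suggests why isolated low scores become more likely as a battery grows. The Python sketch below computes the chance that a healthy test-taker obtains at least one score at or below a cutoff across n tests. It assumes normally distributed scores and, as a deliberate simplification, statistically independent tests. Real test scores are correlated, so actual base rates differ, and this is not the method used in the cited studies.

```python
from statistics import NormalDist


def prob_at_least_one_low(n_tests, cutoff_z=-1.0):
    """Chance of one or more scores at or below cutoff_z across n_tests.

    Simplifying assumptions: scores are normally distributed and the
    tests are independent of one another.
    """
    p_single = NormalDist().cdf(cutoff_z)        # chance of a low score on one test
    return 1.0 - (1.0 - p_single) ** n_tests     # complement of "no low scores at all"


for n in (1, 5, 10, 20):
    print(f"{n:2d} tests: {prob_at_least_one_low(n):.2f}")
```

Under these assumptions the probability rises steadily with the number of tests, which is why a single low score in a long battery carries little weight on its own.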

Profile Analysis

When significant variability in performances across functional domains is found, it is necessary to consider whether or not the pattern of functioning is consistent with a known cognitive profile. That is, does the individual demonstrate a pattern of impairment that makes sense or can be reliably explained by a known neurobehavioral syndrome or neurological disorder? For example, an adult who has sustained isolated injury to the temporal lobe of the left hemisphere would be expected to demonstrate some degree of impairment on some measures of language and verbal memory, but to demonstrate relatively intact performances on measures of visual-spatial skills. This pattern of performance reflects a cognitive profile consistent with a known neurological injury. Conversely, a claimant who demonstrates impairment on all measures after sustaining a brief concussion would be demonstrating a profile of impairment that is inconsistent with research data indicating full cognitive recovery within days in most individuals who have sustained a concussion (McCrea et al., 2002, 2003).

Interpreting Poor Cognitive Test Performance

Regardless of the level of interpretation, it is important for any evaluator to keep in mind that poor performance on a set of cognitive or neuropsychological measures does not always mean that an individual is truly impaired in that area of functioning. Additionally, poor performance on a set of cognitive or neuropsychological measures does not directly equate to functional disability.

In instances of inconsistent or unexpected profiles of performance, a thorough interpretation of the psychometric data requires use of additional information. The evaluator must consider the validity and reliability of the data acquired, such as whether there were errors in administration that rendered the data invalid, whether emotional or psychiatric factors affected the individual's performance, and whether the individual put forth sufficient effort on all measures.

To answer the latter question, administration of performance validity tests (PVTs) as part of the cognitive or neuropsychological evaluation battery can be helpful. Interpretation of PVT data must be undertaken carefully. Any PVT result can only be interpreted in an individual's personal context, including psychological/emotional history, level of intellectual functioning, and other factors that may affect performance. Particular attention must be paid to the limitations of the normative data available for each PVT to date. As such, a simple interindividual interpretation of PVT testing results is not acceptable or valid. Rather, consideration of intraindividual patterns of performance on various cognitive measures is an essential component of PVT interpretation. PVTs will be discussed in greater detail later in this chapter.

Qualifications for Administering Tests

Given the need for the use of standardized procedures, any person administering cognitive or neuropsychological measures must be well trained in standardized administration protocols. He or she should possess the interpersonal skills necessary to build rapport with the individual being tested in order to foster cooperation and maximal effort during testing. Additionally, individuals administering testing should understand important psychometric properties, including validity and reliability, as well as factors that could emerge during testing to place either at risk (as described in Chapter 3).

Many doctoral-level psychologists are well trained in test administration. In general, psychologists from clinical, counseling, school, or educational graduate psychology programs receive training in psychological test administration. However, the functional domains of emphasis in most of these programs include intellectual functioning, academic achievement, aptitude, emotional functioning, and behavioral functioning (APA, 2015). Thus, if the request for disability is based on a claim of intellectual disability or significant emotional/behavioral dysfunction, a psychologist with solid psychometric training from any of these types of graduate-level training programs would typically be capable of completing the necessary evaluation.

For cases in which the claim is based on specific cognitive deficits, particularly those attributed to neurological disease or injury, a neuropsychologist may be needed to most accurately evaluate the claimant's functioning. Neuropsychologists are clinical psychologists

trained in the science of brain-behavior relationships. The clinical neuropsychologist specializes in the application of assessment and intervention principles based on the scientific study of human behavior across the lifespan as it relates to normal and abnormal functioning of the central nervous system. (HNS, 2003)

That is, a neuropsychologist is trained to evaluate functioning within specific cognitive domains that may be affected or altered by injury to or disease of the brain or central nervous system. For example, a claimant applying for disability due to enduring attention or memory dysfunction secondary to a TBI would be most appropriately evaluated by a neuropsychologist.

The use of psychometrists or technicians in cognitive/neuropsychological test administration is a widely accepted standard of practice (Brandt and van Gorp, 1999). Psychometrists are often bachelor's- or master's-level individuals who have received additional specialized training in standardized test administration and test scoring. They do not practice independently, but rather work under the close supervision and direction of doctoral-level clinical psychologists.

Qualifications for Interpreting Test Results

Interpretation of testing results requires a higher degree of clinical training than administration alone. Most doctoral-level clinical psychologists who have been trained in psychometric test administration are also trained in test interpretation. As stated in the existing SSA (n.d.-a) documentation regarding evaluation of intellectual disability, the specialist completing psychological testing “must be currently licensed or certified in the state to administer, score, and interpret psychological tests and have the training and experience to perform the test.” However, as mentioned above, the training received by most clinical psychologists is limited to certain domains of functioning, including measures of general intellectual functioning, academic achievement, aptitude, and psychological/emotional functioning. Again, if the request for disability is based on a claim of intellectual disability or significant emotional/behavioral dysfunction, a psychologist with solid psychometric training from any of these programs should be capable of providing appropriate interpretation of the testing that was completed. The reason for the evaluation, or more specifically, the type of claim of impairment, may suggest a need for a specific type of qualification of the individual performing and especially interpreting the evaluation.

As stated in existing SSA (n.d.-a) documentation, individuals who administer more specific cognitive or neuropsychological evaluations “must be properly trained in this area of neuroscience.” Clinical neuropsychologists, as defined above, are individuals who have been specifically trained to interpret testing results within the framework of brain-behavior relationships and who have achieved certain educational and training benchmarks as delineated by national professional organizations (AACN, 2007; NAN, 2001). More specifically, clinical neuropsychologists have been trained to interpret more complex and comprehensive cognitive or neuropsychological batteries that could include assessment of specific cognitive functions, such as attention, processing speed, executive functioning, language, visual-spatial skills, or memory. As stated above, interpretation of data involves examining patterns of individual cognitive strengths and weaknesses within the context of the individual's history, including specific neurological injury or disease (e.g., claims on the basis of TBI).


Neuropsychological tests assessing cognitive, motor, sensory, or behavioral abilities require actual performance of tasks, and they provide quantitative assessments of an individual's functioning within and across cognitive domains. The standardization of neuropsychological tests allows for comparability across test administrations. However, interpretation of an individual's performance presumes that the individual has put forth full and sustained effort while completing the tests; that is, accurate interpretation of neuropsychological performance can only proceed when the test-taker puts forth his or her best effort on the testing. If a test-taker is not able to give his or her best effort, for whatever reason, the test results cannot be interpreted as accurately reflecting the test-taker's ability level. As discussed in detail in Chapter 2, a number of studies have examined potential for malingering when there is a financial incentive for appearing impaired, suggesting anywhere from 19 to 68 percent of SSA disability applicants may be performing below their capability on cognitive tests or inaccurately reporting their symptoms (Chafetz, 2008; Chafetz et al., 2007; Griffin et al., 1996; Mittenberg et al., 2002). For a summary of reported base rates of “malingering,” see Table 2-2 of this report and the ensuing discussion. However, an individual may put forth less than optimal effort due to a variety of factors other than malingering, such as pain, fatigue, medication use, and psychiatric symptomatology (Lezak et al., 2012).

For these reasons, analysis of the entire cognitive profile for consistency is generally recommended. Specific patterns that increase confidence in the validity of a test battery and overall assessment include

  • Consistency between test behavior or self-reported symptoms and incidental behavior;
  • Consistency between test behavior or self-reported symptoms and what is known about brain functioning and the type and severity of injury/illness claimed;
  • Consistency between test behavior or self-reported symptoms and known patterns of performance (e.g., passing easy items and failing more difficult items; better performance on cued recall and recognition tests than free recall tests; intact memory requires intact attention);
  • Consistency between test behavior or self-reported symptoms and reliable collateral reports or other background information, such as medical documentation;
  • Consistency between self-reported history and reliable collateral history or medical documentation; and
  • Consistency across tests measuring the same cognitive domain or across tests administered at different times.

Specific tests have also been designed especially to aid in the examination of performance validity. The development of and research on these PVTs has increased rapidly during the past two decades. There have been attempts to formally quantify performance validity during testing since the mid-1900s (Rey, 1964), with much of the initial focus on examining the consistency of an individual's responses across a battery of testing, with the suggestion that inconsistency may indicate variable effort. However, a significant push for specific formal measures came in response to the increased use of neuropsychological and cognitive testing in forensic contexts, including personal injury litigation, workers' compensation, and criminal proceedings in the 1980s and 1990s (Bianchini et al., 2001; Larrabee, 2012a). Given the nature of these evaluations, there was often a clear incentive for an individual to exaggerate his or her impairment or to put forth less than optimal effort during testing, and neuropsychologists were being called upon to provide statements related to the validity of test results (Slick et al., 1999). Several studies documented that use of clinical judgment and interpretation of performance inconsistencies alone was an inadequate methodology for detection of poor effort or intentionally poor performance (Faust et al., 1988; Heaton et al., 1978; van Gorp et al., 1999). As such, the need for formal standardized measures of effort and means for interpretation of these measures emerged.

PVTs are measures that assess the extent to which an individual is providing valid responses during cognitive or neuropsychological testing. PVTs are typically simple tasks that are easier than they appear to be and on which an almost perfect performance is expected based on the fact that even individuals with severe brain injury have been found capable of good performance (Larrabee, 2012b). On the basis of that expectation, each measure has a performance cut-off defined by an acceptable number of errors designed to keep the false-positive rate low. Performances below these cut-off points are interpreted as demonstrating invalid test performance.

Types of PVTs

PVTs may be designed as such and embedded within other cognitive tests, later derived from standard cognitive tests, or designed as stand-alone measures. Examples of each type of measure are discussed below.

Embedded and Derived Measures

Embedded and derived PVTs are similar in that a specific score or assessment of response bias is determined from an individual's performance on an aspect of a preexisting standard cognitive measure. The primary difference is that embedded measures consist of indices specifically created to assess validity of performance in a cognitive test, whereas derived measures typically use novel calculations of performance discrepancies rather than simply examining the pattern of performance on already established indices. The rationale for this type of PVT is that it does not require administration of any additional tasks and therefore does not result in any added time or cost. Additionally, development of these types of PVTs can allow for retrospective consideration or examination of effort in batteries in which specific stand-alone measures of effort were not administered (Solomon et al., 2010).

The forced-choice condition of the California Verbal Learning Test—second edition (CVLT-II) (Delis et al., 2000) is an example of an embedded PVT. Following learning, recall, and recognition trials involving a 16-item word list, the test-taker is presented with pairs of words and asked to identify which one was on the list. More than 92 percent of the normative population, including individuals in their eighties, scored 100 percent on this test. Scores below the published cut-off are unusually low and indicative of potential noncredible performance. Scores below chance are considered to reflect purposeful noncredible performance, in that the test-taker knew the correct answer but purposely chose the wrong answer.

Reliable Digit Span, based on the Digit Span subtest of the Wechsler Adult Intelligence Scale, is an example of a measure that was derived based on research following test publication. The Digit Span subtest requires test-takers to repeat strings of digits in forward order (forward digit span), as well as in reverse order (backward digit span). To calculate Reliable Digit Span, the maximum forward and backward span are summed, and scores below the cut-off point are associated with noncredible performance (Greiffenstein et al., 1994). A full list of embedded and derived PVTs is provided in Table 5-1.
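The summation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the cut-off value are hypothetical placeholders, not published scoring rules, and the definition of a "maximum span" is assumed to be supplied by the scoring rules of the published measure, which this sketch does not reproduce.

```python
def reliable_digit_span(max_forward: int, max_backward: int) -> int:
    """Sum the longest forward and backward digit spans, as described in the text."""
    if max_forward < 0 or max_backward < 0:
        raise ValueError("span lengths must be non-negative")
    return max_forward + max_backward


def below_cutoff(score: int, cutoff: int) -> bool:
    """Return True when the score falls below the supplied cut-off."""
    return score < cutoff


# Hypothetical placeholder, NOT a published cut-off; take the real value
# from the research literature for the measure being used.
EXAMPLE_CUTOFF = 8

rds = reliable_digit_span(max_forward=5, max_backward=3)
print(rds, below_cutoff(rds, EXAMPLE_CUTOFF))
```

Keeping the cut-off as an explicit parameter rather than a fixed constant reflects the point made elsewhere in this chapter that cut-off scores may need to be interpreted in light of the population being tested.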

TABLE 5-1. Embedded and Derived PVTs.

Stand-Alone Measures

A stand-alone PVT is a measure that was developed specifically to assess a test-taker's effort or consistency of responses. That is, although the measure may appear to assess some other cognitive function (e.g., memory), it was actually developed to be so simple that even an individual with severe impairments in that function would be able to perform adequately. Such measures may be forced choice or non-forced choice (Boone and Lu, 2007; Grote and Hook, 2007).

The Test of Memory Malingering (TOMM) (Tombaugh and Tombaugh, 1996), the Word Memory Test (WMT) (Green et al., 1996), and the Rey Memory for Fifteen Items Test (RMFIT) (Rey, 1941) are examples of stand-alone measures of performance validity. As with many stand-alone measures, the TOMM, WMT, and RMFIT are memory tests that appear more difficult than they really are. The TOMM and WMT use a forced-choice method to identify noncredible performance in which the test-taker is asked to identify which of two stimuli was previously presented. Accuracy scores are compared to chance level performance (i.e., 50 percent correct), as well as performance by normative groups of head-injured and cognitively impaired individuals, with cut-offs set to minimize false-positive errors. Alternatively, the RMFIT uses a non-forced-choice method in which the test-taker is presented with a group of items and then asked to reproduce as many of the items as possible.

Forced-Choice PVTs

As noted above, some PVTs are forced-choice measures on which performance significantly below chance has been suggested to be evidence of intentionally poor performance based on application of the binomial theorem (Larrabee, 2012a). For example, if there are two choices, it would be expected that purely random guessing would result in 50 percent of items correct. Scores deviating significantly from 50 percent in either direction indicate nonchance-level performance. The most probable explanation for substantially below-chance PVT scores is that the test-taker knew the correct answer but purposely selected the wrong answer. The Slick and colleagues (1999) criteria for malingered neurocognitive dysfunction include below-chance performance (P < 0.05) on one or more forced-choice measures of performance validity as indicative of malingering, and the authors state that “short of confession,” below-chance performance on performance validity testing is “closest to an evidentiary ‘gold standard’ for malingering.” Though below-chance performance on forced-choice PVTs implies intent, the committee believes it does not necessarily imply malingering, because the motivation of the performance may not be known; however, it does mean that the remainder of the test battery cannot be interpreted. A list of forced-choice PVTs can be found in Table 5-2.
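The binomial reasoning above can be made concrete with a short Python sketch. It computes the one-tailed probability of obtaining a given number of correct answers or fewer by pure guessing on a two-choice task and compares it with the P < 0.05 threshold cited above. The function names and the 50-item test length are assumptions chosen for illustration; they do not correspond to the scoring rules of any particular PVT.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer correct by guessing."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def is_below_chance(correct: int, n_items: int, alpha: float = 0.05) -> bool:
    """True when a score this low would occur by guessing with probability < alpha."""
    return prob_at_most(correct, n_items) < alpha


# Hypothetical 50-item, two-choice task.
for correct in (15, 25, 35):
    print(correct, round(prob_at_most(correct, 50), 4), is_below_chance(correct, 50))
```

A score near 25 of 50 is what random guessing would produce, so it is not flagged; only scores far enough below 25 that guessing alone would rarely produce them are treated as below chance. As the committee notes, such a result implies intent but does not by itself establish the motivation behind it.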

TABLE 5-2. Forced-Choice PVTs.

Administration and Interpretation of PVTs

It is within that historical medicolegal context that clinical practice guidelines for neuropsychology emerged to emphasize the use of psychometric indicators of response validity (as opposed to clinician judgment alone) in determining the interpretability of a battery of cognitive tests (Bianchini et al., 2001; Heilbronner et al., 2009). Moreover, it has become standard clinical practice to use multiple PVTs throughout an evaluation (Boone, 2009; Heilbronner et al., 2009). In general, multiple PVTs should be administered over the course of the evaluation because performance validity may wax and wane with increasing and decreasing fatigue, pain, motivation, or other factors that can influence effortful performance (Boone, 2009, 2014; Heilbronner et al., 2009). Some of the PVT development studies have attempted to examine these factors (e.g., the effect of experimentally induced pain) and found no effect on PVT performance (Etherton et al., 2005a,b).

In clinical evaluations, most individuals will pass PVTs, and a small proportion will fail at the below-chance level. These clear passes can support the examiner's interpretation that the evaluation data are valid. Clear failures, that is, below-chance performances, certainly place the validity of any other data obtained in the evaluation in question.

The risk of falsely identifying failure on one PVT as indicative of noncredible performance has resulted in the common practice of requiring failure on at least two PVTs to make any assumptions related to effort (Boone, 2009, 2014; Larrabee, 2014a). According to practice guidelines of NAN, performance slightly below the cut-off point on only one PVT cannot be construed to represent noncredible performance or biased responding; converging evidence from other indicators is needed to make a conclusion regarding performance bias (Bush et al., 2005). Similarly, AACN suggests the use of multiple validity assessments, both embedded and stand-alone, when possible, noting that effort may vary during an evaluation (Heilbronner et al., 2009). However, it should be noted that in cases where a test-taker scores significantly below chance on a single forced-choice PVT, intent to deceive may be assumed and test scores deemed invalid. It is also important to note that some situations may preclude the use of multiple validity indicators. For example, when evaluating an early school-aged child, at present, the TOMM is the only empirically established PVT (Kirkwood, 2014). In such situations, “it is the clinician's responsibility to document the reasons and explicitly note the interpretive implications” of reliance on a single PVT (Heilbronner et al., 2009).

The number of noncredible performances and the pattern of PVT failure are both considered in making a determination about whether the remainder of the neuropsychological battery can be interpreted. This consideration is particularly important in evaluations in which the test-taker's performance on cognitive measures falls below an expected level, suggesting potential cognitive impairment. That is, an individual's poor performance on cognitive measures may reflect insufficient effort to perform well, as suggested by PVT performance, rather than a true impairment. However, even in the context of PVT failure, performances that are in the average range can be interpreted as reflecting ability that is in the average range or above, though such performances may represent an underestimate of actual level of ability. Certainly, PVT “failure” does not equate to malingering or lack of disability. However, clear PVT failures make the validity of the remainder of the cognitive battery questionable; therefore, no definitive conclusions can be drawn regarding cognitive ability (aside from interpreting normal performances as reflecting normal cognitive ability). An individual who fails PVTs may still have other evidence of disability that can be considered in making a determination; in these cases, further information would be needed to establish the case for disability.

AACN and NAN endorse the use of PVT measures in the context of any neuropsychological examination (Bush et al., 2005; Heilbronner et al., 2009). The practice standards require clinical neuropsychologists performing evaluations of cognitive functioning for diagnostic purposes to include PVTs and comment on the validity of test findings in their reports. There is no gold standard PVT, and use of multiple PVTs is recommended. A specified set of PVTs, or other cognitive measures for that matter, is not recommended due to concerns regarding test security and test-taker coaching.3

Caveats and Considerations in the Use of PVTs

Given the primary use of cut-off scores, even within the context of forced-choice tasks, the interpretation of PVT performance is inherently different from interpretation of performance on other standardized measures of cognitive functioning owing to the nature of the scores obtained. Unlike general cognitive measures that typically use a norm-referenced scoring paradigm assuming a normal distribution of scores, PVTs typically use a criterion-referenced scoring paradigm because of a known skewed distribution of scores (Larrabee, 2014a). That is, an individual's performance is compared to a cut-off score set to keep false-positive rates below 10 percent to determine whether the individual passed or failed the task.
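As a rough illustration of criterion-referenced scoring, the sketch below picks the highest cut-off for which fewer than 10 percent of a credible reference group would fail. The scores are invented, the 0 to 50 scale and the function names are assumptions, and actual PVT cut-offs are established through published validation research rather than a procedure this simple.

```python
def false_positive_rate(credible_scores, cutoff):
    """Share of credible examinees scoring below the cut-off (i.e., failing)."""
    if not credible_scores:
        raise ValueError("need at least one score")
    return sum(score < cutoff for score in credible_scores) / len(credible_scores)


def highest_cutoff(credible_scores, max_fp=0.10):
    """Highest observed score usable as a cut-off with a false-positive rate below max_fp."""
    for cutoff in sorted(set(credible_scores), reverse=True):
        if false_positive_rate(credible_scores, cutoff) < max_fp:
            return cutoff
    raise ValueError("no cut-off satisfies max_fp")


# Invented scores (0-50 scale) for 20 examinees assumed to be responding credibly.
credible = [50, 50, 50, 49, 50, 48, 50, 50, 47, 50,
            50, 49, 50, 50, 50, 50, 46, 50, 50, 41]

cutoff = highest_cutoff(credible)
print(cutoff, false_positive_rate(credible, cutoff))  # prints: 46 0.05
```

Because nearly all credible scores sit at or near the ceiling, the distribution is heavily skewed, which is why a pass/fail cut-off is used here in place of a standard score based on a mean and standard deviation.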

A resulting primary critique of PVTs is that the development of the criterion or cut-off scores has not been as rigorous or systematic as is typically expected in the collection of normative data during development of a new standardized measure of cognitive functioning. In general, determination of what is an acceptable or passing performance and associated cut-off scores have been established in somewhat of a post hoc or retrospective fashion. However, there are some embedded PVTs that have been co-normed with their “parent” tests, such as the forced-choice condition of the CVLT-II, which was normed along with the CVLT-II and thus has norms from the general population.

For most PVTs, however, rather than administering the measures to a large number of “typical” individuals of various ages, ethnicities, and even clinical diagnoses, researchers have examined the pattern of performance retrospectively in clinical samples that may have had some incentive to underperform (i.e., secondary gain), such as litigants (Roberson et al., 2013) or individuals presenting for consultative evaluations for Social Security disability determination (Chafetz, 2011; Chafetz and Underhill, 2013). An alternative methodology is to use simulation/nonsimulation samples in which one group of participants is told to perform poorly as if they had some type of impairment and the other is told to perform typically. Performances in these types of groups have then been used to establish cut-off scores via (1) identification of a fixed but arbitrary cut-off score of performance, or (2) identification of an “empirical floor” based on the lowest level of performance of a chosen clinical sample (the “known groups” approach, e.g., severely brain-injured patients) (Bianchini et al., 2001). One concern with this methodology is that data from simulators, especially data used to determine the sensitivity or specificity of a PVT, may not be applicable to real-world clinical samples (Boone et al., 2002a, 2005). In fact, few PVTs (other than some embedded PVTs such as CVLT-II Forced-Choice Recognition) have been normed on population-based samples or samples that are not biased in some way due to the method of recruitment (Freedman and Manly, 2015). Thus, the applicability or generalizability of cut-off scores to a broader (i.e., nonforensic) population is questionable.

As a result of this methodology, there are no true “traditional” normative data for many of these measures. However, the need for this type of normative data is minimal given the fact that the simple nature of tasks allows most patients with even severe brain injury, let alone “typical” individuals, to perform at near perfect levels (Larrabee, 2014a). Because of these skewed performance patterns, expectations for sensitivity and specificity for detection of poor performance have been developed rather than traditional norms (Greve and Bianchini, 2004).

Sensitivity in this context is defined as the degree to which a performance score on the measure will correctly identify an individual who is putting forth less than optimal effort. Specificity is the degree to which a performance score will correctly identify a person who is putting forth sufficient or optimal effort. Thus, to be most useful, ideally a PVT has high sensitivity and specificity. In general, however, most PVT cut-off scores are determined to have sensitivity within the 50–60 percent range and specificity within the 90–95 percent range. A meta-analysis of 47 studies by Sollman and Berry (2011) examined the sensitivity and specificity of five stand-alone forced-choice PVTs, finding a mean sensitivity of 69 percent and mean specificity of 90 percent. However, the individual sensitivities and specificities of the measures varied (e.g., WMT sensitivity ranged from 49 percent to 100 percent and specificity ranged from 25 percent to 96 percent; TOMM sensitivity ranged from 34 percent to 100 percent and specificity ranged from 69 percent to 100 percent). There is general agreement among neuropsychologists that PVT specificity must be at least 90 percent for a PVT to be acceptable, in order to avoid falsely labeling valid performances as noncredible (Boone, 2007).
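The two definitions above reduce to simple proportions, sketched below in Python. The counts are hypothetical and were chosen only so that the results fall within the ranges quoted in this paragraph; the function and parameter names are likewise illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of noncredible performances that the cut-off correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of credible performances that the cut-off correctly passes."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation sample: 100 noncredible and 200 credible examinees.
flagged_noncredible, missed_noncredible = 55, 45
passed_credible, flagged_credible = 184, 16

print(round(sensitivity(flagged_noncredible, missed_noncredible), 2))  # 0.55
print(round(specificity(passed_credible, flagged_credible), 2))        # 0.92
```

In this invented sample the cut-off misses 45 percent of noncredible performances while mislabeling 8 percent of credible ones, which mirrors the trade-off described above: cut-offs are set conservatively to protect specificity at the expense of sensitivity.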

Sensitivity and specificity levels have been “verified” in experimental studies that employ comparison between groups that were expected to or told to perform well and those that were expected to or told to perform poorly. That is, researchers compared the performance on PVTs of groups of people “known” or expected to be performing poorly (i.e., those with clear secondary gain, those instructed to feign poor performance, or those who meet Slick and colleagues [1999] criteria for malingering) to that of people who perform well on PVTs or who have no clear secondary gain. Otherwise, studies have simply examined the pass/fail rates in clinical samples and the correlations of PVT performance with performance on the broader neuropsychological battery. There has been some comparison between the overall performance of subgroups who failed PVTs with the performance of the subgroup that did not, with the suggestion that those who fail PVTs tend to perform more poorly on testing overall. Although this methodology may appear to be more appropriate to the clinical situation, it still does not provide any indication of why an individual failed a PVT, which could be due to lack of effort or a variety of other factors, including true cognitive impairment (Freedman and Manly, 2015).

Although many would argue that PVT failure caused by true cognitive impairment is rare, the fact that failure could occur for valid reasons means that interpretation of PVT performances is exceptionally critical and must be done very cautiously. There are insufficient data related to the base-rate of below-chance performances on PVTs in different populations (Freedman and Manly, 2015). As Bigler (2012, 2014, 2015) points out, there are many individuals whose performances fall within a grey area, meaning they perform below the identified cut-off level but above chance. For example, individuals with multiple sclerosis, schizophrenia, TBI, or epilepsy have PVT failure rates of 11–30 percent in terms of falling below standard cut-off scores, even in the absence of known secondary gain (Hampson et al., 2013; Stevens et al., 2014; Suchy et al., 2012). Davis and Millis (2014) identified increased rates of PVT failure in individuals with lower educational status and lower functional status (i.e., independence in activities of daily living). Alternatively, others contend that concerns about grey area performance are unfounded, as the risk for false positives can be minimized. For example, Boone (2009, 2014), Larrabee (2012, 2014a,b), and others assert that multiple PVT failures are generally required,4 and as the number of PVT failures increases, the chance for a false positive approaches zero. Yet, it is possible that PVT failures (i.e., below cut-off score performance) in certain populations reflect legitimate cognitive impairments. For this reason, it has also been recommended that close attention be paid to the pattern of PVT performance and the potential for false positives in these at-risk populations in order to inform interpretation and reduce the chances for false positives (Larrabee, 2014a,b) and to inform future PVT research (Boone, 2007; Larrabee, 2007).

For these reasons, it is necessary to evaluate PVTs in the context of the individual disability applicant, including interpretation of the degree of PVT failure (e.g., below-chance performance versus performance slightly below the cut-off score) and the consistency of failure across PVTs. Furthermore, careful interpretation of grey area PVT performance (significantly above chance but below standard cut-offs) is necessary, given that a significant proportion of individuals with bona fide mental or cognitive disorders may score in this “grey area.” Adding to the complexity of interpreting these scores, population-based norms, and certainly norms for specific patient groups, are not available for most PVTs. Rather, owing to the process of development of these tasks, normative data exist only for select populations, typically litigants or those seeking compensation for injury. Thus, there are no norms for specific demographic groups (e.g., racial/ethnic minority groups). It has been suggested that examiners can compensate for these normative issues by using their clinical judgment to identify an alternate cut-off score for increased specificity (which will come at a cost of lower sensitivity) (Boone, 2014). For example, if an examiner identifies cultural, ethnic, and/or language factors known to affect PVT scores, the examiner should adjust his or her thresholds for identifying noncredible performance (Salazar et al., 2007).

Despite the practice standard of using multiple PVTs, there may be an increased likelihood of abnormal performances as the number of measures administered increases, a pattern that occurs in the context of standard cognitive measures (Schretlen et al., 2008). This type of analysis is beginning to be applied to PVTs specifically with inconsistent findings to date. Several studies examining PVT performance patterns in groups of clinical patients have indicated that it is very unlikely that an individual putting forth good effort on testing will fail two or more PVTs regardless of type of PVT (i.e., embedded or free-standing) (Iverson and Franzen, 1996; Larrabee, 2003). In fact, Victor and colleagues (2009) found a significant difference in the rate of failure on two or more embedded PVTs between those determined to be credible responders (5 percent failure) and noncredible responders (37 percent failure) in a clinical referral sample. Davis and Millis (2014) also found no predictive relation between the number of PVTs administered and the rate of PVT failure in a retrospective review of 158 consecutive referrals for evaluation. In contrast, others have utilized statistical modeling techniques to argue that there is an increased rate of false-positive PVT failures with increased number of PVTs administered (Berthelson et al., 2013; Bilder et al., 2014). Thus, ongoing careful interpretation of failure patterns is warranted.
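The statistical tension described above can be illustrated with a simple binomial model. The sketch assumes that each PVT independently misclassifies a credible examinee with the same fixed probability (10 percent here, matching the conventional ceiling on false-positive rates). Both assumptions are simplifications introduced for illustration: real PVTs are correlated to some degree and differ in their false-positive rates, so the figures are not estimates for any actual battery.

```python
from math import comb


def prob_at_least(k_failures: int, n_tests: int, fp_rate: float) -> float:
    """P(at least k failures among n independent tests, each failing with fp_rate)."""
    return sum(
        comb(n_tests, i) * fp_rate**i * (1 - fp_rate) ** (n_tests - i)
        for i in range(k_failures, n_tests + 1)
    )


# Hypothetical per-test false-positive rate for a credible examinee.
FP_RATE = 0.10

for n in (2, 4, 6, 8):
    one_or_more = prob_at_least(1, n, FP_RATE)
    two_or_more = prob_at_least(2, n, FP_RATE)
    print(n, round(one_or_more, 3), round(two_or_more, 3))
```

Under these assumptions, requiring two failures yields a much lower false-positive probability than acting on a single failure for any given number of tests, yet both probabilities rise as more tests are administered. This is consistent with the practice of requiring multiple failures and also with the modeling-based concern about larger batteries; which consideration dominates in practice depends on how strongly the measures are correlated, which this sketch does not address.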

Clinical use of, and research on, PVTs in pediatric samples to date are significantly limited compared to those in adults. As such, specific pediatric criteria to determine pass/fail performances on PVTs do not exist. However, in general, the conclusion has been that children, even down to age 5 years, typically are able to pass most stand-alone measures of effort even when compared to the adult-based cut-off scores (DeRight and Carone, 2015). Despite these greater limitations in normative data, use of PVTs is becoming common practice even in pediatric patient samples. As in adults, children's performance on PVTs has been correlated with intellectual abilities (Gast and Hart, 2010; MacAllister et al., 2009), although even those with mildly impaired cognitive abilities have been able to pass stand-alone measures (Green and Flaro, 2003). Additionally, in samples of consecutive clinical referrals, failure on PVTs has not been associated with demographic factors, developmental disorders, or neurological status (Kirkwood et al., 2012). Even children with documented moderate to severe brain injury/dysfunction have been found to pass PVTs at the expected adult level (Carone, 2008). There are currently no studies examining PVT use with children younger than age 5 years; however, research has shown that deception strategies at this age generally cannot be sustained and are fairly basic and obvious. As such, behavioral observations are important to assessing validity of cognitive testing with preschool-aged children (DeRight and Carone, 2015; Kirkwood, 2014).


As suggested above, there are many applicants for whom cognitive or neuropsychological testing would improve the standardization and credibility of determinations when disability is alleged on the basis of cognitive impairment. The discussion below should not be considered all-inclusive, but rather an attempt to highlight categories of disability applicants for whom cognitive or performance-based testing would be appropriate.

Intellectual Disability

SSA has clear and appropriate standards for documentation for individuals applying for disability on the basis of intellectual disability (SSA, n.d.-a). As stated by SSA, “standardized intelligence test results are essential to the adjudication of all cases of intellectual disability” if the claimant does not clearly meet or equal the medical listing without them. There are individual cases, of course, in which the claimant's level of impairment is so significant that it precludes formalized testing. For these individuals, their level of functioning and social history provide a longitudinally consistent record and documentation of impairment. For those who can complete intellectual testing and whose social history is inconsistent, inclusion of some documentation or assessment of effort may be warranted and would help to validate the results of intellectual and adaptive functioning assessment.

Use of PVTs is common among practitioners assessing for intellectual disability, with the TOMM being the most commonly used measure (Victor and Boone, 2007). However, caution is warranted in interpreting PVT results in individuals with intellectual disability, as IQ has consistently been correlated with PVT performance (Dean et al., 2008; Graue et al., 2007; Hurley and Deal, 2006; Shandera et al., 2010). More importantly, individuals with intellectual disability fail PVTs at a higher rate than those without (Dean et al., 2008; Salekin and Doane, 2009). In fact, Dean and colleagues (2008) found in their sample that all individuals with an IQ of less than 70 failed at least one PVT. Thus, cut-off scores for individuals with suspected intellectual disability may need to be adjusted due to a higher rate of false-positive results in this population. For example, lowering the TOMM Trial 2 and Retention Trial cut-off scores from 45 to 30 resulted in very low false-positive rates (0–4 percent) (Graue et al., 2007; Shandera et al., 2010).
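The trade-off involved in lowering a cut-off score can be sketched numerically. The example below is purely illustrative: the two score lists are invented for demonstration and do not come from Graue et al. (2007), Shandera et al. (2010), or any other study, and the function name and the rule that a score below the cut-off counts as a failure are assumptions of this sketch.

```python
def rates_at_cutoff(credible_scores, noncredible_scores, cutoff):
    """Return (sensitivity, specificity) when a score below `cutoff` counts as a PVT failure.

    Sensitivity: share of noncredible examinees who fail.
    Specificity: share of credible examinees who pass.
    """
    noncredible_fail = sum(score < cutoff for score in noncredible_scores)
    credible_pass = sum(score >= cutoff for score in credible_scores)
    sensitivity = noncredible_fail / len(noncredible_scores)
    specificity = credible_pass / len(credible_scores)
    return sensitivity, specificity


# Invented scores for illustration only (not study data).
credible = [48, 44, 50, 38, 47, 42, 49, 33, 46, 50]
noncredible = [25, 41, 29, 36, 22, 44, 31, 27, 39, 18]

if __name__ == "__main__":
    for cutoff in (45, 30):
        sens, spec = rates_at_cutoff(credible, noncredible, cutoff)
        print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these invented scores, moving the cut-off from 45 to 30 raises specificity from 0.60 to 1.00 (fewer false positives) while lowering sensitivity from 1.00 to 0.50. The direction of this trade-off is the point of the sketch; the magnitudes carry no empirical meaning.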

Neurocognitive Impairments

There are individuals who apply for disability with primary allegations of cognitive dysfunction in one or more of the functional domains outlined above (e.g., “fuzzy” thinking, slowed thinking, poor memory, concentration difficulties). Standardized cognitive test results, as has been required for individuals claiming intellectual disability, are essential to the adjudication of such cases. These individuals may present with cognitive impairment due to a variety of causes including, but not limited to, brain injury or disease (e.g., TBI or stroke) or neurodevelopmental disorders (e.g., learning disabilities, attention deficit hyperactivity disorder). Similarly, disability applicants may claim cognitive impairment secondary to a psychiatric disorder. For all of these claimants, documentation of impairment in functional cognitive domains with standardized cognitive tests is critically important. When test evidence of these impairments is collected, inclusion of some documentation or assessment of effort is warranted and would help to validate the results of intellectual and adaptive functioning assessment.

Medical Impairments Without Biological Basis

Use of PVTs is generally recommended in evaluations of individuals with medically unexplained symptoms that include cognitive impairment (e.g., cognitive symptoms related to concentration, memory, or slowed thinking in patients with fibromyalgia or other medically unexplained pain syndromes) (Greiffenstein et al., 2013; Johnson-Greene et al., 2013). The rate of PVT failure is significant in these populations. For example, Johnson-Greene and colleagues (2013) reported a 37 percent failure rate in fibromyalgia patients, regardless of disability entitlement status. Greiffenstein and colleagues (2013) reported a 74 percent failure rate in disability-seeking patients with Complex Regional Pain Syndrome Type I. Sensitivity of PVTs may vary in these populations; in one large (n = 326) study of disability claimants (mainly with musculoskeletal and other pain conditions), rates of performance below cut-off levels varied from 17 to 43 percent on three different PVTs (Gervais et al., 2004), underscoring the need for administration of multiple PVTs during the assessment session.


The results of standardized cognitive tests that are appropriately administered, interpreted, and validated can provide objective evidence to help identify and document the presence and severity of medically determinable mental impairments at Step 2 of SSA's disability determination process. In addition, such tests can provide objective evidence to help identify and assess the severity of work-related cognitive functional impairment relevant to disability evaluations at the listing level (Step 3) and to mental residual functional capacity (Steps 4 and 5). Therefore, standardized cognitive test results are essential to the determination of all cases in which an applicant's allegation of cognitive impairment is not accompanied by objective medical evidence.

The results of cognitive tests are affected by the effort put forth by the test-taker. If an individual has not given his or her best effort in taking the test, the results will not provide an accurate picture of the person's neuropsychological or cognitive functioning. Performance validity indicators, which include PVTs, analysis of internal data consistency, and other corroborative evidence, help the evaluator to interpret the validity of an individual's neuropsychological or cognitive test results. For this reason, it is important to include an assessment of performance validity at the time cognitive testing is administered. It also is important that validity be assessed throughout the cognitive evaluation.

PVTs provide information about the validity of cognitive test results when administered as part of the test or test battery and are an important addition to the medical evidence of record for specific groups of applicants. It is important that PVTs only be administered in the context of a larger test battery and only be used to interpret information from that battery. Evidence of invalid performance based on PVT results pertains only to the cognitive test results obtained and does not provide information about whether or not the individual is, in fact, disabled. A lack of validity on PVTs alone is insufficient grounds for denying a disability claim.


  • AACN (American Academy of Clinical Neuropsychology). AACN practice guidelines for neuropsychological assessment and consultation. The Clinical Neuropsychologist. 2007;21(2):209–231. [PubMed: 17455014]
  • Allen LM III, Conder RL, Green P, Cox DR. CARB ‘97: Manual for the computerized assessment of response bias. Durham, NC: Cognisyst; 1997.
  • American Psychiatric Association. The diagnostic and statistical manual of mental disorders: DSM-5. Washington, DC: American Psychiatric Association; 2013.
  • APA (American Psychological Association). Guidelines and principles for accreditation of programs in professional psychology: Quick reference guide to doctoral programs. 2015. [January 20, 2015]. http://www​​/accreditation/about/policies/doctoral​.aspx .
  • Barrash J, Stillman A, Anderson SW, Uc Y, Dawson JD, Rizzo M. Prediction of driving ability with neuropsychological tests: Demographic adjustments diminish accuracy. Journal of the International Neuropsychological Society. 2010;16(4):679–686. [PMC free article: PMC3152745] [PubMed: 20441682]
  • Benedict RH. Brief visuospatial memory test—revised: Professional manual. Lutz, FL: Psychological Assessment Resources; 1997.
  • Benedict RH, Schretlen D, Groninger L, Brandt J. Hopkins Verbal Learning Test–Revised: Normative data and analysis of inter-form and test-retest reliability. The Clinical Neuropsychologist. 1998;12(1):43–55.
  • Benton AL, de Hamsher KS, Varney NR, Spreen O. Contributions to neuropsychological assessment: A clinical manual. New York: Oxford University Press; 1983.
  • Benton L, de Hamsher K, Sivan A. Multilingual Aphasia Examination. 1994a. Controlled oral word association test; p. 3.
  • Benton AL, de Hamsher KS, Varney NR, Spreen O. Contributions to neuropsychological assessment: A clinical manual—second edition. New York: Oxford University Press; 1994b.
  • Berthelson L, Mulchan SS, Odland AP, Miller LJ, Mittenberg W. False positive diagnosis of malingering due to the use of multiple effort tests. Brain Injury. 2013;27(7-8):909–916. [PubMed: 23782260]
  • Bianchini KJ, Mathias CW, Greve KW. Symptom validity testing: A critical review. The Clinical Neuropsychologist. 2001;15(1):19–45. [PubMed: 11778576]
  • Bigler ED. Symptom validity testing, effort, and neuropsychological assessment. Journal of the International Neuropsychological Society. 2012;18(4):632–642. [PubMed: 23057080]
  • Bigler ED. Limitations with symptom validity, performance validity, and effort tests; Presentation to IOM Committee on Psychological Testing, Including Validity Testing, for Social Security Administration; June 25, 2014; Washington, DC. 2014.
  • Bigler ED. Use of symptom validity tests and performance validity tests in disability determinations; Paper commissioned by the IOM Committee on Psychological Testing, Including Validity Testing, for Social Security Administration Disability Determinations; 2015. [April 9, 2015]. http://www​ .
  • Bilder RM, Sugar CA, Hellemann GS. Cumulative false positive rates given multiple performance validity tests: Commentary on Davis and Millis (2014) and Larrabee (2014). The Clinical Neuropsychologist. 2014;28(8):1212–1223. [PMC free article: PMC4331348] [PubMed: 25490983]
  • Binder LM. Portland Digit Recognition Test manual—second edition. Portland, OR: Private Publication; 1993.
  • Binder LM, Willis SC. Assessment of motivation after financially compensable minor head trauma. Psychological Assessment. 1991;3(2):175–181.
  • Binder LM, Villanueva MR, Howieson D, Moore RT. The Rey AVLT recognition memory task measures motivational impairment after mild head trauma. Archives of Clinical Neuropsychology. 1993;8:137–147. [PubMed: 14589671]
  • Binder LM, Iverson GL, Brooks BL. To err is human: “Abnormal” neuropsychological scores and variability are common in healthy adults. Archives of Clinical Neuropsychology. 2009;24:31–46. [PubMed: 19395355]
  • Boone KB. Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press; 2007.
  • Boone KB. The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist. 2009;23(4):729–741. [PubMed: 18949583]
  • Boone KB. Selection and use of multiple performance validity tests (PVTs); Presentation to IOM Committee on Psychological Testing, Including Validity Testing, for Social Security Administration; June 25, 2014; Washington, DC. 2014.
  • Boone KB, Lu P. Non-forced-choice effort measures. In: Larrabee GJ, editor. Assessment of malingered neurocognitive deficits. New York: Oxford University Press; 2007. pp. 27–43.
  • Boone KB, Lu P, Back C, King C, Lee A, Philpott L, Shamieh E, Warner-Chacon K. Sensitivity and specificity of the Rey Dot Counting Test in patients with suspect effort and various clinical samples. Archives of Clinical Neuropsychology. 2002a;17(7):625–642. [PubMed: 14591847]
  • Boone KB, Lu PH, Herzberg D. The B Test manual. Los Angeles: Western Psychological Services; 2002b.
  • Boone KB, Lu P, Wen J. Comparison of various RAVLT scores in the detection of non-credible memory performance. Archives of Clinical Neuropsychology. 2005;20:301–319. [PubMed: 15797167]
  • Brandt J, Benedict RH. Hopkins Verbal Learning Test, Revised: Professional manual. Lutz, FL: Psychological Assessment Resources; 2001.
  • Brandt J, van Gorp W. American Academy of Clinical Neuropsychology policy on the use of non-doctoral-level personnel in conducting clinical neuropsychological evaluations. The Clinical Neuropsychologist. 1999;13(4):385.
  • Busch RM, Chelune GJ, Suchy Y. Using norms in neuropsychological assessment of the elderly. In: Attix DK, Welsh-Bohmer KA, editors. Geriatric neuropsychology: Assessment and intervention. New York: Guilford Press; 2006.
  • Bush SS, Ruff RM, Tröster AI, Barth JT, Koffler SP, Pliskin NH, Reynolds CR, Silver CH. Symptom validity assessment: Practice issues and medical necessity. NAN policy & planning committee; Archives of Clinical Neuropsychology. 2005;20(4):419–426. [PubMed: 15896556]
  • Carone DA. Children with moderate/severe brain damage/dysfunction outperform adults with mild-to-no brain damage on the Medical Symptom Validity Test. Brain Injury. 2008;22(12):960–971. [PubMed: 19005888]
  • Carrow-Woolfolk E. CASL: Comprehensive Assessment of Spoken Language. Circle Pines, MN: American Guidance Services; 1999.
  • Chafetz MD. Malingering on the Social Security disability consultative exam: Predictors and base rates. The Clinical Neuropsychologist. 2008;22(3):529–546. [PubMed: 17853151]
  • Chafetz MD. The psychological consultative examination for Social Security disability. Psychological Injury and Law. 2011;4(3-4):235–244.
  • Chafetz MD, Underhill J. Estimated costs of malingered disability. Archives of Clinical Neuropsychology. 2013;28(7):633–639. [PubMed: 23800432]
  • Chafetz MD, Abrahams JP, Kohlmaier J. Malingering on the Social Security disability consultative exam: A new rating scale. Archives of Clinical Neuropsychology. 2007;22(1):1–14. [PubMed: 17097263]
  • Conder R, Allen L, Cox D. Computerized Assessment of Response Bias test manual. Durham, NC: Cognisyst; 1992.
  • Davis JJ, Millis SR. Examination of performance validity test failure in relation to number of tests administered. The Clinical Neuropsychologist. 2014;28(2):199–214. [PubMed: 24528190]
  • Dean AC, Victor TL, Boone KB, Arnold G. The relationship of IQ to effort test performance. The Clinical Neuropsychologist. 2008;22(4):705–722. [PubMed: 17853124]
  • Delis DC. CVLT-C, California Verbal Learning Test: Children's version: Manual. San Antonio, TX: The Psychological Corporation; 1994.
  • Delis DC, Kramer JH, Kaplan E. California Verbal Learning Test: CVLT-II; adult version; manual. San Antonio, TX: The Psychological Corporation; 2000.
  • Delis D, Kaplan E, Kramer J. Delis-Kaplan executive function system. San Antonio, TX: The Psychological Corporation; 2001.
  • DeRight J, Carone DA. Assessment of effort in children: A systematic review. Child Neuropsychology. 2015;21(1):1–24. [PubMed: 24344790]
  • Edmonds EC, Delano-Wood L, Galasko DR, Salmon DP, Bondi MW. Subjective cognitive complaints contribute to misdiagnosis of mild cognitive impairment. Journal of the International Neuropsychological Society. 2014;20(8):836–847. [PMC free article: PMC4172502] [PubMed: 25156329]
  • Elliott R. Executive functions and their disorders. British Medical Bulletin. 2003;65:49–59. [PubMed: 12697616]
  • Etherton JL, Bianchini KJ, Ciota MA, Greve KW. Reliable Digit Span is unaffected by laboratory-induced pain: Implications for clinical use. Assessment. 2005a;12(1):101–106. [PubMed: 15695748]
  • Etherton JL, Bianchini KJ, Greve KW, Ciota MA. Test of Memory Malingering performance is unaffected by laboratory-induced pain: Implications for clinical use. Archives of Clinical Neuropsychology. 2005b;20(3):375–384. [PubMed: 15797173]
  • Etkin A, Gyurak A, O'Hara R. A neurobiological approach to the cognitive deficits of psychiatric disorders. Dialogues in Clinical Neuroscience. 2013;15(4):419. [PMC free article: PMC3898680] [PubMed: 24459409]
  • Farias ST, Mungas D, Jagust W. Degree of discrepancy between self and other reported everyday functioning by cognitive status: Dementia, mild cognitive impairment, and healthy elders. International Journal of Geriatric Psychiatry. 2005;20(9):827–834. [PMC free article: PMC2872134] [PubMed: 16116577]
  • Faust D, Hart K, Guilmette T, Arkes H. Neuropsychologists' capacity to detect adolescent malingerers. Professional Psychology: Research and Practice. 1988;19:508–515.
  • Frederick RI. Validity indicator profile manual. Minnetonka, MN: NCS Assessments; 1997.
  • Frederick RI, Foster HG. Multiple measures of malingering on a forced-choice test of cognitive ability. Psychological Assessment. 1991;3(4):596–602.
  • Freedman D, Manly J. Use of normative data and measures of performance validity and symptom validity in assessment of cognitive function; Paper commissioned by the IOM Committee on Psychological Testing, Including Validity Testing, for Social Security Administration Disability Determinations; 2015. [April 9, 2015]. http://www​ .
  • Funahashi S. Neuronal mechanisms of executive control by the prefrontal cortex. Neuroscience Research. 2001;39:147–165. [PubMed: 11223461]
  • Gast J, Hart KJ. The performance of juvenile offenders on the Test of Memory Malingering. Journal of Forensic Psychology Practice. 2010;10(1):53–68.
  • Gervais RO, Rohling ML, Green P, Ford W. A comparison of WMT, CARB, and TOMM failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology. 2004;19(4):475–487. [PubMed: 15163449]
  • Goodglass H, Kaplan E. Boston diagnostic aphasia examination. Philadelphia: Lea & Febiger; 1983.
  • Graue LO, Berry DT, Clark JA, Sollman MJ, Cardi M, Hopkins J, Werline D. Identification of feigned mental retardation using the new generation of malingering detection instruments: Preliminary findings. The Clinical Neuropsychologist. 2007;21(6):929–942. [PubMed: 17886151]
  • Green P. Green's Memory Complaints Inventory (MCI). Edmonton, Alberta, Canada: Green's; 2004.
  • Green P. Green's Word Memory Test for Windows: User's manual. Edmonton, Alberta, Canada: Green's; 2005.
  • Green P. Manual for Nonverbal Medical Symptom Validity Test. Edmonton, Alberta, Canada: Green's; 2008.
  • Green P, Flaro L. Word Memory Test performance in children. Child Neuropsychology. 2003;9(3):189–207. [PubMed: 13680409]
  • Green P, Allen L, Astner K. The Word Memory Test: A user's guide to the oral and computer-administered forms, U.S. version 1.1. Durham, NC: CogniSyst; 1996.
  • Greiffenstein MF, Baker WJ, Gola T. Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment. 1994;6(3):218–224.
  • Greiffenstein M, Gervais R, Baker WJ, Artiola L, Smith H. Symptom validity testing in medically unexplained pain: A chronic regional pain syndrome type 1 case series. The Clinical Neuropsychologist. 2013;27(1):138–147. [PubMed: 23062188]
  • Greve KW, Bianchini KJ. Setting empirical cutoffs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology. 2004;19(4):533–541. [PubMed: 15163454]
  • Griffin GA, Normington J, May R, Glassmire D. Assessing dissimulation among Social Security disability income claimants. Journal of Consulting and Clinical Psychology. 1996;64(6):1425–1430. [PubMed: 8991329]
  • Gronwall D. Paced auditory serial-addition task: A measure of recovery from concussion. Perceptual and Motor Skills. 1977;44(2):367–373. [PubMed: 866038]
  • Grote LG, Hook JN. Forced-choice recognition tests of malingering. In: Larrabee GJ, editor. Assessment of malingered neurocognitive deficits. New York: Oxford University Press; 2007. pp. 27–43.
  • Groth-Marnat G. Handbook of psychological assessment. Hoboken, NJ: John Wiley & Sons; 2009.
  • Hammill DD, Larsen SC. Test of written language: Examiner's manual. 4th. Austin, TX: Pro-Ed.; 2009.
  • Hampson NE, Kemp S, Coughlan AK, Moulin CJ, Bhakta BB. Effort test performance in clinical acute brain injury, community brain injury, and epilepsy populations. Applied Neuropsychology: Adult. 2013; ahead-of-print:1–12. [PubMed: 25084843]
  • Heaton RK. Wisconsin Card Sorting Test: Computer version 2. Odessa, FL: Psychological Assessment Resources; 1993.
  • Heaton RK, Smith HH, Lehman RA, Vogt AT. Prospects for faking believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology. 1978;46(5):892. [PubMed: 701568]
  • Heaton RK, Grant I, Matthews CG. Comprehensive norms for an expanded Halstead-Reitan Battery: Demographic corrections, research findings, and clinical applications. Odessa, FL: Psychological Assessment Resources; 1991.
  • Heaton RK, Taylor M, Manly J. Demographic effects and demographically corrected norms with the WAIS-III and WMS-III. In: Tulsky D, Heaton RK, Chelune GJ, Ivnik I, Bornstein RA, Prifitera A, Ledbetter M, editors. Clinical interpretations of the WAIS-III and WMS-III. San Diego, CA: Academic Press; 2001. pp. 181–210.
  • Heilbronner RL, Sweet JJ, Morgan JE, Larrabee GJ, Millis SR. Conference Participants. American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist. 2009;23(7):1093–1129. [PubMed: 19735055]
  • Higginson CI, Lanni K, Sigvardt KA, Disbrow EA. The contribution of trail making to the prediction of performance-based instrumental activities of daily living in Parkinson's disease without dementia. Journal of Clinical and Experimental Neuropsychology. 2013;35(5):530–539. [PMC free article: PMC3674142] [PubMed: 23663116]
  • Hiscock M, Hiscock CK. Refining the forced-choice method for the detection of malingering. Journal of Clinical and Experimental Neuropsychology. 1989;11(6):967–974. [PubMed: 2531755]
  • HNS (Houston Neuropsychological Society). The Houston Conference on Specialty Education and Training in Clinical Neuropsychology policy statement. 2003. [November 25, 2014]. http://www​ .
  • Holdnack JA, Drozdick LW. Advanced clinical solutions for WAIS-IV and WMS-IV: Clinical and interpretive manual. San Antonio, TX: Pearson; 2009.
  • Hurley KE, Deal WP. Assessment instruments measuring malingering used with individuals who have mental retardation: Potential problems and issues. Mental Retardation. 2006;44(2):112–119. [PubMed: 16689611]
  • Iverson GL, Franzen MD. Using multiple objective memory procedures to detect simulated malingering. Journal of Clinical and Experimental Neuropsychology. 1996;18(1):38–51. [PubMed: 8926295]
  • Jelicic M, Merckelbach H, Candel I, Geraets E. Detection of feigned cognitive dysfunction using special malinger tests: A simulation study in naïve and coached malingerers. The International Journal of Neuroscience. 2007;117(8):1185–1192. [PubMed: 17613120]
  • Johnson-Greene D, Brooks L, Ference T. Relationship between performance validity testing, disability status, and somatic complaints in patients with fibromyalgia. The Clinical Neuropsychologist. 2013;27(1):148–158. [PubMed: 23121595]
  • Kaplan E, Goodglass H, Weintraub S. Boston Naming Test. Austin, TX: Pro-Ed.; 2001.
  • Killgore WD, DellaPietra L. Using the WMS-III to detect malingering: Empirical validation of the rarely missed index (RMI). Journal of Clinical and Experimental Neuropsychology. 2000;22:761–771. [PubMed: 11320434]
  • Kirkwood M. Validity testing in pediatric populations; Presentation to IOM Committee on Psychological Testing, Including Validity Testing, for Social Security Administration; June 25, 2014; Washington, DC. 2014.
  • Kirkwood MW, Yeates KO, Randolph C, Kirk JW. The implications of symptom validity test failure for ability-based test performance in a pediatric sample. Psychological Assessment. 2012;24(1):36–45. [PubMed: 21767023]
  • Larrabee GJ. Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist. 2003;17(3):410–425. [PubMed: 14704892]
  • Larrabee GJ. Introduction: Malingering, research designs, and base rates. In: Larrabee GJ, editor. Assessment of malingered neuropsychological deficits. New York: Oxford University Press; 2007.
  • Larrabee GJ. Assessment of malingering. In: Larrabee GJ, editor. Forensic neuropsychology: A scientific approach. New York: Oxford University Press; 2012a.
  • Larrabee GJ. Performance validity and symptom validity in neuropsychological assessment. Journal of the International Neuropsychological Society. 2012b;18(4):625–630. [PubMed: 23057079]
  • Larrabee GJ. False-positive rates associated with the use of multiple performance and symptom validity tests. Archives of Clinical Neuropsychology. 2014a;29(4):364–373. [PubMed: 24769887]
  • Larrabee GJ. Performance and Symptom Validity; Presentation to IOM Committee on Psychological Testing, Including Validity Testing, for Social Security Administration; June 25, 2014; Washington, DC. 2014b.
  • Lewis RF. Digit Vigilance Test. Lutz, FL: Psychological Assessment Resources; 1990.
  • Lezak M, Howieson D, Bigler E, Tranel D. Neuropsychological assessment. 5th. New York: Oxford University Press; 2012.
  • Lu PH, Boone KB, Cozolino L, Mitchell C. Effectiveness of the Rey-Osterrieth Complex Figure Test and the Meyers and Meyers Recognition Trial in the detection of suspect effort. The Clinical Neuropsychologist. 2003;17:426–440. [PubMed: 14704893]
  • MacAllister WS, Nakhutina L, Bender HA, Karantzoulis S, Carlson C. Assessing effort during neuropsychological evaluation with the TOMM in children and adolescents with epilepsy. Child Neuropsychology. 2009;15(6):521–531. [PubMed: 19424879]
  • Manly J, Echemendia R. Race-specific norms: Using the model of hypertension to understand issues of race, culture, and education in neuropsychology. Archives of Clinical Neuropsychology. 2007;22(3):319–325. [PubMed: 17350797]
  • McCrea M, Kelly JP, Randolph C, Cisler R, Berger L. Immediate neurocognitive effects of concussion. Neurosurgery. 2002;50(5):1032–1042. [PubMed: 11950406]
  • McCrea M, Guskiewicz KM, Marshall SW, Barr W, Randolph C, Cantu RC, Onate JA, Yang J, Kelly JP. Acute effects and recovery time following concussion in collegiate football players: The NCAA concussion study. JAMA. 2003;290(19):2556–2563. [PubMed: 14625332]
  • Meyers JE, Volbrecht M. Detection of malingerers using the Rey Complex Figure and Recognition Trial. Applied Neuropsychology. 1999;6:201–207. [PubMed: 10635434]
  • Mittenberg W, Patton C, Canyock EM, Condit DC. Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology. 2002;24(8):1094–1102. [PubMed: 12650234]
  • Mittenberg W, Patton C, Legler W. Identification of malingered head injury on the Wechsler Memory Scale—Third Edition; Paper presented at the annual conference of the National Academy of Neuropsychology; Dallas, TX. 2003.
  • Moritz S, Ferahli S, Naber D. Memory and attention performance in psychiatric patients: Lack of correspondence between clinician-rated and patient-rated functioning with neuropsychological test results. Journal of the International Neuropsychological Society. 2004;10(4):623–633. [PubMed: 15327740]
  • NAN (National Academy of Neuropsychology). NAN definition of a clinical neuropsychologist: Official position of the National Academy of Neuropsychology. 2001. [November 25, 2014]. https://www​.nanonline​.org/docs/PAIC/PDFs​/NANPositionDefNeuro.pdf .
  • Niccolls R, Bolter JF. Multi-Digit Memory Test. San Luis Obispo, CA: Wang Neuropsychological Laboratories; 1991.
  • NIH (National Institutes of Health). NIH toolbox: Processing speed. n.d. [October 15, 2014]. http://www​.nihtoolbox​.org/WhatAndWhy/Cognition​/ProcessingSpeed/Pages/default​.aspx .
  • OIDAP (Occupational Information Development Advisory Panel). Mental cognitive subcommittee: Content model and classification recommendations. 2009. [October 6, 2014]. http://www​​/Documents/AppendixC.pdf .
  • Paulhus DL. Paulhus Deception Scales (PDS). Toronto: Multi-Health Systems; 1998.
  • Randolph C. Repeatable Battery for the Assessment of Neuropsychological Status (RBANS). San Antonio, TX: Psychological Corporation; 1998. [PubMed: 9845158]
  • Rao SM. Neuropsychology of multiple sclerosis: A critical review. Journal of Clinical and Experimental Neuropsychology. 1986;8(5):503–542. [PubMed: 3805250]
  • Reitan RM. Trail Making Test: Manual for administration and scoring. Mesa, AZ: Reitan Neuropsychology Laboratory; 1992.
  • Reitan RM, Wolfson D. The Halstead-Reitan neuropsychological test battery: Theory and clinical interpretation—second edition. Tucson: Neuropsychology Press; 1993.
  • Rey A. L'examen psychologique dans les cas d'encéphalopathie traumatique (Les problèmes). Archives de Psychologie. 1941;28:286–340.
  • Rey A. The clinical examination in psychology. Paris, France: Presses Universitaires de France; 1964.
  • Roberson CJ, Boone KB, Goldberg H, Miora D, Cottingham M, Victor T, Ziegler E, Zeller M, Wright M. Cross validation of the B test in a large known groups sample. The Clinical Neuropsychologist. 2013;27(3):495–508. [PubMed: 23157695]
  • Ruben RJ. Redefining the survival of the fittest: Communication disorders in the 21st century. International Journal of Pediatric Otorhinolaryngology. 1999;49:S37–S38. [PubMed: 10577772]
  • Salazar XF, Lu PH, Wen J, Boone KB. The use of effort tests in ethnic minorities and in non-English-speaking and English as a second language populations. In: Boone KB, editor. Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press; 2007. pp. 405–427.
  • Salekin KL, Doane BM. Malingering intellectual disability: The value of available measures and methods. Applied Neuropsychology. 2009;16(2):105–113. [PubMed: 19430992]
  • Schacter DL. Toward a cognitive neuropsychology of awareness: Implicit knowledge and anosognosia. Journal of Clinical and Experimental Neuropsychology. 1990;12(1):155–178. [PubMed: 2406281]
  • Schmidt M. Rey Auditory Verbal Learning Test: RAVLT: A handbook. Los Angeles: Western Psychological Services; 1996.
  • Schretlen DJ, Testa S, Winicki JM, Pearlson GD, Gordon B. Frequency and bases of abnormal performance by healthy adults on neuropsychological testing. Journal of the International Neuropsychological Society. 2008;14(3):436–445. [PubMed: 18419842]
  • Semel E, Wiig E, Secord W. Clinical evaluation of language fundamentals: Examiner's manual. 4th ed. San Antonio, TX: The Psychological Corporation; 2003.
  • Shandera AL, Berry DT, Clark JA, Schipper LJ, Graue LO, Harp JP. Detection of malingered mental retardation. Psychological Assessment. 2010;22(1):50–56. [PubMed: 20230151]
  • Sheslow D, Adams W. Wide range assessment of memory and learning, second edition: Administration and technical manual. Lutz, FL: Psychological Assessment Resources; 2003.
  • Silverberg ND, Millis SR. Impairment versus deficiency in neuropsychological assessment: Implications for ecological validity. Journal of the International Neuropsychological Society. 2009;15(1):94–102. [PubMed: 19128532]
  • Silverton L. Malingering Probability Scale (MPS) manual. Los Angeles, CA: Western Psychological Services; 1999.
  • Slick DJ, Hopp G, Strauss E, Thompson GB. Victoria Symptom Validity Test: Professional manual. Odessa, FL: Psychological Assessment Resources; 1997.
  • Slick DJ, Sherman EMS, Iverson GL. Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist (Neuropsychology, Development and Cognition: Section D). 1999;13(4):545–561. [PubMed: 10806468]
  • Sollman MJ, Berry DT. Detection of inadequate effort on neuropsychological testing: A meta-analytic update and extension. Archives of Clinical Neuropsychology. 2011;26(8):774–789. [PubMed: 22100969]
  • Solomon RE, Boone KB, Miora D, Skidmore S, Cottingham M, Victor T, Ziegler E, Zeller M. Use of the WAIS-III picture completion subtest as an embedded measure of response bias. The Clinical Neuropsychologist. 2010;24(7):1243–1256. [PubMed: 20924983]
  • Spreen O, Strauss E. Controlled oral word association (word fluency). In: Spreen O, Strauss E, editors. A compendium of neuropsychological tests. Oxford, UK: Oxford University Press; 1991. pp. 219–227.
  • SSA (Social Security Administration). Disability evaluation under social security—Part III: Listing of impairments—Adult listings (Part A)—section 12.00 mental disorders. n.d.-a. [November 14, 2014]. http://www/professionals/bluebook/12.00-MentalDisorders-Adult.htm.
  • SSA. Disability evaluation under Social Security: Part I—general information. n.d.-b. [November 14, 2014]. http://www/professionals/bluebook/general-info.htm.
  • Stevens A, Schneider K, Liske B, Hermle L, Huber H, Hetzel G. Is subnormal cognitive performance in schizophrenia due to lack of effort or to cognitive impairment? German Journal of Psychiatry. 2014;17(1):9.
  • Strauss E, Sherman EM, Spreen O. A compendium of neuropsychological tests: Administration, norms, and commentary. Oxford, UK: Oxford University Press; 2006.
  • Suchy Y, Chelune G, Franchow EI, Thorgusen SR. Confronting patients about insufficient effort: The impact on subsequent symptom validity and memory performance. The Clinical Neuropsychologist. 2012;26(8):1296–1311. [PubMed: 23061472]
  • Suhr JA, Boyer D. Use of the Wisconsin Card Sorting Test in the detection of malingering in student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology. 1999;21:701–708. [PubMed: 10572288]
  • Sweet JJ, Meyer DG, Nelson NW, Moberg PJ. The TCN/AACN 2010 “salary survey”: Professional practices, beliefs, and incomes of U.S. neuropsychologists. The Clinical Neuropsychologist. 2011;25(1):12–61. [PubMed: 21253962]
  • Tombaugh TN, Tombaugh PW. Test of Memory Malingering: TOMM. North Tonawanda, NY: Multi-Health Systems; 1996.
  • Trahan DE, Larrabee GJ. Continuous Visual Memory Test. Odessa, FL: Psychological Assessment Resources; 1988.
  • van Gorp WG, Humphrey LA, Kalechstein A, Brumm VL, McMullen WJ, Stoddard M, Pachana NA. How well do standard clinical neuropsychological tests identify malingering?: A preliminary analysis. Journal of Clinical and Experimental Neuropsychology. 1999;21(2):245–250. [PubMed: 10425521]
  • Victor TL, Boone KB. Identification of feigned mental retardation. In: Boone KB, editor. Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press; 2007. pp. 310–345.
  • Victor TL, Boone KB, Serpa JG, Buehler J, Ziegler E. Interpreting the meaning of multiple symptom validity test failure. The Clinical Neuropsychologist. 2009;23(2):297–313. [PubMed: 18821138]
  • Warrington E. Recognition Memory Test manual. Windsor: Nfer-Nelson; 1984.
  • Wechsler D. Wechsler Adult Intelligence Scale (WAIS-III): Administration and scoring manual—3rd edition. San Antonio, TX: The Psychological Corporation; 1997a.
  • Wechsler D. WMS-III: Wechsler Memory Scale administration and scoring manual. San Antonio, TX: The Psychological Corporation; 1997b.
  • Wechsler D. Wechsler Intelligence Scale for Children—fourth edition (WISC-IV). San Antonio, TX: The Psychological Corporation; 2003.
  • Wechsler D. Wechsler Adult Intelligence Scale—fourth edition (WAIS-IV). San Antonio, TX: NCS Pearson; 2008.
  • Wechsler D. WMS-IV: Wechsler Memory Scale—Administration and scoring manual. San Antonio, TX: The Psychological Corporation; 2009.
  • WHO (World Health Organization). International classification of functioning, disability, and health (ICF). Geneva, Switzerland: WHO; 2001.
  • Young G. Resource material for ethical psychological assessment of symptom and performance validity, including malingering. Psychological Injury and Law. 2014;7(3):206–235.



1. As documented in Chapters 1 and 2, 57 percent of claims fall under mental disorders other than intellectual disability and/or connective tissue disorders.

2. Public comments are currently under review and a final rule has yet to be published as of the publication of this report.

3. At the committee's second meeting, Drs. Bianchini, Boone, and Larrabee all expressed great concern about the susceptibility of PVTs to coaching and stressed the importance of ensuring test security, as disclosure of test materials adversely affects the reliability and validity of psychological test results.

4. The exception is that a single below-chance failure on a forced-choice PVT is sufficient to render scores invalid.

Copyright 2015 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK305230

