Acad Psychiatry. 2018 Aug 8. doi: 10.1007/s40596-018-0957-8. [Epub ahead of print]

Inflated Clinical Evaluations: a Comparison of Faculty-Selected and Mathematically Calculated Overall Evaluations Based on Behaviorally Anchored Assessment Data.

Author information

Uniformed Services University of the Health Sciences, Bethesda, MD, USA.
Uniformed Services University of the Health Sciences, Bethesda, MD, USA.
Reynolds Army Community Hospital, Fort Sill, OK, USA.



This retrospective study compared faculty-selected evaluation scores with those mathematically calculated from behaviorally anchored assessments.


Data from 1036 psychiatry clerkship clinical evaluations (2012-2015) were reviewed. These clinical evaluations required faculty to assess clinical performance using 14 behaviorally anchored questions followed by a faculty-selected overall evaluation. An explicit rubric was included in the overall evaluation to assist the faculty in interpreting their 14 assessment responses. Using the same rubric, mathematically calculated evaluations were generated from the same assessment responses and compared to the faculty-selected evaluations.


Comparison of faculty-selected to mathematically calculated evaluations revealed that while the two methods were reliably correlated (Cohen's kappa = 0.314, Pearson's coefficient = 0.658, p < 0.001), there was a notable difference in the results (t = 24.5, p < 0.0001). The average faculty-selected evaluation was 1.58 (SD = 0.61) with a mode of "1" or "outstanding," while the mathematically calculated evaluation had an average of 2.10 (SD = 0.90) with a mode of "3" or "satisfactory." Of the faculty-selected evaluations, 51.0% matched the mathematically calculated results; 46.1% were higher (more favorable) and 2.9% were lower.
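The agreement figures above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name `compare_evaluations`, the sample ratings, and the choice of unweighted Cohen's kappa are assumptions made for illustration (the abstract does not state which kappa variant was used). It also assumes that 1 is the best grade, so a faculty-selected evaluation counts as "higher" when its number is smaller than the calculated one.

```python
from collections import Counter


def compare_evaluations(faculty, calculated):
    """Compare paired ordinal ratings where 1 is the best grade (assumption)."""
    if len(faculty) != len(calculated) or not faculty:
        raise ValueError("need two non-empty lists of equal length")
    n = len(faculty)
    pairs = list(zip(faculty, calculated))

    # Share of pairs that agree exactly, and share where the faculty
    # rating is more favorable (numerically smaller) than the calculated one.
    matched = sum(f == c for f, c in pairs) / n
    more_favorable = sum(f < c for f, c in pairs) / n
    less_favorable = sum(f > c for f, c in pairs) / n

    # Unweighted Cohen's kappa: observed agreement corrected for the
    # agreement expected by chance from the two marginal distributions.
    f_counts, c_counts = Counter(faculty), Counter(calculated)
    p_expected = sum(f_counts[k] * c_counts[k] for k in f_counts) / (n * n)
    if p_expected < 1:
        kappa = (matched - p_expected) / (1 - p_expected)
    else:
        kappa = float("nan")  # undefined when both raters use a single category

    return {
        "matched": matched,
        "more_favorable": more_favorable,
        "less_favorable": less_favorable,
        "kappa": kappa,
    }


# Hypothetical ratings, not data from the study.
faculty_scores = [1, 1, 2, 1, 2, 3, 1, 2]
calculated_scores = [1, 2, 2, 3, 3, 3, 2, 2]
summary = compare_evaluations(faculty_scores, calculated_scores)
```

A pattern like the one reported, with many mismatches falling on the more favorable side and few on the less favorable side, would show up here as a large `more_favorable` share alongside a modest `kappa`.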


Clerkship clinical evaluation forms that require faculty to make an overall evaluation generate results that are significantly higher than what would have been assigned solely using behaviorally anchored assessment questions. Focusing faculty attention on assessing specific behaviors rather than overall evaluations may reduce this inflation and improve validity. Clerkships may want to consider removing overall evaluation questions from their clinical evaluation tools.


Assessment; Competency; Evaluation; Medical student; UME

