Adv Health Sci Educ Theory Pract. 2016 Oct;21(4):761-73. doi: 10.1007/s10459-015-9664-3. Epub 2016 Jan 12.

Inter-rater reliability and generalizability of patient note scores using a scoring rubric based on the USMLE Step-2 CS format.

Author information

1. Department of Medical Education (MC 591), College of Medicine, University of Illinois at Chicago, 808 South Wood Street, 963 CMET, Chicago, IL, 60612-7309, USA. yspark2@uic.edu.
2. Department of Family Medicine (MC 785), College of Medicine, University of Illinois at Chicago, 1819 West Polk Street, 150 CMW, Chicago, IL, 60612-7309, USA.
3. Department of Medical Education (MC 591), College of Medicine, University of Illinois at Chicago, 808 South Wood Street, 963 CMET, Chicago, IL, 60612-7309, USA.

Abstract

Recent changes to the patient note (PN) format of the United States Medical Licensing Examination have challenged medical schools to improve the instruction and assessment of students taking the Step-2 clinical skills examination. The purpose of this study was to gather validity evidence regarding response process and internal structure, focusing on inter-rater reliability and generalizability, to determine whether a locally developed PN scoring rubric and scoring guidelines could yield reproducible PN scores. A randomly selected subsample of historical data (post-encounter PNs from 55 of 177 medical students) was rescored by six trained faculty raters in November-December 2014. Inter-rater reliability (percent exact agreement and kappa) was calculated for five standardized patient cases administered in a local graduation competency examination. Generalizability studies were conducted to examine overall reliability. Qualitative data were collected through surveys and a rater-debriefing meeting. The overall inter-rater reliability (weighted kappa) was .79 (Documentation = .63, Differential Diagnosis = .90, Justification = .48, and Workup = .54). The majority of the score variance was due to case specificity (13%) and case-task specificity (31%), indicating differences in student performance across cases and case-task interactions. Variance associated with raters and their interactions was modest (<5%). Raters felt that Justification was the most difficult task to score and that having case- and level-specific scoring guidelines during training was most helpful for calibration. The overall inter-rater reliability indicates a high level of confidence in the consistency of note scores. Note-scoring designs may optimize reliability by balancing the number of raters and cases.
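As a rough illustration of the inter-rater statistics reported above (percent exact agreement and weighted kappa), the following Python sketch computes both for two raters scoring the same set of notes. The 1-5 rating scale, the example scores, and the quadratic weighting scheme are assumptions made for illustration only and are not details taken from the study.

    # Sketch: percent exact agreement and weighted kappa for two raters
    # scoring the same patient notes (hypothetical data and weighting).
    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    rater_a = np.array([4, 3, 5, 2, 4, 3, 5, 4])  # hypothetical scores, one per note
    rater_b = np.array([4, 3, 4, 2, 4, 2, 5, 4])

    exact_agreement = np.mean(rater_a == rater_b)  # proportion of identical scores
    weighted_kappa = cohen_kappa_score(
        rater_a, rater_b, weights="quadratic"      # chance-corrected agreement;
    )                                              # larger discrepancies penalized more

    print(f"Exact agreement: {exact_agreement:.2f}")
    print(f"Weighted kappa:  {weighted_kappa:.2f}")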

KEYWORDS:

Patient note; Rater effects; USMLE Step-2 CS; Validity

PMID: 26757931
DOI: 10.1007/s10459-015-9664-3
[Indexed for MEDLINE]