Teach Learn Med. 2015;27(4):404-9. doi: 10.1080/10401334.2015.1077131.

The Impact of Repeated Exposure to Items.

Author information

1. Psychometric Services, American Board of Family Medicine, Lexington, Kentucky, USA.
2. Education Reform and Research, University of Kentucky, Lexington, Kentucky, USA.
3. Department of Clinical Medicine, North Carolina State University, Raleigh, North Carolina, USA.

Abstract

THEORY:

When test developers have a limited number of test questions available or when the equating design requires some item overlap across forms, psychometricians worry that examinees who encounter previously seen questions on subsequent test forms may be able to inflate their test score due to their familiarity with the repeated test questions.

HYPOTHESES:

Prior exposure to test questions may lead to contamination and inflated scores. This research seeks to determine whether examinees' scores were inflated due to prior exposure to test questions and, if so, whether those increases were statistically significant.

METHOD:

The sample for this study consisted of candidates who took the American Board of Family Medicine's certification examination twice in a single year (n = 988). Examinees were randomly assigned one of two forms for their first attempt and received the other form for their repeat test. There were 99 questions in common across both forms. The Rasch model was used to estimate examinee ability. Performance changes on the common questions and unique questions were compared and repeated measures t tests were performed to establish whether score changes were likely to have occurred by chance.
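As a rough illustration of the two tools named above, the sketch below shows the standard Rasch response function and a repeated measures (paired) t statistic. The function names and the five pairs of ability estimates are made up for illustration; they are not the study's data or the authors' code, and the sign convention (first attempt minus second) is an assumption.

```python
import math
from statistics import mean, stdev


def rasch_probability(theta, b):
    """Rasch model: probability of a correct response given examinee
    ability theta and item difficulty b, both expressed in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def paired_t(first, second):
    """Repeated measures t statistic on the differences first - second.
    Returns (t, degrees_of_freedom)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical ability estimates (logits) for five repeat examinees.
first_attempt = [-0.40, 0.10, 0.35, -0.05, 0.60]
second_attempt = [-0.15, 0.20, 0.55, 0.20, 0.70]

t_stat, df = paired_t(first_attempt, second_attempt)
print(f"t({df}) = {t_stat:.3f}")
```

Under this convention, a negative t indicates that ability estimates rose on the second attempt.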

RESULTS:

On average, the examinees increased their overall ability estimate by .187 logits on the repeat attempt. The repeated measures t tests indicate this difference was statistically significant, t(987) = -25.298, p < .001, α = .05. The mean difference between the examinees' ability estimate on common and unique items for their first attempt was not statistically significant, t(987) = .264, p = .792, α = .05; however, the mean difference between common and unique items on the second attempt (0.029 logits) was statistically significant, t(987) = 3.28, p = .001, α = .05.
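Because a paired t statistic is the mean difference divided by its standard error, the reported figures imply the standard error and the standard deviation of the differences. The sketch below back-calculates both from the rounded values in the abstract; these are derived approximations, not numbers reported by the authors, and the helper name is hypothetical.

```python
import math


def implied_spread(mean_diff, t_value, n):
    """Back out the standard error and SD of paired differences
    from a reported mean difference, t statistic, and sample size."""
    standard_error = abs(mean_diff) / abs(t_value)
    standard_deviation = standard_error * math.sqrt(n)
    return standard_error, standard_deviation


# Overall gain on the repeat attempt: 0.187 logits, t(987) = -25.298.
overall_se, overall_sd = implied_spread(0.187, -25.298, 988)

# Common vs. unique items on the second attempt: 0.029 logits, t(987) = 3.28.
common_se, common_sd = implied_spread(0.029, 3.28, 988)

print(f"overall: SE = {overall_se:.4f}, SD = {overall_sd:.3f}")
print(f"common vs. unique: SE = {common_se:.4f}, SD = {common_sd:.3f}")
```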

CONCLUSIONS:

Some of the increase in the examinees' overall ability estimate may be attributed to a general increase in the latent trait; however, there was a small but detectable increase that could be attributed to prior exposure to the questions. On average, about 15% of the repeated questions were changed from wrong to right, but about 11% of questions were changed from right to wrong, suggesting that examinees may occasionally be using prior exposure to their benefit, but general guessing accounts for more of the changes. The impact of the mean difference between the common and unique item scores (0.029 logits) is trivial at the individual level; however, such a bias among the population of repeat testers could be problematic if a small subset of examinees were using a "remember-research-retest" strategy to obtain nontrivial score increases.
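To give a sense of scale for a 0.029 logit difference, the sketch below converts a logit shift into a change in the probability of a correct response under the Rasch model. It assumes an item whose difficulty equals the examinee's ability (a 50% baseline), where the logistic curve is steepest and the shift therefore has its largest effect; the function names are illustrative.

```python
import math


def logit_to_prob(x):
    """Logistic function: convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_shift(theta_minus_b, delta):
    """Change in probability of a correct response when the
    ability-minus-difficulty gap grows by delta logits."""
    return logit_to_prob(theta_minus_b + delta) - logit_to_prob(theta_minus_b)


# Baseline of 0 logits (50% chance of success), shifted by 0.029 logits.
shift = prob_shift(0.0, 0.029)
print(f"probability change: {shift:.4f}")  # about 0.0072, i.e. roughly 0.7 percentage points
```

For items that are much easier or harder than the examinee's ability, the same logit shift produces an even smaller change in probability.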

KEYWORDS:

Rasch model; certification examination; item exposure; repeat items; score inflation

PMID: 26507998
DOI: 10.1080/10401334.2015.1077131
[Indexed for MEDLINE]
