Variable-Length Stopping Rules for Multidimensional Computerized Adaptive Testing

Chun Wang; David J Weiss; Zhuoran Shang

doi:10.1007/s11336-018-9644-7

Psychometrika. 2019 Sep;84(3):749-771. doi: 10.1007/s11336-018-9644-7. Epub 2018 Dec 3.

Authors

Chun Wang¹, David J Weiss², Zhuoran Shang²

Affiliations

¹ Measurement and Statistics, College of Education, University of Washington, 312E Miller Hall, Box 353600, Seattle, WA, 98195-3600, USA. wang4066@uw.edu.
² University of Minnesota, Minneapolis, USA.

PMID: 30511327

Abstract

In computerized adaptive testing (CAT), a variable-length stopping rule ends item administration once a pre-specified measurement precision standard has been satisfied. The goal is to provide equal measurement precision for all examinees regardless of their true latent trait level. Several stopping rules have been proposed for unidimensional CAT, such as the minimum information rule and the maximum standard error rule. These rules have also been extended to multidimensional CAT and cognitive diagnostic CAT, and they all share the idea of monitoring measurement error. Recently, Babcock and Weiss (J Comput Adapt Test 2012. https://doi.org/10.7333/1212-0101001) proposed an "absolute change in theta" (CT) rule, which is useful when an item bank has been exhausted of good items for one or more ranges of the trait continuum. Choi, Grady and Dodd (Educ Psychol Meas 70:1-17, 2010) also argued that a CAT should stop when the standard error no longer changes, which implies that the item bank is likely exhausted. Although these stopping rules have been evaluated and compared in different simulation studies, the relationships among them remain unclear, and there is therefore no clear guideline on when to use which rule. This paper presents analytic results that show the connections among the various stopping rules within both unidimensional and multidimensional CAT. In particular, it is argued that the CT-rule alone can be unstable and can end the test prematurely. However, the CT-rule can be a useful secondary rule for monitoring the point of diminishing returns. To provide further empirical evidence, three simulation studies are reported using both the 2PL model and the multidimensional graded response model.

Keywords: computerized adaptive testing; information; multidimensional models; standard error; stopping rules; variable-length adaptive testing.
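
As an illustration of the idea in the abstract, the following is a minimal sketch of a stopping check for a unidimensional CAT under the 2PL model, with the maximum standard error rule as the primary criterion and the CT-rule as a secondary one. It is not the authors' implementation: the function names, the thresholds (`se_max`, `ct_min`), and the `min_items` and `max_items` guards are all assumptions chosen for illustration, and the multidimensional case is not covered.

```python
import math


def prob_2pl(theta, a, b):
    """2PL probability of a keyed response at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of one 2PL item at theta: a^2 * P * (1 - P)."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def standard_error(theta, administered):
    """Asymptotic SE of theta: 1 / sqrt(sum of item information).

    `administered` is a list of (a, b) pairs for the items given so far.
    """
    info = sum(item_information(theta, a, b) for a, b in administered)
    return math.inf if info == 0 else 1.0 / math.sqrt(info)


def stop_reason(theta_history, administered,
                se_max=0.3, ct_min=0.02, min_items=5, max_items=50):
    """Return why the test should stop, or None to continue.

    All thresholds are hypothetical defaults, not values from the paper.
    theta_history holds the interim theta estimate after each item.
    """
    n = len(administered)
    theta = theta_history[-1]

    # Primary rule: precision target reached.
    if standard_error(theta, administered) <= se_max:
        return "se"

    # Secondary rule: theta has stopped moving, suggesting diminishing
    # returns. Guarded by min_items because the CT-rule used alone can
    # end the test prematurely.
    if n >= min_items and n >= 2:
        if abs(theta - theta_history[-2]) < ct_min:
            return "ct"

    # Safety net so the test cannot run indefinitely.
    if n >= max_items:
        return "max_items"
    return None
```

In this sketch the CT check only fires after a minimum number of items, which reflects the abstract's point that the CT-rule is better suited as a secondary rule than as the sole criterion.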

Publication types

Comparative Study
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Bias
Cognition / physiology*
Computer Simulation / statistics & numerical data*
Dimensional Measurement Accuracy
Humans
Psychometrics / methods*

Grants and funding

R01HD079439/HD/NICHD NIH HHS/United States