Comparison of Bayesian Sample Size Criteria: ACC, ALC, and WOC

J Stat Plan Inference. 2009 Dec 1;139(12):4111-4122. doi: 10.1016/j.jspi.2009.05.041.

Abstract

A challenge for implementing performance based Bayesian sample size determination is selecting which of several methods to use. We compare three Bayesian sample size criteria: the average coverage criterion (ACC) which controls the coverage rate of fixed length credible intervals over the predictive distribution of the data, the average length criterion (ALC) which controls the length of credible intervals with a fixed coverage rate, and the worst outcome criterion (WOC) which ensures the desired coverage rate and interval length over all (or a subset of) possible datasets. For most models, the WOC produces the largest sample size among the three criteria, and sample sizes obtained by the ACC and the ALC are not the same. For Bayesian sample size determination for normal means and differences between normal means, we investigate, for the first time, the direction and magnitude of differences between the ACC and ALC sample sizes. For fixed hyperparameter values, we show that the difference of the ACC and ALC sample size depends on the nominal coverage, and not on the nominal interval length. There exists a threshold value of the nominal coverage level such that below the threshold the ALC sample size is larger than the ACC sample size, and above the threshold the ACC sample size is larger. Furthermore, the ACC sample size is more sensitive to changes in the nominal coverage. We also show that for fixed hyperparameter values, there exists an asymptotic constant ratio between the WOC sample size and the ALC (ACC) sample size. Simulation studies are conducted to show that similar relationships among the ACC, ALC, and WOC may hold for estimating binomial proportions. We provide a heuristic argument that the results can be generalized to a larger class of models.

Keywords: average coverage criterion; average length criterion; coverage rate; credible interval; interval length; worst outcome criterion.