Send to

Choose Destination
J Clin Epidemiol. 1999 Oct;52(10):935-42.

Stepwise selection in small data sets: a simulation study of bias in logistic regression analysis.

Author information

Center for Clinical Decision Sciences, Department of Public Health, Erasmus University, Rotterdam, The Netherlands.


Stepwise selection methods are widely applied to identify covariables for inclusion in regression models. One of the problems of stepwise selection is biased estimation of the regression coefficients. We illustrate this "selection bias" with logistic regression in the GUSTO-I trial (40,830 patients with an acute myocardial infarction). Random samples were drawn that included 3, 5, 10, 20, or 40 events per variable (EPV). Backward stepwise selection was applied in models containing 8 or 16 pre-specified predictors of 30-day mortality. We found a considerable overestimation of regression coefficients of selected covariables. The selection bias decreased with increasing EPV. For EPV 3, 10, or 40, the bias exceeded 25% for 7, 3, and 1 in the 8-predictor model respectively, when a conventional selection criterion was used (alpha = 0.05). For these EPV values, the bias was less than 20% for all covariables when no selection was applied. We conclude that stepwise selection may result in a substantial bias of estimated regression coefficients.

[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Elsevier Science
Loading ...
Support Center