BMC Med Res Methodol. 2016 Nov 24;16(1):163.

No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.

Author information

1. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, The Netherlands. M.vanSmeden@umcutrecht.nl.
2. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, The Netherlands.
3. Centre for Statistics in Medicine, Botnar Research Centre, University of Oxford, Oxford, UK.

Abstract

BACKGROUND:

Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for the substantial differences between these extensive simulation studies.
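As a minimal sketch, the EPV criterion can be computed as the number of events divided by the number of candidate predictor parameters. Here "events" is assumed to mean the less frequent of the two outcome classes, a common convention; the function names `events_per_variable` and `meets_ten_epv_rule` are hypothetical and do not come from the paper.

```python
def events_per_variable(outcomes, n_predictors):
    """EPV = (size of the less frequent outcome class) / (number of predictor parameters).

    `outcomes` is a sequence of 0/1 values; counting the smaller class as the
    "events" is an assumption of this sketch.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    n_ones = sum(1 for y in outcomes if y == 1)
    n_zeros = len(outcomes) - n_ones
    return min(n_ones, n_zeros) / n_predictors


def meets_ten_epv_rule(outcomes, n_predictors, threshold=10.0):
    """True if the data satisfy the '10 events per variable' rule of thumb."""
    return events_per_variable(outcomes, n_predictors) >= threshold


# 40 events among 200 observations with 5 predictors gives EPV = 8.0,
# so the 10 EPV rule would not be met.
print(events_per_variable([1] * 40 + [0] * 160, 5))
```

Note that EPV by itself does not distinguish, for example, 40 events among 80 observations from 40 events among 2,000 observations: with 4 predictors both give EPV = 10.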

METHODS:

The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared.
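To make these methods concrete, here is a minimal Monte Carlo sketch for a single covariate using only the Python standard library. It fits logit(p) = b0 + b1*x either by maximum likelihood (Fisher scoring, which coincides with Newton-Raphson for the logit link) or with Firth's correction, implemented as Firth's modified score: each residual y_i - p_i gains the term h_i(1/2 - p_i), where h_i is the hat-matrix diagonal. It then summarises the bias, mean square error and 95% Wald coverage of the slope. The names `fit_logistic` and `simulate`, the standard-normal covariate, the step cap and the choice to drop non-converged fits are all assumptions of this sketch, not the authors' simulation design. In particular, Firth fits are often paired with profile penalized-likelihood intervals rather than the Wald intervals used here.

```python
import math
import random

Z_975 = 1.959963984540054  # standard normal 97.5% quantile (95% Wald interval)


def _sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def fit_logistic(xs, ys, firth=False, max_iter=100, tol=1e-8, max_step=5.0):
    """Fit logit(p) = b0 + b1*x by ML scoring or Firth-modified scoring.

    Returns (b0, b1, var_b1, converged); var_b1 is the Wald variance taken
    from the inverse Fisher information.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        ps = [_sigmoid(b0 + b1 * x) for x in xs]
        ws = [p * (1.0 - p) for p in ps]
        # Fisher information X'WX for design rows (1, x)
        i00 = sum(ws)
        i01 = sum(w * x for w, x in zip(ws, xs))
        i11 = sum(w * x * x for w, x in zip(ws, xs))
        det = i00 * i11 - i01 * i01
        if det <= 1e-12:
            return b0, b1, float("inf"), False  # information collapsed, e.g. separation
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse information
        u0 = u1 = 0.0
        for x, y, p, w in zip(xs, ys, ps, ws):
            r = y - p
            if firth:
                h = w * (v00 + 2.0 * v01 * x + v11 * x * x)  # hat-matrix diagonal
                r += h * (0.5 - p)  # Firth's modification of the score
            u0 += r
            u1 += r * x
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        step = max(abs(d0), abs(d1))
        if step > max_step:  # cap the step size as a simple safeguard
            d0, d1 = d0 * max_step / step, d1 * max_step / step
        b0, b1 = b0 + d0, b1 + d1
        if step < tol:
            return b0, b1, v11, True
    return b0, b1, v11, False


def simulate(beta0, beta1, n, n_sims=500, firth=False, seed=1):
    """Monte Carlo bias, MSE and 95% Wald coverage for the slope estimate."""
    rng = random.Random(seed)
    estimates, covered = [], 0
    for _ in range(n_sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [1 if rng.random() < _sigmoid(beta0 + beta1 * x) else 0 for x in xs]
        _, b1, var_b1, ok = fit_logistic(xs, ys, firth=firth)
        if not ok:
            continue  # dropping non-converged fits is one of several possible choices
        estimates.append(b1)
        half_width = Z_975 * math.sqrt(var_b1)
        covered += int(b1 - half_width <= beta1 <= b1 + half_width)
    used = len(estimates)
    if used == 0:
        raise RuntimeError("no simulated data set produced a converged fit")
    return {
        "bias": sum(estimates) / used - beta1,
        "mse": sum((b - beta1) ** 2 for b in estimates) / used,
        "coverage": covered / used,
        "used": used,
    }
```

For example, comparing `simulate(-1.0, 0.5, n=30, firth=False)` with `simulate(-1.0, 0.5, n=30, firth=True)` contrasts the two estimators in a low-EPV setting; the `used` count shows how many simulated data sets the maximum likelihood run discarded.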

RESULTS:

The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation.
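Identifying separation is straightforward when there is a single covariate plus an intercept: the data are completely separated if some threshold on x splits the two outcome classes perfectly, and quasi-completely separated if they overlap only at one tied x value. The sketch below assumes that one-covariate setting (with several covariates, detection requires a linear-programming check, not shown); the names `separation_status` and `separation_rate` and the standard-normal covariate are hypothetical choices for illustration.

```python
import math
import random


def _sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def separation_status(xs, ys):
    """Classify binary data with one covariate (plus intercept).

    Returns 'constant-outcome', 'complete', 'quasi-complete' or 'none'.
    """
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    if not x0 or not x1:
        return "constant-outcome"  # one class only: the ML estimate does not exist either
    lo0, hi0, lo1, hi1 = min(x0), max(x0), min(x1), max(x1)
    if hi0 < lo1 or hi1 < lo0:
        return "complete"  # a threshold on x predicts y perfectly
    if hi0 == lo1 or hi1 == lo0:
        return "quasi-complete"  # perfect except for ties at a single x value
    return "none"


def separation_rate(beta0, beta1, n, n_sims=1000, seed=1):
    """Fraction of simulated data sets whose ML estimate would not exist."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [1 if rng.random() < _sigmoid(beta0 + beta1 * x) else 0 for x in xs]
        if separation_status(xs, ys) != "none":
            hits += 1
    return hits / n_sims
```

Flagging each simulated data set this way makes it possible to rerun a simulation with separated data sets dropped, kept, or refitted with Firth's correction, and to see how much that handling choice moves the summary statistics.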

CONCLUSIONS:

The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance on sample size considerations for binary logistic regression analysis.

KEYWORDS:

Bias; EPV; Logistic regression; Sample size; Separation; Simulations

PMID: 27881078
PMCID: PMC5122171
DOI: 10.1186/s12874-016-0267-3
[Indexed for MEDLINE]
Free PMC Article
