Stat Med. 2016 Nov 10;35(25):4546-4558. doi: 10.1002/sim.7021. Epub 2016 Jun 30.

Too many covariates and too few cases? - a comparative study.

Author information

1 Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, 37232, TN, U.S.A. cindy.chen@vanderbilt.edu.
2 Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, 37232, TN, U.S.A. cindy.chen@vanderbilt.edu.
3 Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, 37232, TN, U.S.A.
4 Department of Medicine, School of Medicine, Vanderbilt University, Nashville, 37232, TN, U.S.A.
5 Department of Health Policy, School of Medicine, Vanderbilt University, Nashville, 37232, TN, U.S.A.
6 Medicine Mid-South Geriatric Research Education and Clinical Center and Clinical Research Center of Excellence, VA TN Valley Health Care System, Nashville, 37232, TN, U.S.A.

Abstract

Prior research indicates that 10-15 cases or controls, whichever is fewer, are required per parameter to reliably estimate regression coefficients in multivariable logistic regression models. This condition may be difficult to meet even in a well-designed study when the number of potential confounders is large, the outcome is rare, and/or interactions are of interest. Various propensity score approaches have been implemented when the exposure is binary. Recent work on shrinkage approaches like lasso was motivated by the critical need to develop methods for the p >> n situation, where p is the number of parameters and n is the sample size. Those methods, however, have been less frequently used when p ≈ n, and in this situation, there is no guidance on choosing among regular logistic regression models, propensity score methods, and shrinkage approaches. To fill this gap, we conducted extensive simulations mimicking our motivating clinical data, estimating vaccine effectiveness for preventing influenza hospitalizations in the 2011-2012 influenza season. Ridge regression and penalized logistic regression models that penalize all but the coefficient of the exposure may be considered in these types of studies.
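
As an illustration of the two ideas in the abstract, the following is a minimal sketch, not the authors' implementation. It computes the number of cases or controls (whichever is fewer) per parameter, and fits a ridge-penalized logistic regression in which the intercept and the exposure coefficient are left unpenalized. The function names, the Newton-Raphson fitting routine, the penalty values, and the simulated data are all assumptions made for this example; none of them come from the paper.

```python
import numpy as np


def events_per_parameter(y, n_params):
    """Cases or controls, whichever is fewer, per model parameter."""
    y = np.asarray(y)
    n_cases = int(y.sum())
    return min(n_cases, y.size - n_cases) / n_params


def fit_ridge_logistic(exposure, covariates, y, lam=1.0, n_iter=100, tol=1e-8):
    """Ridge logistic regression fitted by Newton-Raphson (hypothetical helper).

    Only the covariate coefficients are penalized; the intercept and the
    exposure coefficient are not. Returns [intercept, exposure, covariates...].
    """
    exposure = np.asarray(exposure, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size), exposure, covariates])

    penalty = np.full(X.shape[1], float(lam))
    penalty[:2] = 0.0  # leave intercept and exposure unpenalized

    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)  # guard against overflow in exp
        prob = 1.0 / (1.0 + np.exp(-eta))
        # Gradient and negative Hessian of the penalized log-likelihood
        grad = X.T @ (y - prob) - penalty * beta
        hess = X.T @ (X * (prob * (1.0 - prob))[:, None]) + np.diag(penalty)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated data (assumed values, not the influenza data from the study)
rng = np.random.default_rng(0)
n, p = 400, 20
covariates = rng.standard_normal((n, p))
exposure = rng.integers(0, 2, size=n)
gamma = np.zeros(p)
gamma[:5] = 0.3
logit = -1.5 - 0.7 * exposure + covariates @ gamma
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

print("cases or controls per parameter:", events_per_parameter(y, p + 1))
for lam in (1.0, 10.0, 100.0):
    beta = fit_ridge_logistic(exposure, covariates, y, lam=lam)
    print(f"lambda={lam:g}  exposure log-odds ratio={beta[1]:.3f}  "
          f"covariate norm={np.linalg.norm(beta[2:]):.3f}")
```

In this sketch, increasing the penalty shrinks the covariate coefficients toward zero while the exposure coefficient remains free, so in the limit of a very large penalty the exposure estimate approaches the unadjusted log-odds ratio. In practice the penalty would be chosen by a data-driven procedure such as cross-validation, and the covariates would usually be standardized first.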

KEYWORDS:

lasso; logistic regression model; over-parameterization; propensity score; ridge

PMID: 27357163
PMCID: PMC5050102
DOI: 10.1002/sim.7021
[Indexed for MEDLINE]
Free PMC Article
