Stat Biosci. 2013 Nov 1;5(2). doi: 10.1007/s12561-013-9080-2.

Using the Whole Cohort in the Analysis of Case-Control Data: Application to the Women's Health Initiative.

Author information

Department of Biostatistics, University of Washington, Seattle, WA, USA, Tel.: +1-206-543-1044.
Department of Statistics, University of Auckland, Auckland, NZ.
WHI Clinical Coordinating Center, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD, USA.


Standard analyses of data from case-control studies that are nested in a large cohort ignore information available for cohort members not sampled for the sub-study. This paper reviews several methods designed to increase estimation efficiency by using more of the data, treating the case-control sample as a two- or three-phase stratified sample. When applied to a study of coronary heart disease among women in the hormone trials of the Women's Health Initiative, modest gains in the precision of regression coefficients were observed, and these gains increased with the amount of cohort information used in the analysis. The gains were particularly evident for pseudo- or maximum likelihood estimates, whose validity depends on the assumed model being correct. Larger standard errors were obtained for coefficients estimated by inverse probability weighted methods, which are more robust to model misspecification. Such misspecification may have been responsible for an important difference in one key regression coefficient between the weighted and the more efficient methods.
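As a rough illustration of the inverse probability weighted approach mentioned above, the following sketch simulates a cohort, samples all cases and a fraction of controls, and fits a logistic regression with weights equal to the inverse of each subject's sampling probability. Everything here is an assumption made for illustration (the simulated data, the 10% control sampling fraction, the coefficient values, and the helper name `fit_weighted_logistic`); it is not the authors' analysis and uses no Women's Health Initiative data.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=25):
    """Hypothetical helper: Newton-Raphson for weighted logistic regression.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Phase one: a simulated cohort (all values are illustrative assumptions).
rng = np.random.default_rng(0)
N = 20000
x = rng.normal(size=N)
X_full = np.column_stack([np.ones(N), x])
true_beta = np.array([-3.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X_full @ true_beta))))

# Phase two: keep every case, and each control with probability 0.1.
pi = np.where(y == 1, 1.0, 0.1)
sampled = rng.random(N) < pi
X_s, y_s = X_full[sampled], y[sampled]

# Inverse probability weights restore the cohort's case/control balance.
beta_ipw = fit_weighted_logistic(X_s, y_s, 1.0 / pi[sampled])
# The unweighted fit keeps the slope but shifts the intercept
# by roughly log(1 / 0.1) because controls are under-represented.
beta_unweighted = fit_weighted_logistic(X_s, y_s, np.ones(sampled.sum()))
```

In this sketch the weighted fit targets the cohort-level intercept and slope, while the unweighted fit recovers only the slope. The more efficient pseudo- and maximum likelihood methods discussed in the abstract are not shown here.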


Keywords: Logistic regression; calibration of sampling weights; maximum likelihood; model misspecification and survey sampling; pseudolikelihood
