Stat Med. 1997 Jun 30;16(12):1377-89.

Re-using data from case-control studies.

Author information

Department of Statistics, University of Auckland, New Zealand.


Despite its ability to maximize statistical power while keeping data collection costs to a minimum, case-control sampling provides a non-representative sample of the population. When fitting a logistic regression model to data obtained in such a study, using the variable stratifying the population as the response, it is well known that the estimate of the constant term will be biased, but those of the coefficients of the covariates will not. However, subsequent to the case-control study, it is often desired to conduct a secondary analysis, using a variable that was previously a covariate in the main study as the response. If this new response is associated with the original variable used to stratify the population into cases and controls, a conventional logistic regression analysis will usually result in biased estimates of all the regression coefficients, not just the constant. This situation has recently been studied by Nagelkerke et al., who describe some situations where no bias occurs. In this paper we discuss how to calculate maximum likelihood estimates of all the regression coefficients, in the situation where the sampling rates for cases and controls are known. An example using data from the New Zealand Cot Death Study is presented.
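
The bias in a secondary analysis, and the way known sampling rates can be used to counter it, can be illustrated with a small simulation. The sketch below is not the maximum likelihood method of the paper: it swaps in a simpler inverse-probability-weighted logistic regression, in which each sampled subject is weighted by the reciprocal of its known sampling rate. The population model, the coefficient values, the sampling rates, and the names `fit_logistic`, `x`, `y2` and `d` are all assumptions made for illustration, not quantities from the study.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, w=None, n_iter=25):
    """Newton-Raphson fit of a logistic regression with optional weights."""
    if w is None:
        w = np.ones(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (w * (y - p))                      # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(0)
N = 200_000

# Hypothetical population: binary covariate x, secondary response y2,
# and case status d that depends on both (including an interaction).
x = rng.binomial(1, 0.5, N).astype(float)
y2 = rng.binomial(1, expit(-1.0 + 0.5 * x)).astype(float)  # true slope 0.5
d = rng.binomial(1, expit(-4.0 + 0.5 * x + 1.0 * y2 + 1.0 * x * y2))

# Case-control sampling with known (assumed) rates: all cases, 6% of controls.
rate_case, rate_control = 1.0, 0.06
rate = np.where(d == 1, rate_case, rate_control)
sampled = rng.random(N) < rate

X = np.column_stack([np.ones(sampled.sum()), x[sampled]])

# Conventional analysis of y2 on x that ignores the sampling design.
naive = fit_logistic(X, y2[sampled])

# Weighted analysis: each subject counts for 1 / (its sampling rate).
weighted = fit_logistic(X, y2[sampled], w=1.0 / rate[sampled])

print("true     (intercept, slope): (-1.0, 0.5)")
print("naive    (intercept, slope):", naive)
print("weighted (intercept, slope):", weighted)
```

In this setup the naive fit overstates both the intercept and the slope, because cases are heavily over-represented and case status is associated with `y2`, while the weighted fit recovers values close to the population coefficients. Weighting is generally less efficient than full maximum likelihood, which is one reason a likelihood-based approach is of interest when the sampling rates are known.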

[Indexed for MEDLINE]
