J R Stat Soc Series B Stat Methodol. 2016 Jan;78(1):127-151. Epub 2015 Feb 15.

Semiparametric Estimation in the Secondary Analysis of Case-Control Studies.

Author information

Department of Statistics, University of South Carolina, Columbia, SC 29208; Department of Statistics, Texas A&M University, College Station, TX 77843.

Abstract

We study the regression relationship among covariates in case-control data, an area known as the secondary analysis of case-control studies. Only the form of the regression mean is specified, so we allow an arbitrary regression error distribution, which can depend on the covariates and thus can be heteroscedastic. Under mild regularity conditions we establish the theoretical identifiability of such models. Previous work in this context has (a) specified a fully parametric distribution for the regression errors, (b) specified a homoscedastic distribution for the regression errors, (c) specified the rate of disease in the population (which we refer to as the true population), or (d) made a rare disease approximation. We construct a class of semiparametric estimation procedures that rely on none of these. The estimators differ from the usual semiparametric ones in that they draw conclusions about the true population while technically operating in a hypothetical superpopulation. We also construct estimators with a unique feature: they are robust against misspecification of the variance structure of the regression error distribution, while all other nonparametric effects are estimated despite the biased samples. We establish the asymptotic properties of the estimators and illustrate their finite sample performance through simulation studies, as well as through an empirical example on the relation between red meat consumption and heterocyclic amines (HCAs). Our analysis verified the positive relationship between red meat consumption and two forms of HCA, indicating that increased red meat consumption leads to increased levels of MeIQA and PhiP, both of which are risk factors for colorectal cancer. Computer software and data illustrating the methodology are available at http://wileyonlinelibrary.com/journal/rss-datasets.

KEYWORDS:

Biased samples; Case-control study; Heteroscedastic regression; Secondary analysis; Semiparametric estimation
