Send to

Choose Destination
Genet Epidemiol. 2009 Apr;33(3):256-65. doi: 10.1002/gepi.20377.

Proper analysis of secondary phenotype data in case-control association studies.

Author information

Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599-7420, USA.


Case-control association studies often collect extensive information on secondary phenotypes, which are quantitative or qualitative traits other than the case-control status. Exploring secondary phenotypes can yield valuable insights into biological pathways and identify genetic variants influencing phenotypes of direct interest. All publications on secondary phenotypes have used standard statistical methods, such as least-squares regression for quantitative traits. Because of unequal selection probabilities between cases and controls, the case-control sample is not a random sample from the general population. As a result, standard statistical analysis of secondary phenotype data can be extremely misleading. Although one may avoid the sampling bias by analyzing cases and controls separately or by including the case-control status as a covariate in the model, the associations between a secondary phenotype and a genetic variant in the case and control groups can be quite different from the association in the general population. In this article, we present novel statistical methods that properly reflect the case-control sampling in the analysis of secondary phenotype data. The new methods provide unbiased estimation of genetic effects and accurate control of false-positive rates while maximizing statistical power. We demonstrate the pitfalls of the standard methods and the advantages of the new methods both analytically and numerically. The relevant software is available at our website.

[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for PubMed Central
Loading ...
Support Center