Am J Hum Genet. 2014 May 1;94(5):662-76. doi: 10.1016/j.ajhg.2014.03.016. Epub 2014 Apr 17.

Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies.

Author information

1. Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA. Electronic address: haschard@hsph.harvard.edu.
2. Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA; Medical and Population Genetics Program, Broad Institute, Cambridge, MA 02142, USA.
3. Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1166, 75005 Paris, France; INSERM, UMR_S 1166, Genomics and Physiopathology of Cardiovascular Diseases, 75013 Paris, France; Institute for Cardiometabolism and Nutrition (ICAN), 75013 Paris, France.
4. Aix-Marseille Université, INSERM UMR_S 1062, 13385 Marseille, France.
5. Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.

Abstract

Many human traits are highly correlated. This correlation can be leveraged to improve the power of genetic association tests to identify markers associated with one or more of the traits. Principal component analysis (PCA) is a useful tool that has been widely used for the multivariate analysis of correlated variables. PCA is usually applied as a dimension reduction method: the few top principal components (PCs) explaining most of the total trait variance are tested for association with a predictor of interest, and the remaining components are not analyzed. In this study we review the theoretical basis of PCA and describe the behavior of PCA when testing for association between a SNP and correlated traits. We then use simulation to compare the power of various PCA-based strategies when analyzing up to 100 correlated traits. We show that, contrary to widespread practice, testing only the top PCs often has low power, whereas combining signal across all PCs can have greater power. This power gain is primarily due to increased power to detect genetic variants with opposite effects on positively correlated traits and variants that are exclusively associated with a single trait. Relative to other methods, the combined-PC approach has close to optimal power in all scenarios considered while offering more flexibility and more robustness to potential confounders. Finally, we apply the proposed PCA strategy to the genome-wide association study of five correlated coagulation traits, in which we identify two candidate SNPs that were not found by the standard approach.
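
As an illustration of the combined-PC idea described in the abstract, the following Python sketch computes the principal components of a set of standardized traits, tests each PC against a single SNP, and sums the per-PC statistics. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`pc_association_stats`, `combined_pc_stat`), the use of n·r² as an approximate 1-df chi-square statistic, and the plain unweighted sum across PCs are all choices made here for illustration, since the abstract does not specify how the signal is combined.

```python
import numpy as np


def pc_association_stats(traits, snp):
    """Per-PC association statistics between one SNP and the trait PCs.

    traits: (n_samples, n_traits) array of quantitative traits.
    snp:    (n_samples,) array of genotype dosages (e.g. 0, 1, 2).
    Returns the eigenvalues of the trait correlation matrix (descending)
    and one n * r^2 statistic per PC (hypothetical choice of test).
    Assumes no trait or PC has zero variance.
    """
    traits = np.asarray(traits, dtype=float)
    snp = np.asarray(snp, dtype=float)
    n = traits.shape[0]

    # Standardize traits so that PCA is done on the correlation matrix.
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    corr = (z.T @ z) / (n - 1)

    # eigh returns eigenvalues in ascending order; reverse to get top PCs first.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # PC scores: mutually uncorrelated linear combinations of the traits.
    pcs = z @ eigvecs
    pcs = pcs / pcs.std(axis=0, ddof=1)

    # Sample correlation between the SNP and each PC.
    g = (snp - snp.mean()) / snp.std(ddof=1)
    r = (pcs.T @ g) / (n - 1)

    # n * r^2 is approximately chi-square with 1 df under no association.
    stats = n * r ** 2
    return eigvals, stats


def combined_pc_stat(stats):
    """Sum the per-PC statistics; returns (statistic, degrees of freedom)."""
    stats = np.asarray(stats, dtype=float)
    return float(stats.sum()), int(stats.size)
```

Because the PC scores are uncorrelated, the summed statistic is approximately chi-square distributed under the null with as many degrees of freedom as there are PCs. A top-PC-only strategy would instead inspect just the first few entries of `stats`, which is the practice the abstract argues can lose power.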

PMID: 24746957
PMCID: PMC4067564
DOI: 10.1016/j.ajhg.2014.03.016
[Indexed for MEDLINE]
Free PMC Article
