Nat Genet. 2015 Mar;47(3):284-90. doi: 10.1038/ng.3190. Epub 2015 Feb 2.

Efficient Bayesian mixed-model analysis increases association power in large cohorts.

Author information

1. [1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.
2. [1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. [3] Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA.
3. [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA.
4. Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
5. [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Department of Endocrinology, Children's Hospital Boston, Boston, Massachusetts, USA.
6. Division of Preventive Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA.
7. [1] Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. [2] Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA.
8. Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.
9. [1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [3] Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

Abstract

Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN^2) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
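
The gap between O(MN^2) and O(MN) can be illustrated with a small sketch. Forming the N x N relatedness matrix X X^T / M explicitly costs O(MN^2), whereas an iterative solver only needs products of that matrix with a vector, each of which can be computed as X (X^T v) in O(MN). The code below is a minimal sketch under stated assumptions, not the BOLT-LMM implementation: it assumes conjugate gradient as the iterative solver, uses simulated Gaussian "genotypes", and the function names (grm_matvec, conjugate_gradient) and variance parameters (sigma_g2, sigma_e2) are hypothetical names chosen for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def grm_matvec(X, v, sigma_g2, sigma_e2):
    """Return (sigma_g2 * X X^T / M + sigma_e2 * I) v without forming X X^T.

    X is an N x M list of lists (samples x SNPs). Both passes below touch
    each entry of X once, so the total cost is O(MN).
    """
    n, m = len(X), len(X[0])
    # First pass: t = X^T v, a length-M vector.
    t = [0.0] * m
    for i in range(n):
        row, vi = X[i], v[i]
        for j in range(m):
            t[j] += row[j] * vi
    # Second pass: X t, a length-N vector, scaled and shifted.
    return [sigma_g2 * dot(X[i], t) / m + sigma_e2 * v[i] for i in range(n)]


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(p)   # the only O(MN) step per iteration
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Toy data (hypothetical): N = 20 samples, M = 50 markers.
random.seed(0)
X_demo = [[random.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
y_demo = [random.gauss(0.0, 1.0) for _ in range(20)]


def A_demo(v):
    return grm_matvec(X_demo, v, sigma_g2=0.5, sigma_e2=0.5)


solution = conjugate_gradient(A_demo, y_demo)
residual = [a - b for a, b in zip(A_demo(solution), y_demo)]
print("residual norm:", dot(residual, residual) ** 0.5)
```

Because the solver only ever calls the matrix-vector routine, the N x N matrix is never stored, and the overall cost is the number of iterations times O(MN).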

PMID: 25642633
PMCID: PMC4342297
DOI: 10.1038/ng.3190
[Indexed for MEDLINE]
Free PMC Article
