Nat Commun. 2015 Jun 25;6:7432. doi: 10.1038/ncomms8432.

A random forest approach to capture genetic effects in the presence of population structure.

Author information

Cellular Networks and Systems Biology, University of Cologne, CECAD, Joseph-Stelzmann-Strasse 26, Cologne 50931, Germany.
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10SD, UK.


The accurate mapping of causal variants in genome-wide association studies requires consideration of both confounding factors (for example, population structure) and nonlinear interactions between individual genetic variants. Here, we propose a method termed 'mixed random forest' that simultaneously accounts for population structure and captures nonlinear genetic effects. We test the model in simulation experiments and show that the mixed random forest approach improves detection power compared with established approaches. In an application to data from an outbred mouse population, we find that mixed random forest identifies associations that are more consistent with prior knowledge than those found by competing methods. Further, our approach predicts phenotypes from genotypes with greater accuracy than any of the other methods that we tested. Our results show that approaches that simultaneously account for both confounding due to population structure and epistatic interactions are important to fully explain the heritable component of complex quantitative traits.

[Indexed for MEDLINE]
