Biostatistics. 2008 Oct;9(4):601-12. doi: 10.1093/biostatistics/kxm053. Epub 2008 Feb 27.

Efficient p-value estimation in massively parallel testing problems.

Author information

  • Department of Public Health Sciences, University of Toronto, Toronto, ON, Canada. r.kustra@utoronto.ca

Abstract

We present a new method to efficiently estimate very large numbers of p-values using empirically constructed null distributions of a test statistic. The need to evaluate a very large number of p-values is increasingly common with modern genomic data, and when interaction effects are of interest, the number of tests can easily run into the billions. When the asymptotic distribution is not easily available, permutations are typically used to obtain p-values, but these can be computationally infeasible in large problems. Our method constructs a prediction model to obtain a first approximation to the p-values and uses Bayesian methods to choose a fraction of these to be refined by permutations. We apply and evaluate our method on the study of association between 2-way interactions of genetic markers and colorectal cancer, using the data from the first phase of a large, genome-wide case-control study. The results show enormous computational savings compared with evaluating a full set of permutations, with little decrease in accuracy.
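
The two-stage idea in the abstract (a cheap first approximation, then permutation refinement for only the tests that look promising) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract's first approximation comes from a prediction model, whereas this sketch substitutes a small pilot permutation run, and the Bayesian step is assumed here to be a uniform prior on the p-value with a Beta posterior. The function names, the mean-difference statistic, and the parameters `pilot`, `full`, `alpha`, and `keep` are all hypothetical choices made for illustration.

```python
import math
import random


def mean_diff(labels, values):
    """Test statistic: mean of cases (label 1) minus mean of controls (label 0)."""
    cases = [v for g, v in zip(labels, values) if g == 1]
    controls = [v for g, v in zip(labels, values) if g == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def count_exceedances(labels, values, n_perm, rng):
    """Count label permutations whose |statistic| is at least the observed one."""
    observed = abs(mean_diff(labels, values))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_diff(shuffled, values)) >= observed:
            hits += 1
    return hits


def prob_p_below(k, n, alpha):
    """Posterior P(p < alpha) after k exceedances in n permutations.

    Assumes a uniform prior, so the posterior is Beta(k + 1, n - k + 1).
    For integer parameters the Beta CDF equals a binomial upper tail.
    """
    a, m = k + 1, n + 1
    return sum(
        math.comb(m, j) * alpha**j * (1 - alpha) ** (m - j)
        for j in range(a, m + 1)
    )


def two_stage_pvalue(labels, values, pilot=100, full=2000,
                     alpha=0.05, keep=0.1, seed=0):
    """Return (estimated p-value, permutations used).

    Stage 1 (assumed stand-in for the prediction model): a pilot permutation run.
    Stage 2: refine with more permutations only if the posterior probability
    that p < alpha is at least `keep`.
    """
    rng = random.Random(seed)
    hits = count_exceedances(labels, values, pilot, rng)
    used = pilot
    if prob_p_below(hits, pilot, alpha) >= keep:
        hits += count_exceedances(labels, values, full - pilot, rng)
        used = full
    # Add-one estimator keeps the permutation p-value strictly positive.
    return (hits + 1) / (used + 1), used


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 10
    values = [float(10 + i) for i in range(10)] + [float(i) for i in range(10)]
    print(two_stage_pvalue(labels, values))
```

In this sketch, a test whose pilot run already shows many exceedances stops after `pilot` permutations, which is where the computational saving would come from when most tests are clearly null.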

KEYWORDS:

Bayesian testing; Genome-wide association studies; Interaction effects; Permutation distribution; Random Forest; p-value distribution

[PubMed - indexed for MEDLINE]
Free PMC Article