Estimating effect sizes of differentially expressed genes for power and sample-size assessments in microarray experiments

Shigeyuki Matsui; Hisashi Noma

doi:10.1111/j.1541-0420.2011.01618.x

Biometrics. 2011 Dec;67(4):1225-35. doi: 10.1111/j.1541-0420.2011.01618.x. Epub 2011 May 31.

Authors

Shigeyuki Matsui¹, Hisashi Noma

Affiliation

¹ Department of Data Science, The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan. smatsui@ism.ac.jp

PMID: 21627629

Abstract

In microarray screening for differentially expressed genes using multiple testing, assessing power or sample size is particularly important to ensure that few relevant genes are prematurely removed from further consideration. In this assessment, adequate estimation of the effect sizes of differentially expressed genes is crucial because these estimates have a substantial impact on power and sample-size estimates. However, conventional methods that use the top genes with the largest observed effect sizes are subject to overestimation due to random variation. In this article, we propose a simple estimation method based on hierarchical mixture models with a nonparametric prior distribution, which accommodates random variation and a possibly large diversity of effect sizes across differential genes while separating them from the nuisance, nondifferential genes. Based on empirical Bayes estimates of effect sizes, the power and false discovery rate (FDR) can be estimated and monitored simultaneously in gene screening. We also propose a power index, called partial power, that concerns the selection of top genes with the largest effect sizes. This new power index could provide a practical compromise when high levels of the usual overall power are difficult to achieve, as is the case in many microarray experiments. Applications to two real datasets from cancer clinical studies are provided.
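
As a rough illustration of how FDR, overall power, and partial power relate under a mixture model of the kind described in the abstract, the following Python sketch may help. It is not the authors' implementation. It assumes that each gene's statistic Y is normally distributed around its effect size with a known standard error tau, that the effect-size prior for differential genes is already given as a discrete distribution (the abstract's method estimates this prior from data, which is not shown here), and that a gene is selected when |Y| exceeds a threshold c. All names (tail_prob, screening_summary, delta_min) and all numeric values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tail_prob(delta, c, tau):
    """P(|Y| > c) when Y ~ N(delta, tau^2) (assumed sampling model)."""
    upper = 1.0 - normal_cdf((c - delta) / tau)
    lower = normal_cdf((-c - delta) / tau)
    return upper + lower


def screening_summary(pi0, support, weights, c, tau, delta_min):
    """Return (FDR, overall power, partial power) for the rule |Y| > c.

    pi0       : assumed proportion of nondifferential genes
    support   : effect sizes carrying prior mass for differential genes
    weights   : prior probabilities on `support` (must sum to 1)
    delta_min : hypothetical cutoff defining the "top" effect sizes
    """
    # Overall power: selection probability averaged over the whole prior.
    power = sum(w * tail_prob(d, c, tau) for d, w in zip(support, weights))

    # Selection probability for a nondifferential gene (effect size 0).
    null_rate = tail_prob(0.0, c, tau)

    # FDR: expected share of selected genes that are nondifferential.
    fdr = pi0 * null_rate / (pi0 * null_rate + (1.0 - pi0) * power)

    # Partial power: same average, restricted to |delta| >= delta_min.
    top = [(d, w) for d, w in zip(support, weights) if abs(d) >= delta_min]
    top_mass = sum(w for _, w in top)
    partial = sum(w * tail_prob(d, c, tau) for d, w in top) / top_mass

    return fdr, power, partial


# Hypothetical inputs, chosen only for illustration.
support = [-1.0, -0.5, 0.5, 1.0, 1.5]
weights = [0.1, 0.3, 0.3, 0.2, 0.1]
tau = math.sqrt(0.1)

fdr, power, partial = screening_summary(
    pi0=0.9, support=support, weights=weights, c=1.0, tau=tau, delta_min=1.0
)
print(f"FDR={fdr:.3f}  overall power={power:.3f}  partial power={partial:.3f}")
```

With these made-up inputs, the partial power for genes with |delta| >= 1.0 exceeds the overall power, because the small effect sizes that are hard to detect no longer enter the average. This is the sense in which a partial power target can be attainable when a high overall power is not.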

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Bayes Theorem*
DNA / genetics*
Data Interpretation, Statistical*
Gene Expression Profiling / methods*
Oligonucleotide Array Sequence Analysis / methods*
Sample Size

Substances

DNA