Stat Interface. 2015;8(4):405-418. doi: 10.4310/SII.2015.v8.n4.a1.

Single-gene negative binomial regression models for RNA-Seq data with higher-order asymptotic inference.

Author information

Department of Statistics, Oregon State University, Corvallis, OR 97331, USA.

Abstract

We consider negative binomial (NB) regression models for RNA-Seq read counts and investigate an approach where such NB regression models are fitted to individual genes separately and, in particular, the NB dispersion parameter is estimated from each gene separately without assuming commonalities between genes. This single-gene approach contrasts with the more widely used dispersion-modeling approach, where the NB dispersion is modeled as a simple function of the mean or other measures of read abundance and then estimated from a large number of genes combined. We show that, through the use of higher-order asymptotic techniques, inferences with correct type I errors can be made about the regression coefficients in a single-gene NB regression model even when the dispersion is unknown and the sample size is small. The motivations for studying single-gene models include: 1) they provide a basis of reference for understanding and quantifying the power-robustness trade-offs of the dispersion-modeling approach; 2) they may also be useful in practice if moderate sample sizes become available and diagnostic tools indicate potential problems with simple models of dispersion.
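To make the single-gene idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the common NB2 parameterization, in which a count with mean mu and dispersion phi has variance mu + phi * mu^2, and a two-group design with no library-size offsets. The dispersion is estimated from the one gene alone by profile likelihood, and the group effect is tested with an ordinary first-order likelihood ratio test. The higher-order asymptotic adjustment studied in the paper is not implemented here, and all function names are hypothetical.

```python
import math


def nb_loglik(counts, mu, phi):
    """NB2 log-likelihood for counts with mean mu and dispersion phi.

    Assumed parameterization: Var(Y) = mu + phi * mu**2. Requires mu > 0.
    """
    size = 1.0 / phi
    total = 0.0
    for y in counts:
        total += (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
                  + size * math.log(size / (size + mu))
                  + y * math.log(mu / (size + mu)))
    return total


def profile_loglik(groups, log_phi):
    """Log-likelihood at a given dispersion, with each group mean at its MLE.

    Without offsets, the MLE of an NB mean at fixed dispersion is the sample mean.
    """
    phi = math.exp(log_phi)
    return sum(nb_loglik(g, sum(g) / len(g), phi) for g in groups)


def maximize(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of f on [lo, hi].

    Assumes f is unimodal (or monotone) on the interval.
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    x = (a + b) / 2.0
    return x, f(x)


def single_gene_lr_test(control, treated, log_phi_range=(-10.0, 5.0)):
    """First-order likelihood ratio test of equal group means for one gene.

    The dispersion is re-estimated under each model from this gene only.
    Returns (LR statistic, chi-square(1) p-value).
    """
    lo, hi = log_phi_range
    _, ll_full = maximize(lambda t: profile_loglik([control, treated], t), lo, hi)
    _, ll_null = maximize(lambda t: profile_loglik([control + treated], t), lo, hi)
    lr = max(0.0, 2.0 * (ll_full - ll_null))
    # Survival function of chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


if __name__ == "__main__":
    # Made-up read counts for one gene, four replicates per group.
    lr, p = single_gene_lr_test([12, 15, 9, 14], [30, 41, 25, 38])
    print(f"LR = {lr:.3f}, p = {p:.3g}")
```

With only a few replicates per group, the chi-square approximation used in this sketch can be inaccurate when the dispersion is estimated from a single gene, which is the gap that higher-order asymptotic inference is meant to address.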

KEYWORDS:

92D20; Extra-Poisson variation; Higher-order asymptotics; Negative binomial; Overdispersion; Power-robustness; Primary 62P10; RNA-Seq; Regression
