Nat Biotechnol. 2014 Sep;32(9):896-902. doi: 10.1038/nbt.2931. Epub 2014 Aug 24.

Normalization of RNA-seq data using factor analysis of control genes or samples.

Author information

1. Department of Statistics, University of California, Berkeley, Berkeley, California, USA.
2. [1] Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA. [2] Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California, USA. [3] Functional Genomics Laboratory, University of California, Berkeley, Berkeley, California, USA.
3. [1] Department of Statistics, University of California, Berkeley, Berkeley, California, USA. [2] Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia. [3] Department of Mathematics and Statistics, The University of Melbourne, Victoria, Australia.
4. [1] Department of Statistics, University of California, Berkeley, Berkeley, California, USA. [2] Division of Biostatistics, University of California, Berkeley, Berkeley, California, USA.

Abstract

Normalization of RNA-sequencing (RNA-seq) data has proven essential to ensure accurate inference of expression levels. Here, we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more complex unwanted technical effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, called remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple laboratories, technicians, and/or sequencing platforms.
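
The idea of estimating nuisance factors from control genes and then removing them from all genes can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes log-transformed counts, a plain SVD on the control genes as the factor analysis step, and ordinary least squares to remove the estimated factors, and it ignores the covariates of interest that a full model would fit jointly. The function name `ruv_control_gene_sketch` and its parameters are hypothetical.

```python
import numpy as np


def ruv_control_gene_sketch(counts, control_idx, k=1):
    """Simplified control-gene normalization (illustrative assumption only).

    counts      : array of shape (n_samples, n_genes), non-negative read counts
    control_idx : indices of control genes (e.g. spike-ins), assumed to be
                  unaffected by the biology of interest
    k           : number of unwanted factors to estimate and remove
    Returns the normalized log-counts and the estimated factors W.
    """
    counts = np.asarray(counts, dtype=float)
    log_y = np.log(counts + 1.0)              # log scale, offset avoids log(0)
    gene_means = log_y.mean(axis=0)
    centered = log_y - gene_means             # remove per-gene mean

    # Factor analysis step (here: SVD) restricted to the control genes.
    u, d, _ = np.linalg.svd(centered[:, control_idx], full_matrices=False)
    w = u[:, :k] * d[:k]                      # (n_samples, k) unwanted factors

    # Regress every gene on W and subtract the fitted unwanted variation.
    alpha, *_ = np.linalg.lstsq(w, centered, rcond=None)
    normalized = centered - w @ alpha + gene_means
    return normalized, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 6 libraries, 50 genes, the first 10 treated as controls.
    batch = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
    loadings = rng.uniform(0.5, 1.5, size=50)
    log_expr = 5.0 + np.outer(batch, loadings) + rng.normal(0, 0.05, (6, 50))
    toy_counts = np.round(np.exp(log_expr))
    normalized, w = ruv_control_gene_sketch(toy_counts, np.arange(10), k=1)
    print("per-gene SD before:", np.log(toy_counts + 1).std(axis=0).mean())
    print("per-gene SD after: ", normalized.std(axis=0).mean())
```

Because this sketch removes the estimated factors without modelling the covariates of interest, it would also strip any biological signal correlated with those factors; the choice of control genes and of `k` is therefore critical in practice.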

PMID: 25150836
PMCID: PMC4404308
DOI: 10.1038/nbt.2931
[Indexed for MEDLINE]