Nat Commun. 2018 Jan 18;9(1):284. doi: 10.1038/s41467-017-02554-5.

A general and flexible method for signal extraction from single-cell RNA-seq data.

Author information

1. Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, 10065, USA.
2. Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, 94720, USA.
3. Laboratoire de Probabilités et Modèles Aléatoires, Université Paris Diderot, 75005, Paris, France.
4. Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, 94720, USA. sandrine@stat.berkeley.edu.
5. Department of Statistics, University of California, Berkeley, CA, 94720, USA. sandrine@stat.berkeley.edu.
6. CBIO-Centre for Computational Biology, MINES ParisTech, PSL Research University, 75006, Paris, France. jean-philippe.vert@ens.fr.
7. Institut Curie, 75005, Paris, France. jean-philippe.vert@ens.fr.
8. INSERM U900, 75005, Paris, France. jean-philippe.vert@ens.fr.
9. Department of Mathematics and Applications, Ecole Normale Supérieure, 75005, Paris, France. jean-philippe.vert@ens.fr.

Abstract

Single-cell RNA-sequencing (scRNA-seq) is a powerful high-throughput technique that enables researchers to measure genome-wide transcription levels at the resolution of single cells. Because of the low amount of RNA present in a single cell, some genes may fail to be detected even though they are expressed; these genes are usually referred to as dropouts. Here, we present a general and flexible zero-inflated negative binomial model (ZINB-WaVE), which leads to low-dimensional representations of the data that account for zero inflation (dropouts), over-dispersion, and the count nature of the data. We demonstrate, with simulated and real data, that the model and its associated estimation procedure are able to give a more stable and accurate low-dimensional representation of the data than principal component analysis (PCA) and zero-inflated factor analysis (ZIFA), without the need for a preliminary normalization step.
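The abstract names a zero-inflated negative binomial distribution without writing it out. As a rough illustration only, the sketch below evaluates the textbook form of that distribution for a single count: a point mass at zero (the dropout component) mixed with a negative binomial. The function names and the parameters `mu` (mean), `theta` (inverse dispersion) and `pi` (dropout probability) are assumptions chosen for this example; they are not taken from the abstract, and the sketch does not cover the low-dimensional factor structure or the estimation procedure of ZINB-WaVE.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial probability of count y (assumed parameterization:
    mean mu > 0 and inverse-dispersion theta > 0)."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(y + 1) - math.lgamma(theta)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi):
    """Zero-inflated negative binomial probability of count y.

    With probability pi the count is forced to zero (a dropout);
    otherwise it is drawn from the negative binomial above.
    """
    point_mass = pi if y == 0 else 0.0
    return point_mass + (1.0 - pi) * nb_pmf(y, mu, theta)


if __name__ == "__main__":
    # Zero inflation raises the probability of observing a zero count.
    print("NB   P(Y=0):", nb_pmf(0, mu=2.0, theta=1.0))
    print("ZINB P(Y=0):", zinb_pmf(0, mu=2.0, theta=1.0, pi=0.5))
```

Setting `pi` to zero recovers the plain negative binomial, which is one way to see that zero inflation adds probability mass only at zero.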

PMID: 29348443
PMCID: PMC5773593
DOI: 10.1038/s41467-017-02554-5
[Indexed for MEDLINE]
Free PMC Article
