PeerJ. 2017 Jan 19;5:e2888. doi: 10.7717/peerj.2888. eCollection 2017.

Detecting heterogeneity in single-cell RNA-Seq data by non-negative matrix factorization.

Author information

Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States; Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, United States.
Department of Genetics, Yale University, New Haven, CT, United States.
Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States.


Single-cell RNA-Sequencing (scRNA-Seq) is a fast-evolving technology that enables the understanding of biological processes at an unprecedentedly high resolution. However, well-suited bioinformatics tools to analyze the data generated by this new technology are still lacking. Here we investigate the performance of the non-negative matrix factorization (NMF) method in analyzing a wide variety of scRNA-Seq datasets, ranging from mouse hematopoietic stem cells to human glioblastoma data. In comparison to other unsupervised clustering methods, including K-means and hierarchical clustering, NMF has higher accuracy in separating similar groups in various datasets. We ranked genes by their importance scores (D-scores) in separating these groups, and discovered that NMF uniquely identifies genes expressed at intermediate levels as top-ranked genes. Finally, we show that in conjunction with the modularity detection method FEM, NMF reveals meaningful protein-protein interaction modules. In summary, we propose that NMF is a desirable method to analyze heterogeneous single-cell RNA-Seq data. The NMF-based subpopulation detection package is available at:
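
For readers unfamiliar with how NMF can act as a clustering method, the following is a minimal sketch and not the authors' package. It assumes a genes-by-cells non-negative expression matrix, uses the standard Lee-Seung multiplicative updates for the squared-error objective, and assigns each cell to the factor with the largest coefficient. The function names `nmf` and `cluster_cells`, the toy matrix, and all parameter values are illustrative assumptions; the D-score gene ranking and the FEM step described above are not reproduced here.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative V (genes x cells) as W @ H.

    W is genes x k and H is k x cells. Uses Lee-Seung multiplicative
    updates, which keep both factors non-negative at every step.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_cells)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Remove the scaling ambiguity: make each column of W sum to 1
    # and move that scale into the matching row of H.
    scale = W.sum(axis=0) + eps
    W = W / scale
    H = H * scale[:, None]
    return W, H


def cluster_cells(H):
    """Assign each cell (column of H) to its dominant factor."""
    return np.argmax(H, axis=0)


# Toy data (hypothetical): 6 genes x 8 cells forming two groups.
# Genes 0-2 are high in cells 0-3; genes 3-5 are high in cells 4-7.
rng = np.random.default_rng(1)
V = rng.random((6, 8)) * 0.1
V[:3, :4] += 5.0
V[3:, 4:] += 5.0

W, H = nmf(V, k=2)
labels = cluster_cells(H)
print(labels)
```

The columns of `W` can be read as gene signatures for each factor, which is what makes NMF output easier to interpret than K-means centroids on the same data; results depend on the random initialization, so in practice several restarts are usually compared.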


Clustering; Feature gene; Heterogeneity; Modularity; Non-negative matrix factorization; RNA-Seq; Single cell; Single cell sequencing; Single-cell; Subpopulation
