Bioinformatics. 2020 Mar 24. pii: btaa201. doi: 10.1093/bioinformatics/btaa201. [Epub ahead of print]

Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF.

Author information

Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH.
Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH.
Department of Biomedical Informatics, University of Cincinnati, Cincinnati, OH.



The rapid proliferation of single-cell RNA-Sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the algorithms for detecting heterogeneity have grown more complex, most require significant user tuning, rely heavily on dimension reduction techniques, and do not scale to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface.


We describe a new iteration of ICGS (ICGS2) that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse-NMF, cluster "fitness", SVM) to resolve rare and common cell-states while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively down-samples ultra-large scRNA-Seq datasets without losing extremely rare cell-types or transcriptionally similar yet distinct cell-types, while still recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise for reproducibly resolving hidden cell populations in complex datasets.
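As a rough illustration of how PageRank can down-sample a large set of cells, the following minimal sketch builds a k-nearest-neighbour graph over cell profiles, scores each cell with PageRank, and keeps the highest-scoring cells. It is not the ICGS2 implementation: the graph construction (Euclidean k-nearest neighbours), the damping factor, the top-score selection rule, and the function names knn_graph, pagerank and downsample are all assumptions made for illustration, since the abstract states only that PageRank is used for down-sampling.

```python
import math


def knn_graph(points, k):
    """Adjacency lists linking each cell to its k nearest cells (assumed graph)."""
    if len(points) <= k:
        raise ValueError("need more cells than neighbours")
    graph = []
    for i, p in enumerate(points):
        # Sort all other cells by Euclidean distance; ties break on index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in dists[:k]])
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a graph where every node has out-links."""
    n = len(graph)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Every node receives the uniform "teleport" share first.
        new_rank = [(1.0 - damping) / n] * n
        for i, neighbours in enumerate(graph):
            # Each node splits its damped rank evenly among its neighbours.
            share = damping * rank[i] / len(neighbours)
            for j in neighbours:
                new_rank[j] += share
        rank = new_rank
    return rank


def downsample(points, keep, k=5):
    """Return the indices of the `keep` cells with the highest PageRank score."""
    scores = pagerank(knn_graph(points, k))
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])
```

Calling downsample(cells, keep=2500), where cells is a list of per-cell expression vectors, would return the indices of the retained cells. The all-pairs distance computation here is quadratic in the number of cells, so a dataset of hundreds of thousands of cells would need an approximate nearest-neighbour search instead.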


ICGS2 is implemented in Python. The source code and documentation are available at:


Supplementary data are available at Bioinformatics online.
