Bioinformatics. 2020 Mar 24. pii: btaa201. doi: 10.1093/bioinformatics/btaa201. [Epub ahead of print]

Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF.

Author information

1. Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH.
2. Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH.
3. Department of Biomedical Informatics, University of Cincinnati, Cincinnati, OH.

Abstract

MOTIVATION:

The rapid proliferation of single-cell RNA-Sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most require significant user tuning, rely heavily on dimension reduction techniques, and do not scale to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene Selection (ICGS), which applies intra-gene correlation and hybrid clustering to resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface.

RESULTS:

We describe a new iteration of ICGS (ICGS2) that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse-NMF, cluster "fitness", SVM) to resolve rare and common cell states, while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively down-samples ultra-large scRNA-Seq datasets without losing extremely rare cell types or cell types that are transcriptionally similar yet distinct, while still recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets.
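
To make the PageRank down-sampling idea concrete, the following is a minimal sketch in plain Python. It is not the ICGS2 implementation: the abstract does not say how the cell graph is built or how cells are selected from the scores, so the k-nearest-neighbour graph, the Euclidean distance, the damping factor and the "keep the top-ranked cells" rule are all assumptions made for illustration, as are the function names `knn_graph`, `pagerank` and `downsample`.

```python
import math
import random


def knn_graph(points, k):
    """Directed k-nearest-neighbour graph (assumed construction, not from the paper).

    Each cell gets an edge to each of its k closest cells by Euclidean distance.
    """
    graph = {}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph


def pagerank(graph, damping=0.85, max_iter=200, tol=1e-12):
    """Standard PageRank by power iteration over an adjacency-list graph."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(max_iter):
        # Every node starts each round with the teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in graph}
        dangling = 0.0
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for nb in neighbours:
                    new_rank[nb] += share
            else:
                # A node without out-edges spreads its rank uniformly.
                dangling += damping * rank[node] / n
        if dangling:
            for node in new_rank:
                new_rank[node] += dangling
        delta = sum(abs(new_rank[v] - rank[v]) for v in graph)
        rank = new_rank
        if delta < tol:
            break
    return rank


def downsample(points, n_keep, k=5):
    """Keep the n_keep cells with the highest PageRank score (assumed selection rule)."""
    rank = pagerank(knn_graph(points, k))
    keep = sorted(rank, key=rank.get, reverse=True)[:n_keep]
    return sorted(keep)


# Toy data: two synthetic "populations" in a 2-D expression space (hypothetical).
random.seed(0)
cells = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(60)]
cells += [(random.gauss(6, 1), random.gauss(6, 1)) for _ in range(15)]

kept = downsample(cells, n_keep=20)
print(len(kept), "of", len(cells), "cells retained")
```

Whether a selection rule like this one preserves rare populations depends on how the graph is built and how cells are picked from the scores; the abstract reports that ICGS2 does so, and this sketch makes no such claim.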

AVAILABILITY AND IMPLEMENTATION:

ICGS2 is implemented in Python. The source code and documentation are available at: http://altanalyze.org.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
