Front Genet. 2019 Aug 9;10:734. doi: 10.3389/fgene.2019.00734. eCollection 2019.

A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining.

Author information

University of Naples Federico II, Department of Chemical Materials and Industrial Engineering, Naples, Italy.
Telethon Institute of Genetics and Medicine, Naples, Italy.


Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that were developed for traditional bulk RNA-sequencing data, thus not accounting for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency-inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency-inverse document frequency (TF-IDF), which has been extensively used in the field of text mining where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types.
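To illustrate the idea behind the transformation, the following is a minimal sketch of a TF-IDF-style weighting applied to a cells-by-genes count matrix. It is not the gf-icf package's implementation: the function name `gf_icf`, the smoothed form of the inverse cell frequency, and the final L2 normalization of each cell are assumptions chosen to mirror common text-mining practice.

```python
import numpy as np

def gf_icf(counts):
    """Hypothetical TF-IDF-style weighting for a cells x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # Gene frequency: each count divided by the total counts of its cell.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    gf = counts / totals

    # Inverse cell frequency (smoothed, an assumption): genes detected in
    # fewer cells receive a larger weight.
    cells_expressing = (counts > 0).sum(axis=0)
    icf = np.log((n_cells + 1) / (cells_expressing + 1)) + 1.0

    # Weight and rescale each cell to unit Euclidean length (an assumption).
    weighted = gf * icf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return weighted / norms

# Toy example: 3 cells (rows) x 3 genes (columns).
counts = np.array([[10, 0, 5],
                   [8, 0, 0],
                   [9, 3, 0]])
scores = gf_icf(counts)
```

In this toy matrix the first gene is detected in every cell, so its weight stays at the baseline, while genes detected in fewer cells are up-weighted relative to their raw counts. Zero counts remain zero, which preserves the sparsity of the data.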


Keywords: cell type; enrichment analysis; feature extraction; single-cell transcriptomics; term frequency–inverse document frequency
