Genomics. 2019 Feb 22. pii: S0888-7543(18)30580-9. doi: 10.1016/j.ygeno.2019.02.014. [Epub ahead of print]

Screen technical noise in single cell RNA sequencing data.

Author information

1. Dept. of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, United States.
2. Dept. of Pathology, Tulane Cancer Center, Tulane University Health Sciences Center, United States.
3. Dept. of Pathology, Tulane Cancer Center, Tulane University Health Sciences Center, United States. Electronic address: hnakhoul@tulane.edu.
4. Dept. of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, United States. Electronic address: yliu8@tulane.edu.

Abstract

We propose a data-cleaning pipeline for single-cell (SC) RNA-seq data that first screens genes (gene-wise screening) and then screens cell libraries (library-wise screening). Gene-wise screening is based on the expectation that, for a gene with low technical noise, the gene's count in a library tends to increase with library size; we tested this using negative binomial regression of gene count (dependent variable) on library size (independent variable). Library-wise screening is based on the expectation that, in libraries with low technical noise, across-library correlations for housekeeping (HK) genes are higher than those for non-housekeeping (NHK) genes. We removed libraries whose mean pairwise correlation for HK genes was not significantly higher than that for NHK genes. We successfully applied the pipeline to two large SC RNA-seq datasets and implemented it as an R package.
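
The library-wise step described above can be illustrated with a minimal Python sketch. It is not the published R package, and several choices are assumptions made only for illustration: the hypothetical function name `library_wise_screen`, the `log1p` transform, Pearson correlation, the paired one-sided z-test with a normal approximation, and the `alpha` threshold. The abstract does not specify these details. Because the pairwise correlations share libraries, they are not independent, so the p-value here is only approximate.

```python
import math
import numpy as np


def library_wise_screen(counts, is_hk, alpha=0.05):
    """Flag libraries whose HK-gene correlations exceed their NHK-gene correlations.

    counts : genes x libraries array of read counts
    is_hk  : boolean mask over genes, True for housekeeping (HK) genes
    Returns a boolean array over libraries: True means keep.
    """
    counts = np.asarray(counts, dtype=float)
    is_hk = np.asarray(is_hk, dtype=bool)
    logc = np.log1p(counts)  # assumption: log transform before correlating

    # Library-by-library Pearson correlations, computed separately
    # over the HK genes and over the NHK genes.
    r_hk = np.corrcoef(logc[is_hk].T)
    r_nhk = np.corrcoef(logc[~is_hk].T)

    n_lib = counts.shape[1]
    keep = np.zeros(n_lib, dtype=bool)
    for i in range(n_lib):
        others = np.arange(n_lib) != i
        # Paired differences: one per partner library.
        diff = r_hk[i, others] - r_nhk[i, others]
        diff = diff[np.isfinite(diff)]
        if diff.size < 2:
            continue  # not enough partners to test; drop the library
        sd = diff.std(ddof=1)
        if sd == 0:
            keep[i] = diff.mean() > 0
            continue
        # Assumption: one-sided z-test on the mean difference.
        z = diff.mean() / (sd / math.sqrt(diff.size))
        p_value = 0.5 * math.erfc(z / math.sqrt(2))
        keep[i] = p_value < alpha
    return keep
```

A call such as `keep = library_wise_screen(counts, is_hk)` followed by `counts[:, keep]` would retain only the libraries whose mean HK correlation is significantly higher than their mean NHK correlation under these assumptions.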

KEYWORDS:

Housekeeping genes; Next generation sequencing; QC; SCQC; Single cell RNA-seq
