Expression profiling by high throughput sequencing
A multitude of single-cell RNA sequencing methods have been developed in recent years, bringing dramatic advances in scale and power and enabling major discoveries and large-scale cell mapping efforts. However, these methods have not been systematically and comprehensively benchmarked. Here, we directly compare seven methods for single-cell and/or single-nucleus profiling from three types of samples – cell lines, peripheral blood mononuclear cells and brain tissue – generating 36 libraries in six separate experiments in a single center. To analyze these datasets, we developed and applied scumi, a flexible computational pipeline that can be used for any scRNA-seq method. We evaluated the methods both for basic performance and for their ability to recover known biological information in the samples. Our study will help guide experiments with the methods tested here and will serve as a benchmark for future studies and for computational algorithm development.
We systematically and directly compared seven single-cell RNA-sequencing methods: two low-throughput plate-based methods (Smart-seq2 and CEL-Seq2) and five high-throughput methods (10x Chromium (v2, v3), Drop-seq, Seq-Well, inDrops, and sci-RNA-seq), producing expression profiles from ~92,000 cells (nuclei) overall. We tested three sample types – a mixture of human and mouse cell lines, human peripheral blood mononuclear cells (PBMCs), and mouse cortex, each with two replicates – to generate a total of 36 different single-cell RNA-sequencing libraries. For mouse cortex, we tested four single-nucleus RNA-sequencing methods (Smart-seq2, 10x Chromium (v2), DroNc-seq, and sci-RNA-seq). We tested each sample type in two experiments (Mixture1 and Mixture2, PBMC1 and PBMC2, Cortex1 and Cortex2) run on different days to assess reproducibility. In each comparison experiment, we started from a single sample and began processing the aliquots for every method at the same time. The only exceptions were Seq-Well in PBMC1, for which we thawed an identical PBMC aliquot a second time to obtain a Seq-Well dataset with sufficient cells profiled, and 10x Chromium in PBMC1, for which we thawed an identical aliquot to directly compare version 2 (v2) with version 3 (v3). In each experiment, we aimed to collect data from ~350 cells for the low-throughput methods and ~3,000 cells for the high-throughput methods. In each experiment, we also used an aliquot of cells to generate a bulk RNA-sequencing library as a control. We sequenced all libraries together in an attempt to avoid batch effects due to varying sequence quality among Illumina flowcell lanes, with the following exceptions. We sequenced the inDrops libraries separately because their read structure is opposite to that of libraries generated with the other methods.
We performed additional sequencing for some libraries in an attempt to sequence similar numbers of reads per cell within the low-throughput and within the high-throughput methods. We aimed for 50,000 to 100,000 reads per cell for high-throughput methods and 750,000 to 1,000,000 reads per cell for low-throughput methods.
The scRNA-seq FASTQ file names encode the sample name, the Illumina flowcell lane, the library preparation date, and the library preparation method. Fields are separated by dots (.), for example, PBMC2.CC86JANXX.011818-DropSeq.unmapped.1.fastq.gz, where PBMC2 is the sample name (one of Mixture1, Mixture2, PBMC1, PBMC2, Cortex1, and Cortex2), CC86JANXX identifies the flowcell lane, 011818 is the library preparation date, and DropSeq is the RNA-seq method (Drop-seq in this case; one of SM2, CELseq, 10X, DropSeq, DroNcSeq, SeqWell, inDrops, and SciSeq). This FASTQ file (read 1) contains the cell barcode and UMI information. The corresponding cDNA reads are in PBMC2.CC86JANXX.011818-DropSeq.unmapped.2.fastq.gz (read 2). For more information about the read structures of the different protocols, please see Supplementary Table 11 of the manuscript. FASTQ files with the same sample name and library preparation method but different flowcell lanes, e.g., PBMC2.CCLBDANXX.68.011818-DropSeq.unmapped.1.fastq.gz, come from the same library sequenced at different times. These FASTQ files can therefore be merged for analysis. For 10x Chromium data, the reads from the same library are split into four files (10X_A, 10X_B, 10X_C, and 10X_D), and the reads from these files can also be merged.
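The naming convention above can be sketched in Python as a small parser that groups files belonging to the same library. This is only an illustration and is not part of scumi: the function names `parse_fastq_name` and `group_for_merging` are hypothetical, the regular expression is inferred from the two example file names given here, and the placement of the 10X_A to 10X_D labels in the method field is an assumption that should be checked against the actual files.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the two example names in the text:
#   <sample>.<flowcell field(s)>.<date>-<method>.unmapped.<read>.fastq.gz
NAME_RE = re.compile(
    r"^(?P<sample>[^.]+)\."
    r"(?P<flowcell>.+)\."               # may span several dot-separated fields
    r"(?P<date>\d{6})-(?P<method>[^.]+)"
    r"\.unmapped\.(?P<read>[12])\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Split a FASTQ file name into its fields (hypothetical helper)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected FASTQ file name: {name}")
    fields = match.groupdict()
    # Assumption: the four 10x sub-files carry 10X_A ... 10X_D in the
    # method field; collapse them to one library-level label.
    fields["library_method"] = re.sub(r"^(10X)_[A-D]$", r"\1", fields["method"])
    return fields


def group_for_merging(names):
    """Group files by (sample, method, read), ignoring the flowcell lane."""
    groups = defaultdict(list)
    for name in sorted(names):
        fields = parse_fastq_name(name)
        key = (fields["sample"], fields["library_method"], fields["read"])
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    example_names = [
        "PBMC2.CC86JANXX.011818-DropSeq.unmapped.1.fastq.gz",
        "PBMC2.CC86JANXX.011818-DropSeq.unmapped.2.fastq.gz",
        "PBMC2.CCLBDANXX.68.011818-DropSeq.unmapped.1.fastq.gz",
    ]
    for key, files in group_for_merging(example_names).items():
        print(key, files)
```

Each group lists the files that the description treats as one library and one read. Because a gzip stream may consist of several concatenated members, the files in a group can in principle be merged by concatenating them in the same order for read 1 and read 2, although any downstream tool should be checked for multi-member gzip support.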