Series GSE77288
Status Public on Jul 08, 2016
Title Batch effects and the effective design of single-cell gene expression studies
Organism Homo sapiens
Experiment type Expression profiling by high throughput sequencing
Summary Single cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single cell Fluidigm C1 platform. To do so, we processed three C1 replicates from three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples, to account for amplification bias. We found that the major source of variation in the gene expression data was driven by genotype, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was impacted by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.
Overall design We collected single cell RNA-seq (scRNA-seq) data from three YRI iPSC lines using the Fluidigm C1 microfluidic system followed by sequencing. We added ERCC spike-in controls to each sample, and used 5-bp random sequence UMIs to allow for the direct quantification of mRNA molecule numbers. For each of the YRI lines, we performed three independent C1 collections; each replicate was accompanied by processing of a matching bulk sample using the same reagents. This study design allows us to estimate error and variability associated with the technical processing of the samples, independently from the biological variation across single cells of different individuals. We were also able to estimate how well scRNA-seq data can recapitulate the RNA-seq results from population bulk samples.
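The role of the 5-bp UMIs described above can be illustrated with a minimal sketch. It assumes reads have already been assigned to a gene and had their UMI extracted; the function name `count_molecules`, the gene labels, and the example reads are hypothetical, and this is a simplified illustration rather than the authors' actual pipeline. Reads that share the same gene and UMI are treated as amplification copies of a single molecule.

```python
from collections import defaultdict

# A 5-bp random UMI over the alphabet {A, C, G, T} has 4**5 possible sequences.
UMI_LENGTH = 5
n_possible_umis = 4 ** UMI_LENGTH  # 1024


def count_molecules(reads):
    """Collapse reads to molecule counts per gene.

    `reads` is an iterable of (gene, umi) pairs (hypothetical input format).
    Reads with the same gene and the same UMI count as one molecule.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Made-up example: GENE_A has three reads but only two distinct UMIs.
reads = [
    ("GENE_A", "ACGTA"),
    ("GENE_A", "ACGTA"),  # amplification duplicate of the read above
    ("GENE_A", "TTGCA"),
    ("GENE_B", "GGGCC"),
]
molecule_counts = count_molecules(reads)
print(n_possible_umis, molecule_counts)
```

With only 1024 possible UMIs, distinct molecules of a highly expressed gene can receive the same UMI, so a count obtained this way is a lower bound on the number of molecules.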

We combined the 96 single cell samples from each C1 chip into their own master mix and sequenced across three lanes of a HiSeq 2500 (3 individuals x 3 replicates x 96 wells x 3 lanes = 2592 files). We prepared two separate library preparations for each bulk sample, combined them all into one master mix, and sequenced across four lanes (3 individuals x 3 replicates x 2 library preparations x 4 lanes = 72 files).
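The file counts quoted above follow directly from the study design and can be checked with a short calculation. The variable names are illustrative. The final comparison with the 873 samples listed on this record assumes one GEO sample per single-cell well and one per bulk replicate, which is an inference and is not stated in the record.

```python
# Study design parameters taken from the description above.
individuals = 3
replicates = 3
wells_per_chip = 96
single_cell_lanes = 3
bulk_library_preps = 2
bulk_lanes = 4

# Per-lane file counts for single-cell and bulk data.
single_cell_files = individuals * replicates * wells_per_chip * single_cell_lanes
bulk_files = individuals * replicates * bulk_library_preps * bulk_lanes

# Sample counts, assuming one sample per well and one per bulk replicate.
single_cell_samples = individuals * replicates * wells_per_chip
bulk_samples = individuals * replicates
total_samples = single_cell_samples + bulk_samples

print(single_cell_files, bulk_files, total_samples)
```

Under that assumption the total is 864 single-cell samples plus 9 bulk samples, which is consistent with the 873 samples on the record.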
Contributor(s) Tung P, Blischak JD, Hsiao C, Knowles DA, Burnett J, Pritchard JK, Gilad Y
Citation(s) PMID: 28045081
Submission date Jan 27, 2016
Last update date May 15, 2019
Contact name John D Blischak
Organization name University of Chicago
Department Human Genetics
Lab Gilad
Street address 920 E. 58th Street, CLSC 317
City Chicago
State/province IL
ZIP/Postal code 60615
Country USA
Platforms (1)
GPL16791 Illumina HiSeq 2500 (Homo sapiens)
Samples (873)
GSM2047323 NA19098-r1-A01
GSM2047324 NA19098-r1-A02
GSM2047325 NA19098-r1-A03
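The sample titles shown above appear to encode the individual, the C1 replicate, and the well, in that order. The following sketch splits a title into those fields. The field interpretation is inferred from the study design rather than documented on this record, the pattern and the function name `parse_sample_name` are hypothetical, and titles with a different layout (for example, bulk samples) would need their own handling.

```python
import re

# Assumed layout: <individual>-<replicate>-<well>, e.g. NA19098-r1-A01.
# Wells on a 96-well plate are a row letter A-H plus a two-digit column.
SAMPLE_NAME = re.compile(
    r"^(?P<individual>NA\d+)-(?P<replicate>r\d+)-(?P<well>[A-H]\d{2})$"
)


def parse_sample_name(name):
    """Split a single-cell sample title into its assumed components."""
    match = SAMPLE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    return match.groupdict()


# The three sample titles listed on this record.
names = ["NA19098-r1-A01", "NA19098-r1-A02", "NA19098-r1-A03"]
parsed = [parse_sample_name(name) for name in names]
for fields in parsed:
    print(fields)
```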
BioProject PRJNA309972
SRA SRP068957

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file                                 Size      Download   File type/resource
GSE77288_molecules-raw-single-per-lane.txt.gz      20.0 Mb   ftp, http  TXT
GSE77288_molecules-raw-single-per-sample.txt.gz    7.9 Mb    ftp, http  TXT
GSE77288_reads-raw-bulk-per-lane.txt.gz            1.6 Mb    ftp, http  TXT
GSE77288_reads-raw-bulk-per-sample.txt.gz          323.1 Kb  ftp, http  TXT
GSE77288_reads-raw-single-per-lane.txt.gz          30.6 Mb   ftp, http  TXT
GSE77288_reads-raw-single-per-sample.txt.gz        12.4 Mb   ftp, http  TXT
Raw data are available in SRA
Processed data are available on Series record
