Sample GSM2370863
Status Public on Feb 20, 2017
Title 3A11
Sample type SRA
 
Source name CD8+ T cell
Organism Mus musculus
Characteristics strain: C57BL/6
subtype: Day7
library strategy: Single Cell RNAseq
Treatment protocol For the isolation of CD8+ T cells that had undergone their first cell division, 2 x 10^6 P14 CD8+ T cells were first labeled with carboxyfluorescein diacetate succinimidyl ester (CFSE) prior to adoptive transfer into recipient mice (n=24) and harvested 2 d post LCMV-Arm infection.
Growth protocol 5 x 10^3 P14 CD45.1+ CD8+ T cells were adoptively transferred into congenic wild-type CD45.2+ recipient mice, followed by intraperitoneal (i.p.) infection 1 day later with 2 x 10^5 plaque-forming units (pfu) per mouse of LCMV-Arm. Splenocytes were isolated from recipient mice at 7 d post-infection (n=4), and splenocytes and lymph nodes were harvested at 42 d post-infection (n=40). For the isolation of CD8+ T cells at 4 d post-infection, 5 x 10^4 P14 CD8+ T cells per mouse were adoptively transferred into 24 recipient mice.
Extracted molecule polyA RNA
Extraction protocol The C1 Single-Cell Auto Prep System (Fluidigm) was used to perform whole transcriptome amplification (WTA) of up to 96 single cells simultaneously. After cell isolation, 2.5 x 10^5 to 2 x 10^6 FACS-sorted P14 CD8+ T cells were loaded onto the C1 Single-Cell Auto Prep mRNA Array IFC for single-cell capture on chip. Live/dead stain (Invitrogen) was included to exclude dead cells. Viable single cells captured on chip were manually imaged. Cell lysis and RT-PCR were performed on chip.
SMARTer chemistry (Clontech) WTA was performed according to the manufacturer’s instructions. Illumina Nextera XT single-cell complementary DNA (cDNA) libraries were generated according to the manufacturer’s instructions (Illumina). Quality control measures of the single-cell cDNA libraries were performed on the 2100 Bioanalyzer (Agilent Technologies), Qubit 3.0 Fluorometer (Thermo Fisher Scientific), and MiSeq Sequencing System (Illumina). Single-cell cDNA libraries were sequenced (paired-end 100 or single-end 100) on the HiSeq2500 Sequencing System at the UCSD Institute for Genomics Medicine (IGM) Center.
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina HiSeq 2500
 
Data processing Basecalls performed using CASAVA version 1.4
ChIP-seq reads were aligned to the mm10 genome assembly using STAR version 2.4.1b with the following options: --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --limitOutSJoneRead --outFilterScoreMin 3 --outFilterScoreMinOverLread 0.66 --outFilterMatchNminOverLread 0.66 --outFilterMismatchNoverLmax 0.3, with all other parameters kept as default.
BAM files of samples 1-11 were downsampled with samtools, where needed, to ensure that samples 1-4, samples 5-8, and samples 9-11 have the same read numbers.
For samples 1-4: peaks were called using MACS version 1.4.2 with samples 5-8 as background, with the following settings: -g mm --nomodel --nolambda, and others as default.
For samples 9-11: tag directories were first generated by HOMER makeTagDirectory with -keepOne -tbp 1 and other options as default.
For samples 9 and 11: peaks were called by HOMER findPeaks using sample 12 as background and -style factor -size 100 -fragLength 50, with other options as default.
For sample 10: peaks were called by HOMER findPeaks using sample 12 as background and -style histone -fragLength 50 -size 200 -minDist 1000 -L 0, with other options as default.
For samples 13-16: transcript counts were generated by kallisto version 0.42.4 using the GENCODE GRCm38.p4 transcriptome as reference, with the following parameters: -l 200 -s 20 --single, and other settings as default.
For all Single Cell Samples the following processing steps apply:
Single-cell RNA-seq data pre-processing. Single-cell mRNA sequencing data from 256 murine CD8+ T cells were processed with a bioinformatics pipeline focusing on quality control (QC) and robust expression quantification. For each cell, raw RNA-seq reads were: checked for quality metrics with fastqc (v0.10.1); poly-A and adaptor-trimmed with cutadapt (v1.8.1); quantified by kallisto (v0.42.1) against a reference transcriptome (Gencode vM3) without bias correction; and aligned by STAR (v2.4.1b) to the reference mouse genome (mm10) with default parameters for quality control and downstream analysis. Next, the transcripts per million (TPM) outputs of kallisto for all cells were combined into a cell-by-gene expression matrix (C=288 cells=rows, G=22425 genes=columns) by summing the expression values for all quantified transcripts of a given gene. Finally, the TPM value for each cell c and gene g was natural log-transformed to yield a normalized expression value: EXPR_{c,g} = ln(1 + TPM_{c,g}).
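The gene-level summation and log transform described above can be sketched as follows. This is a minimal illustration for a single cell, not the submitters' pipeline: the function name, the transcript IDs, the gene names, and the TPM values are all hypothetical.

```python
import math
from collections import defaultdict


def gene_level_log_expression(transcript_tpm, transcript_to_gene):
    """Sum transcript TPMs per gene, then apply ln(1 + TPM) to each gene total."""
    gene_tpm = defaultdict(float)
    for transcript, tpm in transcript_tpm.items():
        gene_tpm[transcript_to_gene[transcript]] += tpm
    # log1p(x) computes ln(1 + x), so a gene with zero TPM maps to 0.0
    return {gene: math.log1p(total) for gene, total in gene_tpm.items()}


# Hypothetical toy values for one cell (not taken from the record)
cell_tpm = {"tx1": 3.0, "tx2": 1.0, "tx3": 0.0}
tx_to_gene = {"tx1": "GeneA", "tx2": "GeneA", "tx3": "GeneB"}
expr = gene_level_log_expression(cell_tpm, tx_to_gene)
```

Applying the same function to every cell and stacking the results row by row would give the cell-by-gene matrix EXPR.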
Dimensionality reduction and cell heterogeneity visualization. To reduce the dimensionality of the cell-by-gene expression matrix EXPR and visualize the diversity of gene expression among CD8+ T cells of different subtypes in a 2-dimensional scatter plot, we applied the t-distributed Stochastic Neighbor Embedding (tSNE) algorithm via its Barnes-Hut approximation (bhSNE). We first applied standard Principal Components Analysis (PCA) to reduce the dimensionality down to D=10, and only then applied bhSNE to visualize in D=2 (with parameters perplexity=30 and theta=0.75).
Differential gene expression analysis. We performed differential gene expression analysis between all pairs of T cell sub-populations, each pair corresponding to two non-overlapping sets of rows in the log-transformed expression matrix EXPR. Since single-cell gene expression does not conform to the usual negative binomial distribution and can even be bimodal due to dropout, we used two non-parametric statistical tests for heterogeneity of expression: the Mann-Whitney-Wilcoxon test (MWW, also known as MWU), which is a rank-sum test but relies on a large sample to approximate normality, and the Kolmogorov-Smirnov 2-sample (KS2) test, which finds the largest difference between the empirical cumulative distributions, even between two small samples such as our 1st division sub-types Div1TE (n=36) and Div1MEM (n=24).
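To make the two test statistics concrete, here is a minimal pure-Python sketch of the quantities each test is built on: the KS2 statistic (largest gap between empirical CDFs) and the Mann-Whitney U statistic (pairwise comparison count). P-values are deliberately omitted; in practice a library such as SciPy (`scipy.stats.ks_2samp`, `scipy.stats.mannwhitneyu`) would supply them. The expression values are hypothetical, and the record does not state which implementation was used.

```python
from bisect import bisect_right


def ks_2samp_statistic(a, b):
    """Largest absolute difference between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs (x, y) with x > y count 1, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical log-expression values of one gene in two sub-populations
div1_te = [0.0, 0.0, 1.2, 2.5]
div1_mem = [0.0, 3.1, 3.4, 4.0]
d_stat = ks_2samp_statistic(div1_te, div1_mem)
u_te = mann_whitney_u(div1_te, div1_mem)
u_mem = mann_whitney_u(div1_mem, div1_te)
```

The two U values always sum to the number of cross-sample pairs (here 4 x 4 = 16), which is a quick sanity check on the tie handling.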
Cell type classifier. We trained two binary T cell classifiers to identify gene expression signatures that not only differentiate the examined T cell sub-populations (like the differential gene expression described above) but can also be used to predict the ‘memory-ness’ or ‘effector-ness’ of previously unseen cells. Each classifier constructed an independent ensemble of Extremely Randomized Trees. Using the terminally differentiated effector and memory (TCM, TEM) populations, we built a training set for a fate classifier for CD8+ T cells. Using the newly observed segregation of daughter T cells into Div1TE and Div1MEM subpopulations after the first division, we built a second training set for another early state classifier. Both classifiers were fed their respective training sets using 10-fold cross-validation.
After both the fate and early state classifiers were trained on their respective subpopulations, they were both applied to previously unseen intermediate Day 4 CD8+ T cells. Their predicted ‘memory-ness’ scores were scatter-plotted and shown to correlate in Fig. 3f. For each T cell, its ‘effector-ness’ score is 1 minus the ‘memory-ness’ score and is redundant for this analysis. The signature genes for each classifier were selected from all G=22,425 genes by their GINI score.
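The record does not define the GINI score. Assuming it refers to the Gini-impurity-based feature importance commonly reported for tree ensembles, the sketch below shows the underlying quantity for one gene and one split threshold: the drop in Gini impurity when cells are partitioned by that gene's expression. An ensemble would aggregate such decreases over many trees and splits. The function names, labels, expression values, and threshold are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class frequencies in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(values, labels, threshold):
    """Impurity of the parent node minus the size-weighted impurity of the two children."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Hypothetical toy data: four cells, two classes
labels = ["memory", "memory", "effector", "effector"]
informative_gene = [0.1, 0.2, 2.0, 2.4]    # separates the classes at threshold 1.0
uninformative_gene = [0.1, 2.0, 0.2, 2.4]  # mixes the classes on both sides

good = gini_decrease(informative_gene, labels, threshold=1.0)
poor = gini_decrease(uninformative_gene, labels, threshold=1.0)
```

A gene that cleanly separates the two classes yields the maximum decrease for a balanced binary node, while a gene that leaves both children as mixed as the parent yields none.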
Temporal expression trajectories through inferred lineage paths. To understand the temporal dynamics of expression for key genes along the effector and memory lineages, we constructed hypothetical differentiation time-courses for each lineage. Briefly, we sampled with replacement 50 cells from each population and constructed all trajectories through the cross-product of populations ordered in a particular lineage. These orders were determined a priori based on our earlier RT-qPCR time-course study (Arsenio et al., Nature Immunology 2014). Specifically, the ‘effector lineage’ starts from the naïve population, and progresses through the Div1TE subpopulation, then onto Day 4, and finally Day 7. In contrast, the ‘effector memory’ and ‘central memory’ lineages start from naïve, pass through the Div1MEM subpopulation, and end with TEM and TCM respectively. These bootstrapped trajectories were visually summarized by a seaborn timeseries plot, which links the average expression for each population sample with a solid line segment and represents the 95% confidence interval by a shaded area around it.
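A simplified sketch of the resampling idea follows. Instead of enumerating the full cross-product of sampled cells, it bootstraps each population separately (50 cells drawn with replacement per resample) and reports the mean with a 95% percentile interval at each lineage stage, which is roughly the summary a timeseries plot would draw. This is an assumption-laden illustration, not the submitters' code: the function name, the number of resamples, the seed, and the expression values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_cells=50, n_boot=1000, seed=0):
    """Return (mean, lower, upper): the sample mean and a 95% percentile bootstrap interval."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        resample = rng.choices(values, k=n_cells)  # sampling with replacement
        means.append(statistics.fmean(resample))
    means.sort()
    lower = means[int(0.025 * n_boot)]
    upper = means[int(0.975 * n_boot) - 1]
    return statistics.fmean(values), lower, upper


# Hypothetical log-expression of one gene along the effector lineage order
effector_lineage = ["naive", "Div1TE", "Day4", "Day7"]
expression = {
    "naive": [0.0, 0.1, 0.0, 0.2, 0.1],
    "Div1TE": [0.8, 1.1, 0.9, 1.4, 1.0],
    "Day4": [2.0, 2.3, 1.8, 2.6, 2.1],
    "Day7": [3.0, 3.0, 3.0, 3.0, 3.0],
}
trajectory = [(pop, *bootstrap_mean_ci(expression[pop])) for pop in effector_lineage]
```

Each entry of `trajectory` holds a population name, its mean expression, and the lower and upper interval bounds, in lineage order.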
Genome_build: mm10
Supplementary_files_format_and_content: peak BED files generated from MACS and HOMER. Transcript Expression CSV tables generated from kallisto.
 
Submission date Nov 01, 2016
Last update date May 15, 2019
Contact name Gene Yeo
E-mail(s) geneyeo@ucsd.edu
Organization name UCSD
Street address 2880 Torrey Pines Scenic Dr. Room 3805/Yeo Lab
City La Jolla
State/province CA
ZIP/Postal code 92037
Country USA
 
Platform ID GPL17021
Series (1)
GSE89405 Early transcriptional and epigenetic regulation of CD8+ T cell differentiation revealed by single-cell RNA-seq
Relations
BioSample SAMN05963654
SRA SRX2315948

Supplementary data files not provided
Processed data are available on Series record
Raw data are available in SRA
