Sample GSM1832545
Status Public on Mar 28, 2016
Title S46.D3
Sample type SRA
 
Source name Cerebral Cortex
Organism Homo sapiens
Characteristics tissue: Cerebral Cortex
developmental stage: GW23.5
Extracted molecule polyA RNA
Extraction protocol Single-cell capture and cell lysis using Fluidigm C1.
Reverse transcription and whole-transcriptome amplification on the Fluidigm C1 IFC; libraries indexed with the Illumina Nextera DNA Sample Prep Kit.
 
Library strategy ncRNA-Seq
Library source transcriptomic
Library selection size fractionation
Instrument model Illumina HiSeq 2500
 
Description polyA RNA
single cell
Data processing Bulk RNA-seq: Strand-specific reads were aligned to the human reference genome (Ensembl GRCh37/hg19, release 75) using TopHat v2.0.10 with the flags --library-type fr-firststrand --microexon-search. De novo transcriptome assembly was performed separately on the rRNA-depleted total RNA-seq alignments and on the polyA-selected RNA-seq alignments, using Cufflinks v2.2.1 with the flags -M ensembl_75_mtRNA_rRNA.gtf -b genome.fa -u --library-type fr-firststrand --max-multiread-fraction 0.25 --3-overhang-tolerance 2000. For each library type (rRNA-depleted total RNA-seq and polyA RNA-seq), the assemblies from all developmental stages and replicates were merged with the Ensembl 75/GENCODE 19 reference transcriptome using Cuffmerge. To identify transcripts absent from the Ensembl annotation, we used Cuffcompare class codes and extracted assembled transcripts classified as i (novel intronic), u (novel intergenic), or x (novel antisense). Novel transcripts shorter than 200 nt were removed. For the remaining transcripts, minimum read coverage thresholds were set according to whether Cufflinks classified previously annotated transcripts as having “full_read_support.” By comparing the true positive rate with the false positive rate of classifying known genes as having “full_read_support” at a range of coverage thresholds, we set the minimum coverage to 1.4 for polyA RNA-seq and 1.67 for total RNA-seq (at FDR = 0.05). Starting with the polyA RNA-seq data alone, transcripts with read coverage above 1.4 in both biological replicates of at least one developmental stage were included in the reference and considered to be expressed in the neocortex. Because early fetal tissue was scarce, the GW14.5 sample was treated as the biological duplicate of the GW13 sample.
Novel transcripts predicted to have protein-coding capability by one or more of the following methods were classified as transcripts of uncertain coding potential (TUCP): CPAT (threshold = 0.364), CPC (threshold = 0), and Pfam. For the Pfam comparison, the longest potential open reading frame (ORF) of each novel transcript was extracted, and any putative ORF with a significant match to a protein domain annotated in Pfam A or Pfam B caused its parent transcript to be classified as a TUCP. All remaining novel lncRNAs and TUCPs were then named according to recently proposed nomenclature standards (for instance, LINC-[nearest mRNA] for intergenic lncRNAs and [nearest mRNA]-AS for antisense lncRNAs) and merged into the Ensembl 75 reference transcriptome, yielding the polyA Full reference transcriptome. The polyA Stringent reference transcriptome was produced by removing all novel single-exon lncRNAs and TUCPs. Known lncRNAs from Ensembl were identified as transcripts with one of the following biotypes: “3prime_overlapping_ncrna,” “antisense,” “lincRNA,” “processed_transcript,” “sense_intronic,” and “sense_overlapping.” The same pipeline, with a coverage threshold of 1.67, was applied to reads from the total RNA-seq. Gene-level fragment counts for each polyA and total RNA sample were quantified with featureCounts v1.4.6 using the flags -p -s 2 -B -C -t exon -g gene_id. Count tables were normalized to TPM (transcripts per million) for internal comparisons and visualizations of bulk RNA-seq. To identify differentially expressed genes, we ran DESeq2 on gene-level fragment counts from the polyA samples, quantified against the polyA Full reference transcriptome. Pairwise negative binomial tests were performed between developmental stages using biological duplicates, and the union of genes significant at FDR < 0.01 across these comparisons was classified as differentially expressed.
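The TUCP rule, the two naming examples, and the Stringent filter described above can be summarized in a short sketch. It assumes per-transcript CPAT, CPC, and Pfam results have already been computed; the `NovelTranscript` record, its fields, the gene placeholders, and the fallback naming are hypothetical. It also assumes that a CPAT probability at or above 0.364, or a CPC score above 0, counts as "coding", since the record gives only the thresholds and not the direction of comparison.

```python
from dataclasses import dataclass

CPAT_CUTOFF = 0.364  # CPAT threshold stated in the record
CPC_CUTOFF = 0.0     # CPC threshold stated in the record


@dataclass
class NovelTranscript:
    """Hypothetical per-transcript summary of coding-potential results."""
    transcript_id: str
    class_code: str     # Cuffcompare code: "u" intergenic, "x" antisense, "i" intronic
    n_exons: int
    cpat_prob: float    # CPAT coding probability
    cpc_score: float    # CPC score
    pfam_hit: bool      # longest ORF matched a Pfam A or Pfam B domain
    nearest_mrna: str


def is_tucp(t: NovelTranscript) -> bool:
    # A coding call from any one method is sufficient (logical OR).
    # Assumption: CPAT >= cutoff and CPC > cutoff are treated as "coding".
    return t.cpat_prob >= CPAT_CUTOFF or t.cpc_score > CPC_CUTOFF or t.pfam_hit


def display_name(t: NovelTranscript) -> str:
    # Only the two naming examples given in the record are implemented.
    if t.class_code == "u":
        return f"LINC-{t.nearest_mrna}"
    if t.class_code == "x":
        return f"{t.nearest_mrna}-AS"
    return t.transcript_id  # hypothetical fallback; no rule is given for other codes


def stringent_subset(novel):
    # The polyA Stringent reference drops every novel single-exon lncRNA and TUCP.
    return [t for t in novel if t.n_exons > 1]


# Illustrative input only; GENE1..GENE4 are placeholders, not real results.
novel = [
    NovelTranscript("TCONS_A", "u", 3, 0.10, -0.5, False, "GENE1"),
    NovelTranscript("TCONS_B", "x", 2, 0.50, -0.5, False, "GENE2"),
    NovelTranscript("TCONS_C", "u", 1, 0.05, 0.8, False, "GENE3"),
    NovelTranscript("TCONS_D", "i", 1, 0.05, -1.0, True, "GENE4"),
]
for t in novel:
    print(display_name(t), "TUCP" if is_tucp(t) else "lncRNA")
```

Treating the three predictors as an OR makes the lncRNA set conservative: a transcript stays an lncRNA only if none of CPAT, CPC, or Pfam suggests coding potential.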
Single Cell RNA-seq: Paired-end 100 bp reads from single-cell cDNA libraries were quality-trimmed using Trim Galore with the flags -q 20 --nextera --length 20. Trimmed reads were aligned to the human reference genome (Ensembl GRCh37/hg19, release 75), augmented with the 92 ERCC Spike-In Control sequences, using TopHat v2.0.10 with the flags --transcriptome-index=polya_stringent_reference.gtf --prefilter-multihits. The polyA Stringent reference transcriptome, derived from whole-tissue RNA-seq as described above, served as the transcriptome guide. Gene-level fragment counts were quantified using featureCounts v1.4.6 with the flags -p -B -C. Counts were normalized by size factors computed as in DESeq. Fifty additional single-cell libraries, previously deposited in SRP041736, were also included.
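For readers unfamiliar with DESeq-style normalization, the sketch below implements the median-of-ratios size factor from the original DESeq method: each cell's factor is the median, over genes, of its count divided by that gene's geometric mean across cells, and normalized counts are raw counts divided by the factor. This is a plain-Python illustration under the assumption that "size factors according to DESeq" refers to this estimator; it is not the submitters' code, and the gene names and counts are made up.

```python
import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors; counts maps gene -> per-cell counts."""
    n_cells = len(next(iter(counts.values())))
    # Log geometric mean per gene, using only genes with no zero counts
    # (a zero would make the geometric mean zero and the ratios undefined).
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_cells
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    if not log_geo_means:
        raise ValueError("no gene has a nonzero count in every cell")
    factors = []
    for j in range(n_cells):
        log_ratios = [math.log(counts[g][j]) - lg for g, lg in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: dict[str, list[int]], factors: list[float]) -> dict[str, list[float]]:
    """Divide each cell's counts by that cell's size factor."""
    return {g: [c / s for c, s in zip(row, factors)] for g, row in counts.items()}


# Illustrative counts: cell 2 was sequenced twice as deeply as cell 1.
counts = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10], "geneD": [0, 3]}
sf = size_factors(counts)
print(sf)                      # approximately [0.7071, 1.4142]
print(normalize(counts, sf))   # geneA, geneB, geneC become equal across cells
```

Because genes with any zero count are excluded from the reference, very sparse single-cell matrices can leave few genes for the median; the sketch raises an error when none remain rather than returning a meaningless factor.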
 
Submission date Jul 24, 2015
Last update date May 15, 2019
Contact name John Liu
E-mail(s) john.liu@ucsf.edu
Organization name UCSF
Street address 35 Medical Center Way
City SAN FRANCISCO
State/province California
ZIP/Postal code 94143
Country USA
 
Platform ID GPL16791
Series (1)
GSE71315 Single cell analysis of long non-coding RNAs in the developing human neocortex
Relations
BioSample SAMN03922300
SRA SRX1117522

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
