GEO Accession viewer

Sample GSM2871419

Status

Public on Dec 01, 2017

Title

CSCC_CT-DANCR_W54-1-1-C.03

Sample type

SRA

Source name

glioblastoma

Organism

Homo sapiens

Characteristics

study: cancer stem cell clusters
structure acronym: CT-DANCR
tumor_name: W54-1-1
block_name: W54-1-1-C
specimen_name: W54-1-1-C.03
rna_well_id: 304950411
rin: 6.1
gene-level raw fpkm profile file_download_link: http://glioblastoma.alleninstitute.org/api/v2/well_known_file_download/309083378

Treatment protocol

none

Growth protocol

none

Extracted molecule

total RNA

Extraction protocol

(Laser Microdissection) In preparation for laser microdissection, PEN slides were removed from -80°C and quickly processed through cresyl violet and Eosin to lightly stain the tissue. Sections were fixed in ice-cold 70% ethanol for 30 seconds, washed for 15 seconds in nuclease-free water, stained with 0.7% cresyl violet in 0.05% NaOAc, pH 3.4 for 4 minutes, rinsed in nuclease-free water for 10 seconds, 15 seconds in 70% ethanol, followed by 2 dips in 0.25% Eosin, and 20 seconds each in 95%, 100%, and 100% ethanol rinses. Slides were air-dried for 2 minutes and desiccated by vacuum for 1 hour at room temperature, then frozen at -80°C until microdissection. For both RNA-Seq studies, cresyl violet/Eosin-stained sections mounted on PEN membranes were microdissected while visually referring to H&E-stained sections that had been curated to identify matched target regions. For the Cancer Stem Cells RNA-Seq study, ISH reference gene expression patterns informed the curation of the H&E-stained sections. A Leica LMD6000 (Leica Microsystems, Inc., Bannockburn, IL) was used for microdissection, and the system included an upright research microscope fitted with a diode laser and a CCD camera to acquire live images of slides. The scope and laser were controlled via a dedicated computer running Leica LMD software (v.6.6.2.3552).
(RNA Isolation and RNA Integrity) Microdissected tissue was collected directly into RLT buffer from the RNeasy Micro PLUS kit (Qiagen Inc., Valencia, CA) containing beta-mercaptoethanol at a 1:100 dilution, per the manufacturer’s instructions. Samples were volume-adjusted with water to 75μl, vortexed, centrifuged, and frozen at -80°C. Following the manufacturer’s directions, RNA samples were eluted in 14μl, and 1μl was run on the Agilent 2100 Bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA) using the Pico assay.
Due to low sample volume and incompatibility of the eluent with the Nanodrop spectrophotometer (Thermo Scientific, Wilmington, DE), samples were quantitated using the Bioanalyzer concentration output. This was done by running a 1ng/μl RNA standard on the same Pico chip and then dividing the concentration output of the sample by the concentration output of the standard. The average RNA Integrity Number (RIN) of all passed samples was 7.1. Samples were failed when the Bioanalyzer traces showed degraded 18S and 28S bands; failing samples typically had RINs lower than 4.5.
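The quantitation step above is a simple ratio against the co-run standard. The following is a minimal sketch of that arithmetic, not the submitters' code: the function names and the example readings are hypothetical, and the RIN cutoff is only the "typically lower than 4.5" rule of thumb from the text, since samples were actually failed on the appearance of the 18S and 28S bands.

```python
def estimate_concentration(sample_output, standard_output, standard_ng_per_ul=1.0):
    """Estimate RNA concentration (ng/ul) from Bioanalyzer outputs.

    sample_output and standard_output are the concentration readings for
    the sample and for the 1 ng/ul standard run on the same Pico chip.
    """
    if standard_output <= 0:
        raise ValueError("standard output must be positive")
    return sample_output / standard_output * standard_ng_per_ul


def rin_suggests_failure(rin, cutoff=4.5):
    """Rough proxy only: the record says failed samples typically had RIN < 4.5."""
    return rin < cutoff


# Hypothetical readings: the sample reads 1.5 where the standard reads 2.0.
print(estimate_concentration(1.5, 2.0))
# The RIN reported for this sample is 6.1.
print(rin_suggests_failure(6.1))
```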
In most cases, 5ng of total RNA was used as input to the Clontech SMARTer Ultra Low Input RNA Kit for Illumina Sequencing-HV (# 634820) for library preparation. Twelve PCR cycles were used for amplification, as suggested in the manufacturer’s instructions (Clontech SMARTer Kit Manual 120213). The Modified Nextera DNA sample preparation was used after step V.B of the Clontech SMARTer kit, in place of both Covaris shearing and step VI of the kit. RNA sequencing was done on an Illumina HiSeq 2000, producing approximately 30M 50bp paired-end clusters per sample. In most cases, 5 samples were run per lane. RNA-Seq libraries were assessed for quality by yield and by visual inspection of the presence, quality, and size of cDNA product on a Bioanalyzer. Initially, 11 samples failed the quality control criteria (7 for no product and 4 because the majority of product was <500bp). However, upon a second attempt at synthesis, all 275 samples passed. One of the 275 samples was later failed for low inter-array correlation (IAC) and was excluded from the data set. The average concentration of samples that passed the criteria was 1582 pM.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina HiSeq 2000

Description

Cellular Tumor sampled by high expression of gene DANCR
304950411

Data processing

(RNA-Seq Data Alignment) Data generation, collection, alignment, and normalization are described in detail online at glioblastoma.alleninstitute.org in the Documentation tab. Raw read (fastq) files were aligned to the hg19 human genome sequence (Meyer et al., 2013) with the RefSeq transcriptome version 54 (downloaded 8/25/2012 and updated by removing duplicate gene entries from the gtf reference file for consistency with the LIMS). For alignment, Illumina sequencing adapters were clipped from the reads using the fastqMCF program (Aronesty, 2011).
After clipping, the paired-end reads were mapped using RNA-Seq by Expectation-Maximization (RSEM) (Li et al., 2010) using default settings except for two mismatch parameters: bowtie-e (set to 500) and bowtie-m (set to 100). RSEM aligns reads to known isoforms and then calculates gene expression as the sum of isoform expression for a given gene, assigning ambiguous reads to multiple isoforms using a maximum likelihood statistical model. Reads that did not map to the transcriptome were then aligned to the hg19 genome sequence using Bowtie with default settings (Langmead et al., 2009). Reads that mapped neither to the transcriptome with RSEM nor to the genome with Bowtie were mapped against the ERCC sequences (used in this project as a negative control).
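As an illustration of the two non-default settings, the sketch below assembles an `rsem-calculate-expression` command line with `--bowtie-e 500` and `--bowtie-m 100` for paired-end input. It only builds and prints the command. The file names, reference name, and output prefix are hypothetical placeholders, and the record does not state the exact invocation that was used, so treat this as one plausible form rather than the submitters' command.

```python
import shlex


def rsem_command(fastq_1, fastq_2, reference, sample_name):
    """Build an RSEM command with the two non-default Bowtie settings."""
    return [
        "rsem-calculate-expression",
        "--paired-end",
        "--bowtie-e", "500",  # value stated in the record
        "--bowtie-m", "100",  # value stated in the record
        fastq_1,
        fastq_2,
        reference,
        sample_name,
    ]


# Hypothetical inputs: adapter-clipped mate files and a prepared reference.
cmd = rsem_command("clipped_1.fastq", "clipped_2.fastq", "refseq_hg19", "sample_out")
print(shlex.join(cmd))
```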
The final results files included quantification of the mapped reads (raw read counts, FPKM, and TPM values for the transcriptome-mapped reads, chromosome-wide counts for the genomic-mapped reads), BAM files including both transcriptome and genome-mapped reads, and fastq files for the unmapped reads. Anonymized BAM files (where sequence-level information has been removed) and gene-level quantification (TPM, FPKM, and number of reads) are available as part of the resource (see Download tab). Resulting FPKM values (normalized for gene length and sequencing depth) used for the analyses of this paper were further adjusted for the total transcript count using TbT normalization as described below.
(RNA-Seq Data Normalization) In the Allen Human Brain Atlas, analysis of the RNA-Seq data showed minimal process batch effects, but normalization could still reduce variability (Miller et al., 2014); a comparable post-hoc data normalization strategy was therefore used for this project. Gene expression values were summarized as transcripts per million (TPM) and fragments per kilobase per million (FPKM), as described above. Both measures normalize read counts for gene length and for the total number of reads, in slightly different ways.
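The difference between the two measures can be seen in a short sketch using their textbook definitions. This is an illustration only: RSEM computes these values from estimated fragment counts and effective transcript lengths, whereas the toy code below uses plain counts and plain gene lengths, and the example numbers are made up.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of gene per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Toy data: three genes with made-up fragment counts and lengths.
counts = [100, 200, 700]
lengths = [1000, 4000, 2000]
print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

TPM values always sum to one million within a sample, while the sum of FPKM values depends on the length distribution of the expressed genes. Within a single sample the two differ only by a constant factor.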
In the manuscript (as well as for the heatmaps shown on the website), the FPKM data matrix was further adjusted for the total transcript count using TbT normalization (Kadota et al., 2012), which scales each sample based on the summed expression of all genes that are not differentially expressed. FPKM values were TbT normalized in linear space, with the differential expression vector defined as TRUE if a sample was from cellular tumor and FALSE otherwise.
Sample data were then scaled such that the total log2(FPKM) across the entire data set remained unchanged after normalization. The result of this step was that expression levels for all genes in a particular sample were multiplied by a scalar value close to 1 (in most cases between 0.7 and 1.3).
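One way to read the rescaling step: if every value in sample s is multiplied by a factor f_s, the total log2(FPKM) changes by the number of values per sample times the sum of log2(f_s). The total is therefore preserved when the factors have a geometric mean of 1. The sketch below implements only that rescaling, under the assumptions that all values are positive and that each sample has the same number of values. It does not implement TbT itself, and the raw factors and matrix are made up.

```python
import math


def rescale_factors(raw_factors):
    """Divide factors by their geometric mean so their log2 values sum to 0."""
    mean_log = sum(math.log2(f) for f in raw_factors) / len(raw_factors)
    return [f / 2 ** mean_log for f in raw_factors]


def apply_factors(matrix, factors):
    """Multiply column s of a genes-by-samples matrix by factors[s]."""
    return [[v * f for v, f in zip(row, factors)] for row in matrix]


def total_log2(matrix):
    """Sum of log2 over all entries (all entries assumed positive)."""
    return sum(math.log2(v) for row in matrix for v in row)


# Toy genes-by-samples FPKM matrix and hypothetical per-sample factors.
fpkm_matrix = [[4.0, 8.0, 2.0], [16.0, 1.0, 32.0]]
factors = rescale_factors([0.8, 1.0, 1.3])
scaled = apply_factors(fpkm_matrix, factors)
print(total_log2(fpkm_matrix), total_log2(scaled))
```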
Genome_build: GRCh37.p5
Supplementary_files_format_and_content: fpkm_table.csv contains the (row, column) matrix of fpkm values obtained for each (gene, sample). The first row contains the sample unique identifiers (rna_well_id). The first column contains the gene unique identifiers (gene_id).
Supplementary_files_format_and_content: columns-samples.csv contains information about the samples profiled with RNA sequencing: (1) tumor_id, block_id, specimen_id and corresponding names: Specimen from which the sample was dissected and the specimen's parent block and tumor. (2) rna_well_id: Unique identifier of the sample. (3) polygon_id: Unique identifier of an avg_graphic_object that outlines where the sample was cut from. (4) structure_id, structure_abbreviation, structure_color, structure_name: Label that groups samples obtained from laser micro-dissected anatomic structures or putative cancer stem cell clusters.
Supplementary_files_format_and_content: rows-genes.csv contains information about the genes for which fpkm values were calculated. (1) gene_id: Unique identifier for the gene. (2) chromosome: Chromosome associated with the gene. (3) gene_entrez_id, gene_symbol, gene_name: entrez_id, NCBI symbol, and name of the gene.
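The layout described for fpkm_table.csv (sample identifiers in the first row, gene identifiers in the first column) can be read with the standard library alone. The sketch below is a minimal reader under that description. The label of the top-left cell, the gene identifiers, and the values in the inline example are made up; only the rna_well_id 304950411 comes from this record.

```python
import csv
import io


def read_fpkm_table(handle):
    """Read a gene-by-sample FPKM table into {gene_id: {rna_well_id: fpkm}}."""
    reader = csv.reader(handle)
    header = next(reader)
    sample_ids = header[1:]  # first cell is a corner label, not a sample
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id = row[0]
        values = [float(v) for v in row[1:]]
        table[gene_id] = dict(zip(sample_ids, values))
    return sample_ids, table


# Toy stand-in for fpkm_table.csv; gene IDs and values are hypothetical.
toy_csv = io.StringIO(
    "gene_id,304950411,999000001\n"
    "1001,12.5,0.0\n"
    "1002,3.25,7.75\n"
)
sample_ids, table = read_fpkm_table(toy_csv)
print(sample_ids)
print(table["1001"]["304950411"])
```

The same rna_well_id keys can then be joined against columns-samples.csv, and the gene_id keys against rows-genes.csv, to attach sample and gene annotations.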

Submission date

Nov 30, 2017

Last update date

Dec 01, 2017

Contact name

Ralph B. Puchalski

E-mail(s)

rbpuchalski@gmail.com

Organization name

Swedish Neuroscience Institute

Street address

550 17th Ave., Suite 570