GEO Accession viewer

Sample GSM3273982

Status

Public on Jul 18, 2018

Title

ERX173584

Sample type

SRA

Source name

E-MTAB-1364:RBF_3-2

Organism

Drosophila melanogaster

Characteristics

attributes: s2

Extracted molecule

total RNA

Extraction protocol

see original sample

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina Genome Analyzer IIx

Data processing

We created a pre-alignment pipeline to identify technical metadata and generate sample quality metrics. We downloaded FASTQs from SRA using fastq-dump (sra-tools v2.8.2) with --split-files -M 0, counted the number of reads, and estimated average read lengths. A sample was considered paired end if fastq-dump generated two files and each file had an equal number of reads, ≥ 10,000 reads, and an average read length ≥ 10 bp. We removed individual reads shorter than 25 bp using atropos (v1.1.18) with --minimum-length 25. We simultaneously verified that samples were indeed Drosophila and estimated contamination with FastQ Screen (v0.11.3) and Bowtie 2 (v2.3.3.1) by mapping 100,000 reads to 8 references (dm6, rRNA, Wolbachia, human, yeast, E. coli, PhiX, ERCC-SRM2374). Next we aligned all reads with Hisat2 (v2.1.0) using --max-intronlen 300000 and --known-splicesite-infile to the Drosophila melanogaster Release 6 plus ISO1 MT assembly (GCA_000001215.4). We then ran samtools (v1.7) and bamtools (v2.4.1) with default settings to generate summary statistics. We estimated various metrics with Picard CollectRnaSeqMetrics (v2.15.0) using three separate settings: STRAND=NONE, STRAND=FIRST_READ_TRANSCRIPTION_STRAND, and STRAND=SECOND_READ_TRANSCRIPTION_STRAND. These metrics allowed us to estimate library strandedness. Finally, we identified duplicates using Picard MarkDuplicates (v2.15.0).
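The paired-end rule described above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the submitters' code: the `FastqStats` class, the `is_paired_end` function, and the constant names are hypothetical, and the per-file read counts and average lengths are assumed to have been computed already.

```python
from dataclasses import dataclass
from typing import List

MIN_READS = 10_000       # each file must contain at least this many reads
MIN_AVG_LENGTH = 10.0    # each file must average at least this many bp


@dataclass
class FastqStats:
    """Summary of one FASTQ file written by fastq-dump --split-files (hypothetical helper)."""
    n_reads: int
    avg_read_length: float


def is_paired_end(files: List[FastqStats]) -> bool:
    """Apply the record's rule: two files, equal read counts, and both pass the thresholds."""
    if len(files) != 2:
        return False
    first, second = files
    if first.n_reads != second.n_reads:
        return False
    return all(
        f.n_reads >= MIN_READS and f.avg_read_length >= MIN_AVG_LENGTH
        for f in files
    )
```

Under this sketch, a sample with a single output file, or with two files that disagree in read count, would be treated as single end.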
To generate counts tables and coverage tracks, our alignment pipeline used parameters discovered in the pre-alignment pipeline. The alignment pipeline uses the FASTQ file(s) downloaded by the pre-alignment pipeline but trims adapter sequence and low-quality bases using atropos (v1.1.18) with -q 20 --minimum-length 25. The remaining reads were mapped using Hisat2 (v2.1.0) with --dta --max-intronlen 300000 --known-splicesite-infile and --rna-strandness set to ‘F’, ‘R’, ‘FR’, or ‘RF’ depending on the strandedness. We merged alignments from individual SRA runs (SRRs) to the library level (SRX) and generated gene-level, junction-level, and intergenic coverage counts using featureCounts from the Subread package (v1.5.3). Finally, we created browser tracks using bamCoverage from the deepTools package (v2.5.4) with --binSize 1 --normalizeTo1x 129000000 --ignoreForNormalization chrX.
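As a rough illustration of the mapping step, the following Python sketch assembles a Hisat2 argument list from the options named above. It is an assumption-laden example rather than the pipeline's actual code: the `hisat2_command` function, its parameters, and all file names are hypothetical. The strandedness option is written the way HISAT2 spells it, `--rna-strandness`, which takes F or R for single-end reads and FR or RF for paired-end reads. The sketch only builds the command; it does not run it.

```python
from typing import List, Optional, Sequence

SINGLE_END_CODES = {"F", "R"}
PAIRED_END_CODES = {"FR", "RF"}


def hisat2_command(
    index_prefix: str,
    fastqs: Sequence[str],
    splice_sites: str,
    out_sam: str,
    strandness: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a hisat2 argument list for one or two FASTQ files."""
    if len(fastqs) == 1:
        read_args = ["-U", fastqs[0]]
        allowed = SINGLE_END_CODES
    elif len(fastqs) == 2:
        read_args = ["-1", fastqs[0], "-2", fastqs[1]]
        allowed = PAIRED_END_CODES
    else:
        raise ValueError("expected one or two FASTQ files")

    cmd = [
        "hisat2",
        "--dta",
        "--max-intronlen", "300000",
        "--known-splicesite-infile", splice_sites,
    ]
    # Unstranded libraries simply omit the strandness option.
    if strandness is not None:
        if strandness not in allowed:
            raise ValueError(f"strandness {strandness!r} does not match the read layout")
        cmd += ["--rna-strandness", strandness]
    cmd += ["-x", index_prefix, *read_args, "-S", out_sam]
    return cmd
```

The returned list could be handed to `subprocess.run` if Hisat2 is installed; keeping command construction separate from execution makes the layout and strandedness checks easy to test.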
Genome_build: Drosophila melanogaster Release 6 plus ISO1 MT (GenBank assembly accession: GCA_000001215.4)
Supplementary_files_format_and_content: Processed data files include:
*.bw are BigWig files generated using deeptools bamCoverage
*.counts are gene level coverage counts
*.jcounts are gene level junction counts
*.intergenic.counts are intergenic coverage counts
*.intergenic.jcounts are intergenic junction counts
Series level supplementary files:
dmel_r6-11.intergenic.gtf intergenic GTF generated by the pipeline for estimating intergenic coverage counts.
supplemental_metadata.tsv supplemental metadata file containing additional metadata for each sample including QC values and various flags generated by each pipeline
gene_counts.tsv supplemental file containing all gene counts as a single matrix
intergenic_counts.tsv supplemental file containing all intergenic counts as a single matrix
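The per-sample counts files can be read with a short parser. The sketch below assumes, without confirmation from this record, that the *.counts files keep the standard featureCounts layout: an optional leading comment line starting with `#`, a tab-separated header beginning with Geneid, Chr, Start, End, Strand, Length, and the count in the last column. The function name `read_gene_counts` is hypothetical.

```python
import csv
from typing import Dict, Iterable

EXPECTED_PREFIX = ["Geneid", "Chr", "Start", "End", "Strand", "Length"]


def read_gene_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map gene ID to count from featureCounts-style tab-separated text."""
    # Skip the program/command comment line that featureCounts writes first.
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.reader(data_lines, delimiter="\t")
    header = next(reader)
    if header[:6] != EXPECTED_PREFIX:
        raise ValueError("unexpected header; not a featureCounts-style table")
    # The count for the single BAM file is assumed to be the last column.
    return {row[0]: int(row[-1]) for row in reader if row}
```

For a gzipped download, the file can be opened in text mode, for example `with gzip.open(path, "rt") as handle: counts = read_gene_counts(handle)`, where `path` points at a locally saved copy of one of the counts files.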

Submission date

Jul 17, 2018

Last update date

Sep 04, 2018

Contact name

Brian Oliver

E-mail(s)

briano@nih.gov

Phone

301-204-9463

Organization name

NIDDK, NIH

Department

LBG

Lab

Developmental Genomics

Street address

50 South Drive

City

Bethesda

State/province

ZIP/Postal code

20892

Country

USA

Platform ID

GPL11203

Series (1)

GSE117217

Remapping the SRA: Drosophila melanogaster RNA-Seq data from the Sequence Read Archive

Relations

BioSample

SAMEA1573925

SRA

ERX173584

Named Annotation

GSM3273982_ERX173584.flybase.plus.bw

Named Annotation

GSM3273982_ERX173584.flybase.minus.bw

Supplementary file	Size	Download	File type/resource
GSM3273982_ERX173584.bam.counts.jcounts.txt.gz	21.7 Kb	(ftp)(http)	TXT
GSM3273982_ERX173584.bam.counts.txt.gz	829.8 Kb	(ftp)(http)	TXT
GSM3273982_ERX173584.bam.intergenic.counts.jcounts.txt.gz	16.8 Kb	(ftp)(http)	TXT
GSM3273982_ERX173584.bam.intergenic.counts.txt.gz	146.3 Kb	(ftp)(http)	TXT
GSM3273982_ERX173584.flybase.minus.bw	2.7 Mb	(ftp)(http)	BW
GSM3273982_ERX173584.flybase.plus.bw	2.9 Mb	(ftp)(http)	BW
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record