GEO Accession viewer

Sample GSM980644

Status

Public on Feb 21, 2014

Title

GM12004_GROseq

Sample type

SRA

Source name

B-cells

Organism

Homo sapiens

Characteristics

cell type: B-cells
individual: GM12004
assay: global run-on

Biomaterial provider

Coriell; http://ccr.coriell.org/Sections/Search/Search.aspx?PgId=165&q=GM12004

Treatment protocol

Nuclei were isolated from B cells as follows. A total of 4 × 10⁷ B cells were collected by centrifugation at 400 × g for 2 min at 4 °C, washed with 20 mL of ice-cold PBS, and resuspended in 10 mL of ice-cold lysis buffer [20 mM Tris-HCl pH 7.4, 150 mM KCl, 1.5 mM MgCl2, 1 mM DTT, 0.5% Igepal CA-630, 1X Complete Protease Inhibitor Cocktail (Roche, USA), and 4 units/mL RNaseOUT (Invitrogen, USA)]. The cell suspension was incubated on ice for 10 min, and nuclei were then pelleted by centrifugation at 500 × g for 1 min at 4 °C. The nuclear pellets were washed carefully with 10 mL of ice-cold lysis buffer, collected by centrifugation (500 × g, 1 min, 4 °C), and resuspended in ice-cold storage buffer [50 mM Tris-HCl pH 8.3, 5 mM MgCl2, 0.1 mM EDTA, 40% glycerol] at 5 × 10⁶ nuclei per 100 µL. Nuclei were snap-frozen in liquid nitrogen and kept frozen until the GRO-seq experiments. GRO-seq and PRO-seq experiments were carried out as previously described (Core et al., Science 2008 Dec 19;322(5909):1845-8).
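The resuspension step fixes the storage-buffer volume from the nuclei count. The sketch below does that arithmetic. It assumes, for illustration only, that every collected cell yields one intact nucleus. Real recovery is lower, so a counted number of nuclei would replace `CELLS_COLLECTED` in practice. The names are hypothetical.

```python
# Sketch: storage-buffer volume for the nuclei density stated in the protocol.
# ASSUMPTION: every collected cell yields one intact nucleus (real recovery is lower,
# so in practice the counted number of nuclei should be used instead).

CELLS_COLLECTED = 4e7        # 4 x 10^7 B cells (from the protocol)
NUCLEI_PER_100_UL = 5e6      # target density: 5 x 10^6 nuclei per 100 uL

def storage_buffer_volume_ul(nuclei: float, nuclei_per_100_ul: float = NUCLEI_PER_100_UL) -> float:
    """Volume of storage buffer (uL) that puts `nuclei_per_100_ul` nuclei in each 100 uL."""
    return nuclei / nuclei_per_100_ul * 100.0

volume_ul = storage_buffer_volume_ul(CELLS_COLLECTED)
print(f"{volume_ul:.0f} uL of storage buffer ({volume_ul / 100:.0f} x 100 uL portions)")
```

Under that assumption, one preparation corresponds to 800 µL of storage buffer.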

Growth protocol

Cultured B-cell lines from two CEPH (Centre d'Etude du Polymorphisme Humain) individuals, GM12004 and GM12750, were obtained from the Coriell Cell Repositories (Camden, NJ, USA). The B cells were grown to a density of 5 × 10⁵ cells/mL in RPMI 1640 supplemented with 15% fetal bovine serum, 100 units/mL penicillin, 100 μg/mL streptomycin, and 2 mmol/L L-glutamine. For downstream experiments, cells were harvested 24 hours after the addition of fresh medium.
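The harvest density and the 4 × 10⁷ cells used for nuclei isolation together determine how much culture one isolation needs. The sketch below makes no allowance for cell loss during washes. The function and constant names are illustrative.

```python
# Sketch: culture volume needed to supply the cells for one nuclei isolation.
# Numbers come from the growth and treatment protocols; no allowance for cell loss.

HARVEST_DENSITY_PER_ML = 5e5   # cells/mL at harvest (growth protocol)
CELLS_PER_ISOLATION = 4e7      # cells used for one nuclei isolation (treatment protocol)

def culture_volume_ml(cells_needed: float, density_per_ml: float = HARVEST_DENSITY_PER_ML) -> float:
    """Millilitres of culture at `density_per_ml` that contain `cells_needed` cells."""
    return cells_needed / density_per_ml

print(culture_volume_ml(CELLS_PER_ISOLATION))  # 80.0
```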

Extracted molecule

total RNA

Extraction protocol

Libraries were prepared following the Illumina Directional mRNA-Seq sample preparation protocol.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina HiSeq 2000

Data processing

Sequence analysis: The GRO-seq and PRO-seq samples were sequenced on a HiSeq 2000 instrument, generating 100-200 million 100-nt reads per sample. Low-quality bases, as designated by Illumina, were trimmed from the 3′ end of reads, and reads shorter than 35 bp were removed. The remaining reads were aligned to an index comprising the human reference genome (hg18) and the Epstein-Barr virus genome (NC_009334.1) using GSNAP (version 2012-04-10). SNP sites in the CEU population from HapMap (release #28) and the 1000 Genomes Project (pilot project) were included to allow SNP-tolerant alignment. The following parameters were used: mismatches ≤ (read length + 2)/12 - 2; mapping score ≥ 20; soft-clipping on (--trim-mismatch-score=-3); and both known exon-exon junctions (defined by RefSeq, downloaded March 7, 2011, and GENCODE version 3c) and novel junctions (detected by GSNAP) were accepted. Only reads that aligned to a single genomic location (uniquely mapped reads) were used in further analyses.
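The read-level rules above can be sketched as a 3′ trim, a length filter, and a post-alignment check. This is not the submitters' code. In the actual pipeline the mismatch, mapping-score, and uniqueness limits were GSNAP settings, not a separate filter. Modelling Illumina's "low-quality" designation as a trailing run of quality-2 bases is an assumption, as are all function names.

```python
from typing import List, Optional

MIN_READ_LENGTH = 35  # reads shorter than 35 bp after trimming were removed

# ASSUMPTION: "low-quality bases as designated by Illumina" is modelled as a trailing run
# of bases with quality score 2, the value Illumina pipelines used to flag unusable 3' tails.
ILLUMINA_TAIL_Q = 2

def trim_read(seq: str, quals: List[int]) -> Optional[str]:
    """Trim the flagged 3' tail; return None if the read becomes shorter than 35 bp."""
    end = len(seq)
    while end > 0 and quals[end - 1] <= ILLUMINA_TAIL_Q:
        end -= 1
    trimmed = seq[:end]
    return trimmed if len(trimmed) >= MIN_READ_LENGTH else None

def max_mismatches(read_length: int) -> float:
    """Mismatch ceiling from the GSNAP settings: (read length + 2)/12 - 2."""
    return (read_length + 2) / 12 - 2

def passes_alignment_filters(mismatches: int, read_length: int,
                             mapping_score: int, n_locations: int) -> bool:
    """Mismatch ceiling, mapping score >= 20, and unique mapping (one genomic location)."""
    return (mismatches <= max_mismatches(read_length)
            and mapping_score >= 20
            and n_locations == 1)
```

For 100-nt reads the ceiling is (100 + 2)/12 - 2 = 6.5. Mismatch counts are integers, so reads with up to 6 mismatches pass. The record does not say how fractional values are rounded.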
RNA-DNA differences: To identify RNA-DNA differences (RDDs), we compared each RNA sequence with its corresponding DNA sequence. Low-quality bases (Phred quality score < 20) in both the RNA and the DNA were excluded from consideration. To be included in the final lists, an RDD site had to meet the following criteria: 1) at least 10 RNA-seq reads covering the site; 2) at least 10 DNA-seq reads covering the site; 3) the DNA sequence at the site is 100% concordant, with no DNA-seq reads containing an alternative allele; and 4) the RDD level (number of RNA-seq reads containing the non-DNA allele divided by the number of all RNA-seq reads covering the site) is ≥ 10%, with at least two RNA-seq reads containing the RDD. To ensure the accuracy of the RDD sites, additional filtering steps were performed using two additional mapping algorithms. First, we removed all sites that reside in repetitive genomic regions annotated by RepeatMasker (version 3.2.7). Second, local sequences around each RDD site were aligned to the human reference genome to rule out misalignments to paralogous sequences or remaining pseudogenes. Specifically, for each RDD site, genomic sequences extending 25 bp, 50 bp, and 75 bp upstream and downstream of the site, carrying either the DNA allele or the RNA allele at the site, were aligned using BLAT (5) (stand-alone, v. 34x11) to an index containing the human reference genome (hg18) plus sequences present in hg19 but not in hg18. The settings -stepSize=5 and -repMatch=2253 were used to increase sensitivity. An RDD site was removed if any of its 6 corresponding sequences aligned to another genomic location with ≤ n mismatches (n = (read length + 2)/12 - 2) and with a sequence that explains the RDD call (that is, if the genomic sequence at that location matches the RNA sequence). Lastly, to avoid potential misalignment of spliced reads by GSNAP due to its high gap penalty, we re-aligned all RNA-seq reads that contain putative RDD alleles using BLAT (stand-alone, v. 34x11).
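The base-quality cutoff and the four inclusion criteria above amount to a per-site filter. The sketch below is a minimal illustration that takes the (base, Phred quality) calls observed at one site in DNA-seq and RNA-seq reads. The record does not say what happens when RNA reads show more than one non-DNA base. Here the most frequent non-DNA base is taken as the RDD allele, which is an assumption. All names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

MIN_BASE_Q = 20        # bases with Phred < 20 are ignored in both RNA and DNA
MIN_RNA_READS = 10     # criterion 1
MIN_DNA_READS = 10     # criterion 2
MIN_RDD_LEVEL = 0.10   # criterion 4: RDD level >= 10%
MIN_RDD_READS = 2      # criterion 4: at least two RNA reads carrying the RDD

Call = Tuple[str, int]  # (base, Phred quality) observed at the site in one read

def high_quality(calls: List[Call]) -> List[str]:
    """Keep only bases with Phred quality >= 20."""
    return [base for base, qual in calls if qual >= MIN_BASE_Q]

def call_rdd(dna_calls: List[Call], rna_calls: List[Call]) -> Optional[Tuple[str, str, float]]:
    """Return (DNA allele, RNA allele, RDD level) if the site passes, else None."""
    dna = high_quality(dna_calls)
    rna = high_quality(rna_calls)
    if len(rna) < MIN_RNA_READS or len(dna) < MIN_DNA_READS:
        return None
    if len(set(dna)) != 1:  # criterion 3: DNA must be 100% concordant
        return None
    dna_allele = dna[0]
    non_dna = Counter(base for base in rna if base != dna_allele)
    if not non_dna:
        return None
    # ASSUMPTION: the RDD allele is the most frequent non-DNA base in the RNA reads.
    rna_allele, support = non_dna.most_common(1)[0]
    level = support / len(rna)
    if support >= MIN_RDD_READS and level >= MIN_RDD_LEVEL:
        return dna_allele, rna_allele, level
    return None
```

For example, 12 concordant "A" DNA reads and 20 RNA reads of which 4 read "G" give `("A", "G", 0.2)`. Sites passing this filter would still go through the RepeatMasker and BLAT checks.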
For the BLAT re-alignment of RNA-seq reads containing putative RDD alleles, the index included human genome sequences present in hg19 but not in hg18, in addition to the hg18 sequences. A low gap penalty was applied during this BLAT alignment to compensate for the high gap penalty of GSNAP alignment of spliced reads. Only RDD sites supported by both the GSNAP and the BLAT alignments were retained for downstream analysis.
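Two bookkeeping steps in the BLAT-based filtering can be sketched directly. The first builds the six local query sequences per RDD site (25, 50, and 75 bp flanks, each carrying either the DNA or the RNA allele). The second keeps only sites supported by both aligners. The sketch assumes a chromosome held in memory as a string and 0-based site positions. It only prepares FASTA text for BLAT and does not run or parse BLAT. Function names are hypothetical.

```python
from typing import Dict, Set, Tuple

FLANKS_BP = (25, 50, 75)  # flank lengths taken on each side of the RDD site

def local_queries(chrom_seq: str, pos: int, dna_allele: str, rna_allele: str) -> Dict[str, str]:
    """Return the six local query sequences (3 flank lengths x {DNA, RNA} allele).

    chrom_seq is a hypothetical in-memory chromosome sequence and pos is the
    0-based index of the RDD site in it.
    """
    queries: Dict[str, str] = {}
    for flank in FLANKS_BP:
        left = chrom_seq[max(0, pos - flank):pos]
        right = chrom_seq[pos + 1:pos + 1 + flank]
        for label, allele in (("dna", dna_allele), ("rna", rna_allele)):
            queries[f"{label}_{flank}"] = left + allele + right
    return queries

def to_fasta(site_id: str, queries: Dict[str, str]) -> str:
    """Format the queries as FASTA records, e.g. as the query file for a BLAT run."""
    return "".join(f">{site_id}_{name}\n{seq}\n" for name, seq in queries.items())

Site = Tuple[str, int]  # (chromosome, position)

def retained_sites(gsnap_sites: Set[Site], blat_sites: Set[Site]) -> Set[Site]:
    """Keep only RDD sites supported by both the GSNAP and the BLAT alignments."""
    return gsnap_sites & blat_sites
```

A site would be discarded if any of its six queries aligned elsewhere with at most n mismatches and matched the RNA allele there. That check depends on parsing BLAT output and is not shown here.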
Genome_build: HG18
Supplementary_files_format_and_content: Text files that contain 1) sites and types of RNA-DNA differences; 2) FPKM values
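The record does not say which tool produced the FPKM values. Tools such as Cufflinks apply additional bias corrections. The sketch below shows only the textbook definition, fragments per kilobase of feature per million mapped fragments. For single-end reads such as these, each read counts as one fragment. The function name is illustrative.

```python
def fpkm(fragments: int, feature_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of feature per million mapped fragments (textbook form)."""
    length_kb = feature_length_bp / 1e3
    millions_mapped = total_mapped_fragments / 1e6
    return fragments / (length_kb * millions_mapped)

# 500 reads on a 2-kb feature in a library of 50 million mapped reads:
print(fpkm(500, 2000, 50_000_000))  # 5.0
```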

Submission date

Aug 03, 2012

Last update date

May 15, 2019

Contact name

Isabel Xiaorong Wang

Organization name

HHMI/University of Michigan

Department

Pediatrics & Genetics

Lab

Dr. Vivian G. Cheung Lab

Street address

210 Washtenaw Ave

City

Ann Arbor

State/province

Michigan

ZIP/Postal code

48109

Country

USA

Platform ID

GPL11154

Series (1)

GSE39878

RNA-DNA DIFFERENCES IN NASCENT RNA

Relations

Reanalyzed by

Reanalyzed by

SRA

BioSample

Supplementary file	Size	File type/resource
GSM980644_GM12004GRO-2_fastxclip-fix_trimlowqual.defv20120410.unique.bam	5.0 Gb	BAM
GSM980644_TableS1_GROseqRDD_GM12004.txt.gz	33.7 Kb	TXT
Raw data are available in SRA
Processed data are available on Series record
Processed data provided as supplementary file