Sample GSM980644
Status Public on Feb 21, 2014
Title GM12004_GROseq
Sample type SRA
 
Source name B-cells
Organism Homo sapiens
Characteristics cell type: B-cells
individual: GM12004
assay: global run-on
Biomaterial provider Coriell; http://ccr.coriell.org/Sections/Search/Search.aspx?PgId=165&q=GM12004
Treatment protocol Nuclei were isolated from B cells. A total of 4 × 10^7 B cells were collected by centrifugation at 400 × g for 2 min at 4 °C. The cells were washed with 20 mL of ice-cold PBS and resuspended in 10 mL of ice-cold lysis buffer [20 mM Tris-HCl pH 7.4, 150 mM KCl, 1.5 mM MgCl2, 1 mM DTT, 0.5% Igepal CA-630, 1X Complete Protease Inhibitor Cocktail (Roche, USA) and 4 units/mL RNAseOUT (Invitrogen, USA)]. The cell suspension was incubated on ice for 10 minutes before nuclei were collected by centrifugation at 500 × g for 1 min at 4 °C. Pellets containing nuclei were washed carefully with 10 mL ice-cold lysis buffer, collected by centrifugation (500 × g, 1 min, 4 °C), and resuspended in ice-cold storage buffer [50 mM Tris-HCl pH 8.3, 5 mM MgCl2, 0.1 mM EDTA, 40% glycerol] to 5 × 10^6 nuclei/100 µL. Nuclei were then snap frozen in liquid nitrogen until GRO-seq experiments. GRO-seq and PRO-seq experiments were carried out as previously described in Core et al., Science. 2008 Dec 19;322(5909):1845-8.
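As a rough illustration of the final resuspension step, the storage buffer volume follows directly from the target concentration of 5 × 10^6 nuclei per 100 µL. The sketch below is only arithmetic; the function name is hypothetical, and the example assumes (unrealistically) that every one of the 4 × 10^7 starting cells yields one recovered nucleus. In practice the nuclei would be counted first.

```python
def storage_buffer_volume_ul(n_nuclei, target_nuclei=5e6, per_volume_ul=100.0):
    """Return the storage buffer volume (in µL) that brings n_nuclei
    to the target concentration of target_nuclei per per_volume_ul."""
    if n_nuclei < 0:
        raise ValueError("n_nuclei must be non-negative")
    return n_nuclei / target_nuclei * per_volume_ul


# Assumption for illustration only: 100% recovery of nuclei from 4 x 10^7 cells.
volume = storage_buffer_volume_ul(4e7)
print(f"{volume:.0f} uL of storage buffer")
```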
Growth protocol Cultured B-cell lines from two CEPH individuals (Centre d'Etude du Polymorphisme Humain), GM12004 and GM12750, were obtained from Coriell Cell Repositories (Camden, NJ, USA). The B-cells were grown to a density of 5 × 10^5 cells/mL in RPMI 1640 supplemented with 15% fetal bovine serum, 100 units/mL penicillin and 100 μg/mL streptomycin, and 2 mmol/L L-glutamine. For downstream experiments, cells were harvested 24 hours after addition of fresh medium.
Extracted molecule total RNA
Extraction protocol Libraries were prepared following Illumina Directional mRNA-seq sample preparation protocol.
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina HiSeq 2000
 
Data processing Sequence analysis: The GRO-seq and PRO-seq samples were sequenced on a HiSeq 2000 instrument, generating 100-200 million 100-nt reads per sample. Low-quality bases as designated by Illumina were trimmed from the 3’ end of reads, and reads shorter than 35 bp were removed. The resulting reads were aligned to an index comprising the human reference genome (hg18) and the Epstein-Barr virus genome (NC_009334.1) using GSNAP (version 2012-04-10). A list of SNP sites in the CEU population from HapMap (release #28) and 1000 Genomes (pilot project) was used to allow for SNP-tolerant alignments. The following parameters were used: mismatches ≤ [(read length + 2)/12 - 2]; mapping score ≥ 20; soft-clipping on (--trim-mismatch-score=-3). Known exon-exon junctions (defined by RefSeq (downloaded March 7, 2011) and Gencode (version 3c)) and novel junctions (defined by GSNAP) were accepted. Only reads that aligned to one genomic location (uniquely mapped reads) were used in further analyses.
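The read-level thresholds above can be sketched as a small filter. This is an illustration, not the submitters' pipeline: the function names and arguments are hypothetical, and the use of integer (floor) division in the mismatch formula is an assumption, since the record does not say how fractional values are rounded.

```python
def max_mismatches(read_length: int) -> int:
    """Mismatch ceiling (read length + 2)/12 - 2.

    Assumption: the division is integer (floor) division.
    """
    return (read_length + 2) // 12 - 2


def keep_alignment(read_length: int, mismatches: int,
                   mapping_score: int, n_locations: int) -> bool:
    """Apply the stated thresholds to one aligned read (hypothetical inputs)."""
    return (
        read_length >= 35                              # reads shorter than 35 bp removed
        and mismatches <= max_mismatches(read_length)  # length-dependent mismatch ceiling
        and mapping_score >= 20                        # mapping score >= 20
        and n_locations == 1                           # uniquely mapped reads only
    )


for length in (35, 50, 75, 100):
    print(length, max_mismatches(length))
```

Under the floor-division assumption, an untrimmed 100-nt read tolerates up to 6 mismatches, and a read trimmed to the 35-bp minimum tolerates 1.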
RNA-DNA differences: To identify RDDs, we compared each RNA sequence to its corresponding DNA sequence. Low-quality bases (Phred quality score < 20) in both the RNA and DNA were removed from consideration. To be included as RDD sites in the final lists, the following criteria had to be met: 1) a minimum of 10 total RNA-seq reads covering that site; 2) a minimum of 10 total DNA-seq reads covering that site; 3) DNA sequence at this site is 100% concordant, without any DNA-seq reads containing alternative alleles; 4) level of RDD (# of RNA-seq reads containing non-DNA allele/# all RNA-seq reads covering a given site) is ≥ 10% (with a minimum of two RNA-seq reads containing the RDD). To ensure the accuracy of the RDD sites, further filtering steps were performed using two additional mapping algorithms. First, we removed all sites that reside in repetitive genome regions annotated by RepeatMasker (version 3.2.7). Second, local sequences around each RDD site were aligned to the human reference genome to rule out misalignments to paralogous sequences or remaining pseudogenes. Specifically, for each RDD event, genomic sequences comprising 25 bp, 50 bp, and 75 bp upstream and downstream of each site, along with either the DNA variant or the RNA variant, were aligned to an index containing the human reference genome (hg18) and sequences in hg19 but not present in hg18 using BLAT(5) (Stand-alone, v. 34x11). The settings '-stepSize=5' and '-repMatch=2253' were used to increase sensitivity. RDD events were removed if any of the 6 corresponding sequences aligned to another genomic location with ≤ n mismatches (n = (read length + 2)/12 - 2) and with sequences that explain the RDD call (that is, if the genomic sequences match the RNA sequence). Lastly, to avoid potential misalignment of spliced reads in GSNAP due to its high gap penalty, we re-aligned all the RNA-seq reads that contain putative RDD alleles using BLAT (Stand-alone, v. 34x11). Human genome sequences in hg19 that are not present in hg18 were included in our index in addition to sequences in hg18. Here, a low gap penalty was applied during BLAT alignment to compensate for the high gap penalty GSNAP applies when aligning spliced reads. Only RDD sites that are supported by both GSNAP and BLAT were retained for downstream analysis.
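The four site-level criteria and the construction of the six flanking query sequences can be sketched as follows. This is a minimal illustration under stated assumptions, not the submitters' code: the `SiteCounts` class, both function names, and the 0-based coordinate convention are hypothetical, and the counts are assumed to already exclude bases with Phred quality below 20.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Per-site read counts (hypothetical container; low-quality bases excluded)."""
    rna_total: int    # RNA-seq reads covering the site
    rna_non_dna: int  # RNA-seq reads carrying the non-DNA allele
    dna_total: int    # DNA-seq reads covering the site
    dna_alt: int      # DNA-seq reads carrying any alternative allele


def passes_rdd_criteria(s: SiteCounts) -> bool:
    """Apply criteria 1-4 from the text to one candidate site."""
    if s.rna_total < 10 or s.dna_total < 10:  # criteria 1 and 2: coverage
        return False
    if s.dna_alt != 0:                        # criterion 3: DNA 100% concordant
        return False
    if s.rna_non_dna < 2:                     # at least two supporting RNA reads
        return False
    # Criterion 4: RDD level >= 10%, written with integers to avoid rounding.
    return s.rna_non_dna * 10 >= s.rna_total


def flank_queries(chrom_seq: str, pos: int, dna_allele: str, rna_allele: str):
    """Build the six query sequences for one site: flanks of 25, 50 and 75 bp
    on each side, each carrying either the DNA or the RNA allele.

    Assumption: pos is a 0-based index into chrom_seq.
    """
    queries = []
    for flank in (25, 50, 75):
        left = chrom_seq[max(0, pos - flank):pos]
        right = chrom_seq[pos + 1:pos + 1 + flank]
        for allele in (dna_allele, rna_allele):
            queries.append(left + allele + right)
    return queries


site = SiteCounts(rna_total=40, rna_non_dna=5, dna_total=25, dna_alt=0)
print(passes_rdd_criteria(site))
```

Each query from `flank_queries` would then be aligned with BLAT against the combined hg18 plus hg19-only index described above; that alignment step is outside the scope of this sketch.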
Genome_build: HG18
Supplementary_files_format_and_content: Text files that contain 1) sites and types of RNA-DNA differences; 2) FPKM values
 
Submission date Aug 03, 2012
Last update date May 15, 2019
Contact name Isabel Xiaorong Wang
Organization name HHMI/University of Michigan
Department Pediatrics&Genetics
Lab Dr. Vivian G. Cheung Lab
Street address 210 Washtenaw Ave
City Ann Arbor
State/province Michigan
ZIP/Postal code 48109
Country USA
 
Platform ID GPL11154
Series (1)
GSE39878 RNA-DNA DIFFERENCES IN NASCENT RNA
Relations
Reanalyzed by GSE67540
Reanalyzed by GSE85747
SRA SRX173216
BioSample SAMN01096957

Supplementary file Size Download File type/resource
GSM980644_GM12004GRO-2_fastxclip-fix_trimlowqual.defv20120410.unique.bam 5.0 Gb (ftp)(http) BAM
GSM980644_TableS1_GROseqRDD_GM12004.txt.gz 33.7 Kb (ftp)(http) TXT
Raw data are available in SRA
Processed data are available on Series record
Processed data provided as supplementary file
