|
Status |
Public on Feb 21, 2014 |
Title |
GM12004_GROseq |
Sample type |
SRA |
|
|
Source name |
B-cells
|
Organism |
Homo sapiens |
Characteristics |
cell type: B-cells
individual: GM12004
assay: global run-on
|
Biomaterial provider |
Coriell; http://ccr.coriell.org/Sections/Search/Search.aspx?PgId=165&q=GM12004
|
Treatment protocol |
Nuclei were isolated from B cells. 4 × 10^7 B cells were collected by centrifugation at 400 × g for 2 min at 4 °C. The cells were washed with 20 mL of ice-cold PBS and resuspended in 10 mL of ice-cold lysis buffer [20 mM Tris-HCl pH 7.4, 150 mM KCl, 1.5 mM MgCl2, 1 mM DTT, 0.5% Igepal CA-630, 1X Complete Protease Inhibitor Cocktail (Roche, USA) and 4 units/mL RNaseOUT (Invitrogen, USA)]. The cell suspension was incubated on ice for 10 min before nuclei were pelleted by centrifugation at 500 × g for 1 min at 4 °C. Pellets containing nuclei were washed carefully with 10 mL of ice-cold lysis buffer, collected by centrifugation (500 × g, 1 min, 4 °C), and resuspended in ice-cold storage buffer [50 mM Tris-HCl pH 8.3, 5 mM MgCl2, 0.1 mM EDTA, 40% glycerol] to 5 × 10^6 nuclei/100 µL. Nuclei were then snap-frozen in liquid nitrogen and kept frozen until the GRO-seq experiments. GRO-seq and PRO-seq experiments were carried out as previously described in Core et al., Science. 2008 Dec 19;322(5909):1845-8.
|
Growth protocol |
Cultured B-cell lines from two CEPH individuals (Centre d'Etude du Polymorphisme Humain), GM12004 and GM12750, were obtained from Coriell Cell Repositories (Camden, NJ, USA). The B-cells were grown to a density of 5 × 10^5 cells/mL in RPMI 1640 supplemented with 15% fetal bovine serum, 100 units/mL penicillin and 100 μg/mL streptomycin, and 2 mmol/L L-glutamine. For downstream experiments, cells were harvested 24 hours after addition of fresh medium.
|
Extracted molecule |
total RNA |
Extraction protocol |
Libraries were prepared following the Illumina Directional mRNA-seq sample preparation protocol.
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina HiSeq 2000 |
|
|
Data processing |
Sequence analysis: The GRO-seq and PRO-seq samples were sequenced on a HiSeq 2000 instrument, generating 100-200 million 100-nt reads per sample. Low-quality bases as designated by Illumina were trimmed from the 3’ end of reads, and reads shorter than 35 bp were removed. The resulting reads were aligned to an index comprising the human reference genome (hg18) and the Epstein-Barr virus genome (NC_009334.1) using GSNAP (version 2012-04-10). A list of SNP sites in the CEU population from HapMap (release #28) and 1000 Genomes (pilot project) was used to allow for SNP-tolerant alignments. The following parameters were used: mismatches ≤ [(read length + 2)/12 - 2]; mapping score ≥ 20; soft-clipping on (-trim-mismatch-score=-3). Known exon-exon junctions (defined by RefSeq (downloaded March 7, 2011) and Gencode (version 3c)) and novel junctions (defined by GSNAP) were accepted. Only reads that aligned to one genomic location (uniquely mapped reads) were used in further analyses.

RNA-DNA differences: To identify RDDs, we compared each RNA sequence to its corresponding DNA sequence. Low-quality bases (Phred quality score < 20) in both the RNA and DNA were removed from consideration. To be included as an RDD site in the final lists, a site has to meet the following criteria:
1) a minimum of 10 total RNA-seq reads covering that site;
2) a minimum of 10 total DNA-seq reads covering that site;
3) the DNA sequence at this site is 100% concordant, without any DNA-seq reads containing alternative alleles;
4) the level of RDD (# of RNA-seq reads containing the non-DNA allele / # of all RNA-seq reads covering the site) is ≥ 10%, with a minimum of two RNA-seq reads containing the RDD.

To ensure the accuracy of the RDD sites, additional filtering steps were performed using two additional mapping algorithms.

First, we removed all sites that reside in repetitive genome regions annotated by RepeatMasker (version 3.2.7).

Second, local sequences around each RDD site were aligned to the human reference genome to rule out misalignments to paralogous sequences or remaining pseudogenes. Specifically, for each RDD event, genomic sequences comprising 25 bp, 50 bp, or 75 bp upstream and downstream of the site, along with either the DNA variant or the RNA variant, were aligned to an index containing the human reference genome (hg18) and sequences in hg19 that are not present in hg18, using BLAT(5) (Stand-alone, v. 34x11). The settings '-stepSize=5' and '-repMatch=2253' were used to increase sensitivity. RDD events were removed if any of the 6 corresponding sequences aligned to another genomic location with ≤ n mismatches (n = (read length + 2)/12 - 2) and with sequences that explain the RDD call (that is, if the genomic sequences match the RNA sequence).

Lastly, to avoid potential misalignment of spliced reads by GSNAP due to its high gap penalty, we re-aligned all the RNA-seq reads that contain putative RDD alleles using BLAT (Stand-alone, v. 34x11). Human genome sequences in hg19 that are not present in hg18 were included in our index in addition to sequences in hg18. Here, a low gap penalty was applied during BLAT alignment to compensate for the high gap penalty of GSNAP alignment of spliced reads. Only RDD sites that are supported by both GSNAP and BLAT were retained for downstream analysis.

Genome_build: HG18
Supplementary_files_format_and_content: Text files that contain 1) sites and types of RNA-DNA differences; 2) FPKM values
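The read-length-dependent mismatch limit and the four RDD inclusion criteria described in the data processing notes can be sketched in a few lines of Python. This is only an illustration, not the submitters' pipeline: the names `SiteCounts`, `max_mismatches`, and `is_candidate_rdd` are hypothetical. The sketch assumes that the square brackets in the mismatch formula denote rounding down, and it clamps the limit at zero for very short reads, which the record does not specify. The repeat, BLAT, and re-alignment filters are not modelled.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Per-site read counts after dropping bases with Phred quality < 20 (hypothetical container)."""
    rna_total: int  # RNA-seq reads covering the site
    rna_alt: int    # RNA-seq reads carrying the non-DNA allele
    dna_total: int  # DNA-seq reads covering the site
    dna_alt: int    # DNA-seq reads carrying any alternative allele


def max_mismatches(read_length: int) -> int:
    """Mismatch limit (read length + 2)/12 - 2, rounded down (assumed) and clamped at 0 (assumed)."""
    return max(0, (read_length + 2) // 12 - 2)


def is_candidate_rdd(site: SiteCounts,
                     min_coverage: int = 10,
                     min_level: float = 0.10,
                     min_alt_reads: int = 2) -> bool:
    """Apply criteria 1-4 from the record to one site."""
    # Criteria 1 and 2: at least 10 RNA-seq and 10 DNA-seq reads cover the site.
    if site.rna_total < min_coverage or site.dna_total < min_coverage:
        return False
    # Criterion 3: the DNA is 100% concordant (no read with an alternative allele).
    if site.dna_alt != 0:
        return False
    # Criterion 4: at least two RDD-containing reads and an RDD level of at least 10%.
    if site.rna_alt < min_alt_reads:
        return False
    return site.rna_alt / site.rna_total >= min_level


if __name__ == "__main__":
    print(max_mismatches(100))                           # limit for an untrimmed 100-nt read
    print(is_candidate_rdd(SiteCounts(20, 2, 15, 0)))    # meets all four criteria
    print(is_candidate_rdd(SiteCounts(10, 1, 10, 0)))    # 10% level, but only one RDD read
```

Keeping the two-read minimum as a separate check matters at low coverage: a site with 10 RNA-seq reads and a single RDD read reaches the 10% level but is still rejected.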
|
|
|
Submission date |
Aug 03, 2012 |
Last update date |
May 15, 2019 |
Contact name |
Isabel Xiaorong Wang |
Organization name |
HHMI/University of Michigan
|
Department |
Pediatrics&Genetics
|
Lab |
Dr. Vivian G. Cheung Lab
|
Street address |
210 Washtenaw Ave
|
City |
Ann Arbor |
State/province |
Michigan |
ZIP/Postal code |
48109 |
Country |
USA |
|
|
Platform ID |
GPL11154 |
Series (1) |
GSE39878 |
RNA-DNA DIFFERENCES IN NASCENT RNA |
|
Relations |
Reanalyzed by |
GSE67540 |
Reanalyzed by |
GSE85747 |
SRA |
SRX173216 |
BioSample |
SAMN01096957 |
Supplementary file | Size | File type/resource |
GSM980644_GM12004GRO-2_fastxclip-fix_trimlowqual.defv20120410.unique.bam | 5.0 Gb | BAM |
GSM980644_TableS1_GROseqRDD_GM12004.txt.gz | 33.7 Kb | TXT |
Raw data are available in SRA |
Processed data are available on Series record |
Processed data provided as supplementary file |