 |
 |
|
Status |
Public on Jun 23, 2016 |
Title |
Mouse brain tissue_rep2 |
Sample type |
SRA |
|
|
Source name |
Brain tissue_technical replicate
|
Organism |
Mus musculus |
Characteristics |
strain: C57BL/6J tissue: brain gender: female
|
Extracted molecule |
total RNA |
Extraction protocol |
An adult female mouse was sacrificed by cervical dislocation, and the whole brain was immediately collected, rinsed with ice-cold PBS three times and snap frozen. Frozen whole mouse brain tissue was ground into fine powder in liquid nitrogen using a mortar and pestle. The tissue powder was quickly transferred into a Petri dish on a bed of dry ice and irradiated on dry ice three times at 400 mJ/cm² in a UV crosslinker (254 nm), with gentle swirling between irradiations. Crosslinked powdered tissue was immediately lysed and subjected to the MARIO procedure as described. Protein-RNA complexes were eluted from streptavidin beads, and RNA was recovered by digesting the bound protein. Eluted RNA was subjected to rigorous DNase treatment to eliminate DNA contamination. Purified RNAs were hybridized with a DNA probe complementary to the biotin-tagged RNA linker and treated with T7 exonuclease to remove the non-ligated biotinylated RNA linkers. As a result, mainly the successfully ligated chimeric RNAs retained a biotin-tagged linker at the junction. This chimeric RNA library was fragmented again to an average of 150 nucleotides, and the ligation junctions were pulled down with streptavidin-coated magnetic beads. The end product was a library of ~150 nt chimeric RNAs. This library was enriched with chimeras in the form of 5'-R1-linker-R2, where R1 and R2 were fragments of interacting RNAs. This library was converted into cDNAs and sequenced with paired-end next-generation sequencing.
|
|
|
Library strategy |
OTHER |
Library source |
transcriptomic |
Library selection |
other |
Instrument model |
Illumina HiSeq 2500 |
|
|
Description |
Sample name: Brain_2 crosslinked RNA
|
Data processing |
Library strategy: MARIO
MARIO-tools is a package of command-line tools for analyses of MARIO data. It is written in Python and R and is version-controlled on GitHub. The full documentation is at http://mariotools.ucsd.edu/. The pipeline takes paired-end sequencing reads as input. The oligonucleotide sequences of the RNA linker and the sample barcodes used for multiplexed sequencing should also be provided to the pipeline. Script files mentioned below are all components of the pipeline.
Removing PCR duplicates. The forward read contains a 4 nt sample barcode and a 6 nt random barcode at the 5' end. A read pair was classified as a PCR duplicate of another read pair, and was therefore discarded, if the two read pairs had identical sequences and contained identical barcodes (10 nt). The tool ‘remove_dup_PE.py’ provides this function: it generates a fastq/fasta file containing the non-duplicated reads and reports the number of duplicates removed.
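The duplicate rule above can be illustrated with a minimal Python sketch. This is not the code of ‘remove_dup_PE.py’; the function name `remove_pcr_duplicates` and the in-memory tuple representation of read pairs are assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple

ReadPair = Tuple[str, str]  # (forward read sequence, reverse read sequence)

# 4 nt sample barcode + 6 nt random barcode at the 5' end of the forward read
BARCODE_LEN = 10


def remove_pcr_duplicates(pairs: Iterable[ReadPair]) -> Tuple[List[ReadPair], int]:
    """Keep the first copy of each read pair; count the later copies as duplicates.

    Hypothetical helper: two pairs are duplicates when the 10 nt barcode,
    the rest of the forward read, and the reverse read are all identical.
    """
    seen = set()
    unique: List[ReadPair] = []
    n_duplicates = 0
    for fwd, rev in pairs:
        key = (fwd[:BARCODE_LEN], fwd[BARCODE_LEN:], rev)
        if key in seen:
            n_duplicates += 1
            continue
        seen.add(key)
        unique.append((fwd, rev))
    return unique, n_duplicates
```

Because the random barcode is part of the key, two read pairs with identical inserts but different random barcodes are both kept, which is the purpose of adding a random barcode.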
Assigning multiplexed sequencing reads to the corresponding experimental samples. The tool ‘split_library_pairend.py’ assigns each paired-end read to a sample by matching the sample barcode in the read against the list of sample barcodes (a user-supplied text file). It generates a fastq/fasta file for the reads assigned to each sample, as well as a fastq/fasta file for the unassigned reads.
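A minimal sketch of barcode-based sample assignment follows. It is not the code of ‘split_library_pairend.py’. It assumes that the 4 nt sample barcode occupies the first four bases of the forward read and that matching is exact; the record does not state either point. The function name `split_by_sample` and the barcode-to-sample mapping are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ReadPair = Tuple[str, str]  # (forward read sequence, reverse read sequence)

SAMPLE_BARCODE_LEN = 4  # assumption: sample barcode is the first 4 nt of the forward read


def split_by_sample(
    pairs: Iterable[ReadPair], barcode_to_sample: Dict[str, str]
) -> Dict[str, List[ReadPair]]:
    """Group read pairs by sample; pairs with an unknown barcode go to 'unassigned'."""
    groups: Dict[str, List[ReadPair]] = defaultdict(list)
    for fwd, rev in pairs:
        barcode = fwd[:SAMPLE_BARCODE_LEN]
        sample = barcode_to_sample.get(barcode, "unassigned")  # exact match only
        groups[sample].append((fwd, rev))
    return dict(groups)
```

In a real run each group would be written to its own fastq/fasta file rather than held in memory.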
Recovering the cDNAs in the sequencing library. This step identifies the overlapping regions of the two ends of every read pair, if any. It also recovers the entire sequences of the cDNAs in the sequencing library whenever possible. This function is provided by ‘recoverFragment.py’, which uses local alignment to identify the overlapping regions. When the overlap was small (15 bp or less) compared to the read length (100 bp on each end), local alignment could be insensitive. To overcome this insensitivity, ‘recoverFragment.py’ collects the read pairs without identifiable overlaps after the first alignment, truncates each read to one third of its length (retaining 33 bp at the 3’ end of each read), and repeats the local alignment.
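The idea of merging the two ends of a read pair through their overlap can be sketched as follows. Note that this sketch swaps in exact suffix-prefix matching for the local alignment used by ‘recoverFragment.py’, so it tolerates no mismatches and does not reproduce the truncate-and-realign second pass. The names `revcomp` and `recover_fragment` and the `min_overlap` default are assumptions.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper-case A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def recover_fragment(fwd: str, rev: str, min_overlap: int = 5) -> Optional[str]:
    """Merge a read pair into one cDNA sequence if the read ends overlap exactly.

    The reverse read is reverse-complemented so both reads are on the same
    strand; the longest suffix of `fwd` that equals a prefix of that sequence
    is taken as the overlap. Returns None when no overlap of at least
    `min_overlap` bases is found.
    """
    rc = revcomp(rev)
    for k in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-k:] == rc[:k]:
            return fwd + rc[k:]
    return None
```

A pair for which this returns None corresponds to a cDNA that is only partially recovered: the two ends are known, but the middle is not.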
Parsing the chimeric cDNAs. This step categorizes the cDNAs based on their configurations. It takes the completely and partially recovered cDNA sequences, as well as the linker sequence, as inputs. It identifies the location of the linker in each cDNA and generates five categories of cDNAs based on the location of the linker sequence.
Mapping to the genome. Hereafter, all analyses were based on the RNA1-Linker-RNA2 type of read pairs. First, any cDNA containing fewer than 15 bp on either the RNA1 or RNA2 side of the linker was discarded, because a sequence this short is unlikely to map uniquely to the genome in the mapping step. Then the two RNA fragments on each side of the linker (RNA1 and RNA2) were separately mapped to the mouse genome mm9/NCBI37 using Tophat version 2.1.1. This step, implemented in ‘Stitch-seq_Aligner.py’, outputs the read pairs where both RNA1 and RNA2 were uniquely mapped to the genome. The aligned tab-delimited text files are results from this step.
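The length filter applied before mapping can be sketched as below. This is an illustration only, not the pipeline's code: it assumes the linker is located by exact string search (the pipeline's actual matching method is not described here), and the function name `split_chimera` is hypothetical.

```python
from typing import Optional, Tuple

MIN_FRAGMENT_LEN = 15  # cDNAs with fewer than 15 bp on either side of the linker are discarded


def split_chimera(
    cdna: str, linker: str, min_len: int = MIN_FRAGMENT_LEN
) -> Optional[Tuple[str, str]]:
    """Split an RNA1-Linker-RNA2 cDNA into (RNA1, RNA2), or return None.

    None is returned when the linker is absent (exact match assumed) or when
    either flanking fragment is shorter than `min_len`.
    """
    pos = cdna.find(linker)
    if pos == -1:
        return None
    rna1 = cdna[:pos]
    rna2 = cdna[pos + len(linker):]
    if len(rna1) < min_len or len(rna2) < min_len:
        return None
    return rna1, rna2
```

The two returned fragments would then be mapped to the genome separately, as described above.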
The annotations were retrieved from Ensembl (release 67, mouse NCBIM37), including the genes of mRNAs, lincRNAs, rRNAs, snRNAs, snoRNAs, miRNAs, misc_RNAs, tRNAs, and transposons. Different genomic copies of the same transposon were treated as different genes in this analysis. Reads mapped to rRNAs were removed from further analysis. The number of uniquely aligned reads (from either RNA1 or RNA2 of the RNA1-Linker-RNA2 type) was counted for every gene. Any gene with a read count less than 5 was filtered out. Next, the association between any two genes was tested with Fisher’s exact test. The null hypothesis was that gene A and gene B independently contributed to the sequencing reads. The alternative hypothesis was that their contributions to read counts were associated. We denoted c_A and c_B as the read counts for gene A and gene B, respectively, and I_{A,B} as the co-appearance read count, that is, the number of read pairs on which the two genes co-appeared. A Fisher’s exact test was carried out on each gene pair, using c_A, c_B, I_{A,B}, and the read counts on other genes besides gene A (gene B) as the inputs to the test. Both p-values and FDRs (Benjamini-Hochberg procedure) were calculated for every gene pair. This step outputs gene pairs with FDR < 0.05 and fold change (FC) >= 3. The FC was calculated as (I_{A,B} + 0.5) / (I'_{A,B} + 0.5), where I'_{A,B} was the co-appearing read count in the control sample (ES-indirect). This step was implemented in ‘Select_strongInteraction_RNA.py’, which outputs strongly interacting RNA pairs with information on their interaction regions, number of supporting pairs, p-value, FDR and fold change. The strong interaction clusters tab-delimited text files are from this step.
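The statistics described above can be sketched in plain Python. This is not the code of ‘Select_strongInteraction_RNA.py’. It assumes a 2x2 table built from I_{A,B}, c_A, c_B and a total read-pair count N (here `n_total`, a symbol not defined in this record), and it uses the one-sided (enrichment) tail of Fisher's exact test; the sidedness used by the pipeline is not stated. All function names are hypothetical.

```python
from math import comb
from typing import List


def fisher_enrichment_p(i_ab: int, c_a: int, c_b: int, n_total: int) -> float:
    """One-sided Fisher's exact test p-value, P(X >= i_ab).

    X is hypergeometric: the number of the c_b reads of gene B that fall
    among the c_a reads of gene A, out of n_total reads (assumed table).
    """
    upper = min(c_a, c_b)
    total = comb(n_total, c_b)
    tail = sum(
        comb(c_a, k) * comb(n_total - c_a, c_b - k) for k in range(i_ab, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fold_change(i_ab: int, i_ab_control: int) -> float:
    """FC = (I_AB + 0.5) / (I'_AB + 0.5), with a 0.5 pseudocount on both counts."""
    return (i_ab + 0.5) / (i_ab_control + 0.5)
```

Under these assumptions, a gene pair would be reported when its adjusted p-value is below 0.05 and `fold_change` is at least 3. The 0.5 pseudocount keeps the ratio defined when the control sample has no co-appearing reads.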
For the further analysis steps (identifying RNA interaction sites, calculating binding energies between the interaction sites, network analysis, intra-molecule cutting/ligation detection, and RNA secondary structure prediction), please refer to the corresponding manuscript.
Genome_build: mm9
Supplementary_files_format_and_content: The *fragment_paired_align.txt.gz file column headers: Chromosome_RNA_1, Start_RNA_1, End_RNA_1, Strand_RNA_1, Fragment_in_read_RNA_1, Reference_RNA_1, Type_RNA_1, Name_RNA_1, Sub_type_RNA_1, Strand_Properness_RNA_1, Read_ID, Chromosome_RNA_2, Start_RNA_2, END_RNA_2, Strand_RNA_2, Fragment_in_read_RNA_2, Reference_RNA_2, Type_RNA_2, Name_RNA_2, Sub_type_RNA_2, Strand_Properness_RNA_2.
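A minimal sketch for reading the *fragment_paired_align.txt.gz file into records keyed by the column headers listed above. It assumes the file is tab-delimited with no header row and exactly 21 fields per line; neither point is confirmed by this record, so check the first lines of the file before relying on it. The function name `read_aligned_pairs` is hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator

# Column headers as listed in the record (note "END_RNA_2" is spelled as given).
COLUMNS = [
    "Chromosome_RNA_1", "Start_RNA_1", "End_RNA_1", "Strand_RNA_1",
    "Fragment_in_read_RNA_1", "Reference_RNA_1", "Type_RNA_1", "Name_RNA_1",
    "Sub_type_RNA_1", "Strand_Properness_RNA_1", "Read_ID",
    "Chromosome_RNA_2", "Start_RNA_2", "END_RNA_2", "Strand_RNA_2",
    "Fragment_in_read_RNA_2", "Reference_RNA_2", "Type_RNA_2", "Name_RNA_2",
    "Sub_type_RNA_2", "Strand_Properness_RNA_2",
]


def read_aligned_pairs(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per aligned read pair; skip lines with an unexpected field count."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != len(COLUMNS):
            continue  # assumption: malformed or header-like lines are ignored
        yield dict(zip(COLUMNS, row))
```

To read the gzipped file directly, the function can be given a text-mode handle, for example `gzip.open(path, "rt")` from the standard library, where `path` points to the downloaded supplementary file.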
|
|
|
Submission date |
May 12, 2016 |
Last update date |
May 15, 2019 |
Contact name |
Sheng Zhong |
E-mail(s) |
szhong@eng.ucsd.edu
|
Organization name |
University of California, San Diego
|
Department |
Bioengineering Department
|
Street address |
9500 Gilman Drive, Mail Code 0412
|
City |
La Jolla |
State/province |
California |
ZIP/Postal code |
92093-0412 |
Country |
USA |
|
|
Platform ID |
GPL17021 |
Series (2) |
GSE61489 |
Mapping RNA-RNA interactome and RNA structures in vivo with MARIO |
GSE81388 |
Mapping RNA-RNA interactome and RNA structures in vivo |
|
Relations |
BioSample |
SAMN04994044 |
SRA |
SRX1758456 |
Supplementary file |
Size |
Download |
File type/resource |
GSM2151619_Brain_CCGG_Rep2_fragment_paired_align.txt.gz |
90.5 Mb |
(ftp)(http) |
TXT |
SRA Run Selector |
Raw data are available in SRA |
Processed data provided as supplementary file |
|
|
|
|
 |