Sample GSM1321804
Status Public on Jan 01, 2015
Title s6073_16C
Sample type SRA
 
Source name whole rosette, 9-leaf stage, 16C, 6073
Organism Arabidopsis thaliana
Characteristics accession number: 6073
accession name: ÖMö1-7
growth temperature: 16C
tissue: whole rosette
developmental stage: 9-leaf stage
Treatment protocol The Arabidopsis accessions were reared at either 10C or 16C.
Growth protocol A diverse set of 163 Swedish accessions were sown on soil and stratified for 3 days at 4 C in the dark. They were then transferred to environmentally controlled growth chambers under long day conditions (04:00-20:00 hours corresponding to light) and individual seedlings were transplanted to single pots after one week. When plants attained the 9-true-leaf stage of development, whole rosettes were collected between 15:00 and 16:00 hours into the light cycle and flash frozen in liquid nitrogen.
Extracted molecule total RNA
Extraction protocol For each accession, 3 plants were pooled and total RNA was extracted with TRIzol (Invitrogen 15596-018), DNase treated, and mRNA was purified with oligo dT Dynabeads (Life Technologies).
RNA was fragmented using Ambion Fragmentation buffer and first and second strand cDNA synthesis was carried out using Invitrogen kit 18064-071. The ends of sheared fragments were repaired using Epicentre kit ER81050. After A-tailing using exo- Klenow fragment (New England Biolabs, MA, NEB M0212L), barcoded adaptors were ligated with Epicentre Fast-Link DNA Ligation Kit (Epicentre LK6201H). Adaptor-ligated DNA was resolved on 1.5% low melt agarose gels for 1 hour at 100V. DNA in the range of 200-250 bp was excised from the gel and purified with the Zymoclean Gel DNA recovery kit (Zymo Research). The libraries were amplified by PCR for 15 cycles with Illumina PCR primers 1.1 and 1.2 with Phusion polymerase (NEB F-530L). Single-end 36-40 bp sequencing was performed at the University of Southern California Epigenome Center on an Illumina GAIIx instrument using 4-fold multiplexing. Barcodes were introduced into adaptors such that the first 4 bases of the respective Illumina sequence reads corresponded to one of the adaptor sequences (the custom barcodes were: ACGT, CATT, GTAT, and TGCT).
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina Genome Analyzer IIx
 
Description PolyA RNA
s6073_16C_010511-1_ACGT
Data processing Sequences were assigned to barcodes using custom Python scripts (adapted from Gan et al. 2011. Nature, 477:419-23). For a read to be assigned to a corresponding library, the first four bases of the sequence were required to exactly match one of the four barcode sequences. To eliminate mis-assignments resulting from low sequence quality, we also required that at least two of the first three bases have quality scores >= 30, and that each of the first three bases have a quality score >= 25. For the T at the 4th base of our barcode sequences (shared among all barcodes and required for cloning during Illumina library construction), we required a quality score >= 20. The submitted FASTQ files retain the barcodes as part of the sequence read (a consequence of the position of the barcode in our modification of the Illumina library construction method).
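The barcode assignment rule described above can be sketched in Python as follows. This is a minimal illustration, not the submitters' script: the function names `assign_barcode` and `decode_quals` are hypothetical, quality values are assumed to be already decoded to integer Phred scores, and the ASCII offset of the submitted FASTQ files (33 or 64) is not stated in this record, so it is left as a parameter.

```python
# Hypothetical sketch of the barcode/quality rule; not the original script.
BARCODES = {"ACGT", "CATT", "GTAT", "TGCT"}  # custom barcodes listed in this record


def decode_quals(qual_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (offset is an assumption)."""
    return [ord(c) - offset for c in qual_string]


def assign_barcode(seq, quals):
    """Return the barcode a read belongs to, or None if the read is rejected."""
    if len(seq) < 4 or len(quals) < 4:
        return None
    barcode = seq[:4]
    if barcode not in BARCODES:           # first four bases must match exactly
        return None
    first_three = quals[:3]
    if sum(q >= 30 for q in first_three) < 2:   # two of the first three bases >= 30
        return None
    if any(q < 25 for q in first_three):        # each of the first three bases >= 25
        return None
    if quals[3] < 20:                           # shared T at the 4th base >= 20
        return None
    return barcode
```

For example, `assign_barcode("ACGTTTGA", decode_quals(qual_string))` would return `"ACGT"` only if the first four quality values pass all three thresholds.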
For read alignment, RNA-Seq reads from each accession were trimmed of barcode bases (the first 4nt) and mapped with the PALMapper aligner against the TAIR10 reference genome while taking a set of known variants into account (these variants were detected from the sequencing of the genomes of the accessions as described by Long et al. 2013. Nature Genetics, 45:884-90; for a description of PALMapper, see Jean et al. 2010. Curr Protoc Bioinformatics Chapter 11:Unit 11.6). To account for reads mapping ambiguously to multiple locations, we used a custom Python script to remove all reads that showed at least one additional mapping with the same edit distance as the best hit. Additional hits were only treated as ambiguous, and the read removed, if they differed from the best hit by more than 3nt in start and stop coordinates.
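The multi-mapping filter described above might look roughly like the following sketch. The record layout (dictionaries with `read`, `chrom`, `start`, `end`, `edit` keys) and the function names are hypothetical. Two points are assumptions rather than facts from this record: a hit on a different chromosome is treated as a distinct location, and "more than 3nt in start and stop coordinates" is read as both coordinates differing by more than 3 nt.

```python
from collections import defaultdict


def is_distinct(hit, best, tol=3):
    """True if `hit` lies at a different location than `best` (assumed reading)."""
    if hit["chrom"] != best["chrom"]:
        return True
    return (abs(hit["start"] - best["start"]) > tol
            and abs(hit["end"] - best["end"]) > tol)


def unique_reads(alignments, tol=3):
    """Keep the best hit of each read unless an equally good distinct hit exists."""
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln["read"]].append(aln)

    kept = {}
    for read, hits in by_read.items():
        best = min(hits, key=lambda h: h["edit"])  # lowest edit distance
        ambiguous = any(
            h is not best and h["edit"] == best["edit"] and is_distinct(h, best, tol)
            for h in hits
        )
        if not ambiguous:
            kept[read] = best
    return kept
```

Near-identical hits (within 3 nt at both ends) are not counted against a read here, which mirrors the exception stated in the text.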
During alignment we used a feature of PALMapper that takes known variants into account to improve RNA-seq alignments. This was used to reduce allelic biases in read alignments to genomes with intraspecific polymorphisms. We first created a set of variants from two different sources: 1) single nucleotide variants (SNV) and structural variants (SV) from genome sequencing and 2) SNVs and SVs called in an initial alignment round of the RNA-Seq reads to the TAIR10 reference genome with PALMapper (relevant parameters: -M 4 -G 4 -E 6 -I 25000 -NI 1 -S). For both sources of variants we applied stringent filter criteria to reduce false calls: 1) genome variants had to appear in at least 40 strains with a minor allele count of at least 5 strains, 2) RNA-Seq variants had to be confirmed by at least 2 alignments within the same strain and had to have less than a factor of 2-fold non-confirming alignments within the same strain. Variants from both sources were integrated into one file that was used for a second, variant-aware alignment round with PALMapper (relevant parameters: -M 2 -G 0 -E 2 -I 5000 -NI 0 -S). In variant-aware alignment mode, PALMapper builds an implicit representation of the reference genome that reflects all possible variant combinations that exist for a genomic region. The output is automatically projected to the TAIR10 coordinate system.
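The two variant filters described above can be written as simple predicates. This is only a sketch under stated assumptions: the function and parameter names are hypothetical, "appear in at least 40 strains" is read as the variant site being called in at least 40 strains, and "less than a factor of 2-fold non-confirming alignments" is read as the number of non-confirming alignments being less than twice the number of confirming ones. The record does not spell out either definition.

```python
def keep_genome_variant(n_strains_called, minor_allele_count,
                        min_strains=40, min_minor=5):
    """Genome-derived variant filter (thresholds from the text, reading assumed)."""
    return n_strains_called >= min_strains and minor_allele_count >= min_minor


def keep_rnaseq_variant(confirming, non_confirming,
                        min_confirming=2, max_ratio=2.0):
    """RNA-Seq-derived variant filter, evaluated within a single strain.

    Assumes the 2-fold criterion means non_confirming < 2 * confirming.
    """
    return confirming >= min_confirming and non_confirming < max_ratio * confirming
```

Variants passing either predicate would then be pooled into the single variant file used for the second, variant-aware alignment round.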
We used a total of 499 RNA-Seq libraries for RNA-Seq expression quantification. We merged libraries (e.g., technical and biological replicates) based on their ecotype and environment, yielding 323 merged RNA-Seq samples, one for each unique combination of ecotype and environment (160 in 10C, 163 in 16C). We quantified gene expression by counting the number of reads with alignments longer than 24 bp that mapped to genes on all non-chloroplast and non-mitochondrial chromosomes. Our gene set includes all of the annotated TAIR10 features including genes, transposable element sequence, and newly identified genes reported in Gan et al. (2011. Nature, 477:419-23). To obtain a stable quantification, we applied a number of filters to the sequence mappings: 1) we only used those reads which were uniquely mapped into the exonic regions of genes, 2) we required that the reads did not map completely into regions where two genes overlap in order to avoid mixing quantifications of different genes, and 3) we excluded regions spanning structural variants (reads that start in an insertion or deletion and their two neighboring bases), alternative splicing (regions that are not contained in all transcripts of a gene), and repetitive sequences (repetitive based on a 50 bp window) that could all bias gene expression quantification.
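The merging of replicate libraries by ecotype and environment can be illustrated with a short sketch. The input layout (a list of dictionaries with `ecotype`, `temperature`, and per-gene `counts`) and the function name `merge_libraries` are hypothetical, and summing read counts across replicates is an assumption about how the merge was done; the gene ids and counts in the usage note are made up.

```python
from collections import Counter, defaultdict


def merge_libraries(libraries):
    """Sum per-gene read counts over libraries sharing ecotype and temperature."""
    merged = defaultdict(Counter)
    for lib in libraries:
        key = (lib["ecotype"], lib["temperature"])
        merged[key].update(lib["counts"])  # Counter.update adds counts per gene
    return dict(merged)
```

Calling `merge_libraries` on two replicate libraries of accession 6073 at 16C plus one library at 10C would give two merged samples, keyed `("6073", "16C")` and `("6073", "10C")`.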
After filtering the mapped reads, we generated expression results using 1) raw read counts and 2) RPKM estimates for each of the 323 samples. For RPKM estimates, we followed the low level normalization approach proposed by Anders and Huber (2010. Genome Biol 11:R106), jointly applied to the set of expression estimates across ecotypes and environmental backgrounds. First, we estimated effective library sizes as the median expression estimates across all genes. Based on this, we derived correction factors to adjust individual libraries for differences in size. Library-size adjusted raw counts were then used to obtain standard read counts per million expression estimates for each gene.
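As a rough illustration of the normalization described above, the sketch below computes median-of-ratios size factors in the style of Anders and Huber (2010) and a standard RPKM value. It is not the submitters' code: the function names, the input layout (`{library: {gene: count}}`), and the exact way library-size adjusted counts feed into RPKM are assumptions, since this record does not give those details.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factor per library; counts = {library: {gene: count}}."""
    libs = list(counts)
    genes = set.intersection(*(set(counts[lib]) for lib in libs))
    # Log geometric mean per gene, using only genes with non-zero counts everywhere.
    log_geo = {}
    for gene in genes:
        vals = [counts[lib][gene] for lib in libs]
        if all(v > 0 for v in vals):
            log_geo[gene] = sum(math.log(v) for v in vals) / len(vals)
    return {
        lib: math.exp(median(math.log(counts[lib][g]) - log_geo[g] for g in log_geo))
        for lib in libs
    }


def adjust_counts(counts):
    """Divide each library's raw counts by its size factor."""
    factors = size_factors(counts)
    return {lib: {g: c / factors[lib] for g, c in genes.items()}
            for lib, genes in counts.items()}


def rpkm(count, gene_length_bp, library_total):
    """Standard reads per kilobase of gene per million reads."""
    return count * 1e9 / (gene_length_bp * library_total)
```

With this scheme, two libraries that differ only by sequencing depth end up with matching adjusted counts for every gene.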
Genome_build: TAIR10
Supplementary_files_format_and_content: Each processed, tab-delimited data file contains the TAIR10 gene id (column 1), the raw read counts for the indicated sample (column 2), and the RPKM measures for the sample (column 3). Samples represent RNA-Seq estimates for each unique accession and temperature treatment.
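A processed file in the three-column layout described above could be read with a sketch like the following. The function names are hypothetical, and whether the file carries a header line is not stated in this record, so rows whose second and third columns are not numeric are simply skipped.

```python
import csv
import gzip


def read_expression_table(lines):
    """Parse tab-delimited rows into {gene_id: (raw_count, rpkm)}."""
    table = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        try:
            table[row[0]] = (float(row[1]), float(row[2]))
        except ValueError:
            continue  # skip a possible header row (assumption)
    return table


def read_expression_file(path):
    """Open a gzip-compressed expression table and parse it."""
    with gzip.open(path, "rt") as handle:
        return read_expression_table(handle)
```

For this sample, `read_expression_file("GSM1321804_s6073_16C_expression_table.txt.gz")` would return a dictionary keyed by TAIR10 gene id, assuming the file has been downloaded to the working directory.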
 
Submission date Feb 04, 2014
Last update date May 15, 2019
Contact name Richard M Clark
Organization name University of Utah
Department Department of Biology
Lab Clark Laboratory
Street address 257 So. 1400 East, RM 204 SB
City Salt Lake City
State/province Utah
ZIP/Postal code 84112
Country USA
 
Platform ID GPL11221
Series (1)
GSE54680 DNA methylation variation in Arabidopsis has a genetic basis and appears to be involved in local adaptation
Relations
BioSample SAMN02618113
SRA SRX465254

Supplementary file Size Download File type/resource
GSM1321804_s6073_16C_expression_table.txt.gz 345.9 Kb (ftp)(http) TXT
Raw data are available in SRA
Processed data provided as supplementary file
