Series GSE47774
Status Public on Aug 08, 2014
Title A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium
Organisms Homo sapiens; synthetic construct
Experiment type Expression profiling by high throughput sequencing
Summary We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for sequence discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for these and for qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
Overall design We used the well-characterized reference RNA samples A (pooled cell lines) and B (human brain) from the MAQC consortium, adding spike-ins of synthetic RNA from the External RNA Control Consortium (ERCC). Samples C and D were then constructed by combining A and B in known mixing ratios, 3:1 and 1:3, respectively. All samples were distributed to several independent sites for RNA-Seq library construction and profiling on the Illumina HiSeq 2000 and LifeTech SOLiD 5500 platforms. In addition, vendors created their own cDNA libraries, which were then distributed to each test site in order to examine the degree of a "site effect" that was independent of the library preparation process. To support an assessment of gene models, samples A and B were also sequenced at independent sites on the Roche 454 platform, which provides longer reads. For comparison with other technologies, these data were also compared to the MAQC-I Affymetrix U133 Plus2 microarray and several current microarray platforms, and were assessed by 20,801 PrimePCR reactions.

Sample A: Universal Human Reference RNA (UHRR) from Stratagene and ERCC Spike-In controls
Sample B: Human Brain Reference RNA (HBRR) from Ambion and ERCC Spike-In controls
Sample C: Mix of A and B (3:1)
Sample D: Mix of A and B (1:3)
Sample E: Ambion ERCC Spike-In Control Mix 1
Sample F: Ambion ERCC Spike-In Control Mix 2
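
Because samples C and D are defined as 3:1 and 1:3 mixes of A and B, a measurement for a gene in A and B implies an expected value in C and D. The following is a minimal sketch of that arithmetic, assuming the signal mixes linearly in the stated proportions and ignoring any difference in mRNA content between A and B. The function name `expected_mix` and the example values are hypothetical and are not taken from this record.

```python
def expected_mix(signal_a: float, signal_b: float, parts_a: int, parts_b: int) -> float:
    """Expected signal for a mix of A and B under a simple linear-mixing assumption.

    parts_a and parts_b are the mixing ratio, e.g. 3 and 1 for sample C.
    """
    total_parts = parts_a + parts_b
    if total_parts <= 0:
        raise ValueError("mixing ratio must have a positive total")
    return (parts_a * signal_a + parts_b * signal_b) / total_parts


# Hypothetical expression values for one gene in samples A and B.
gene_in_a = 100.0
gene_in_b = 20.0

expected_c = expected_mix(gene_in_a, gene_in_b, 3, 1)  # sample C is A:B = 3:1
expected_d = expected_mix(gene_in_a, gene_in_b, 1, 3)  # sample D is A:B = 1:3
print(expected_c, expected_d)
```

Comparing measured C and D values against such expectations is one way to use the built-in mixing ratios as a consistency check.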
Contributor(s) Su Z, Labaj PP, Li S, Thierry-Mieg J, Thierry-Mieg D, Shi W, Wang C, Schroth GP, Jones WD, Xiao W, Xu W, Jensen RV, Kelly R, Xu J, Conesa A, Gao H, Jafari N, Lu F, Oakeley EJ, Praul CA, Santoyo-Lopez J, Tan X, Thompson EA, Vandesompele J, Peng Z, Scherer A, Zavadil J, Hong H, Liao Y, Setterquist RA, Amur S, Auerbach SS, Bannon DI, Bao W, Binder H, Blomquist T, Boysen C, Bramlett K, Brilliant MH, Bushel PR, Cai W, Catalano JG, Chang C, Chaudhuri R, Chen G, Chen T, Chierici M, Chu T, Clevert D
Citation(s) 25150838, 25633159
Submission date Jun 10, 2013
Last update date May 15, 2019
Contact name Leming Shi
Phone +86-18616827008
Organization name Fudan University
Department School of Life Sciences
Lab Center for Pharmacogenomics
Street address 2005 Songhu Road
City Shanghai
ZIP/Postal code 200438
Country China
Platforms (5)
GPL11154 Illumina HiSeq 2000 (Homo sapiens)
GPL14603 454 GS FLX Titanium (Homo sapiens)
GPL15228 Illumina HiSeq 2000 (synthetic construct)
Samples (3396)
This SubSeries is part of SuperSeries:
GSE47792 SEQC Project
SRA SRP025982
BioProject PRJNA208369

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE47774_SEQC_ILM_AGR.txt.gz 13.5 Mb (ftp)(http) TXT
GSE47774_SEQC_ILM_BGI.txt.gz 15.8 Mb (ftp)(http) TXT
GSE47774_SEQC_ILM_CNL.txt.gz 14.8 Mb (ftp)(http) TXT
GSE47774_SEQC_ILM_COH.txt.gz 7.1 Mb (ftp)(http) TXT
GSE47774_SEQC_ILM_MAY.txt.gz 16.0 Mb (ftp)(http) TXT
GSE47774_SEQC_ILM_NVS.txt.gz 14.1 Mb (ftp)(http) TXT
GSE47774_SEQC_ILM_NYG_all_counts.txt.gz 2.8 Mb (ftp)(http) TXT
GSE47774_SEQC_LIF_LIV.txt.gz 2.1 Mb (ftp)(http) TXT
GSE47774_SEQC_LIF_NWU.txt.gz 10.2 Mb (ftp)(http) TXT
GSE47774_SEQC_LIF_PSU.txt.gz 10.4 Mb (ftp)(http) TXT
GSE47774_SEQC_LIF_SQW.txt.gz 9.9 Mb (ftp)(http) TXT
GSE47774_SEQC_ROC_MGP.txt.gz 224.8 Kb (ftp)(http) TXT
GSE47774_SEQC_ROC_NYU.txt.gz 234.0 Kb (ftp)(http) TXT
GSE47774_SEQC_ROC_SQW.txt.gz 223.0 Kb (ftp)(http) TXT
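
The supplementary file names appear to follow the pattern `GSE47774_SEQC_<platform>_<site>[_<suffix>].txt.gz`. The sketch below splits a name into those parts and counts files per platform code. Reading `ILM`, `LIF` and `ROC` as Illumina, Life Technologies and Roche is an assumption based on the platforms named in the summary; the three-letter site codes are not expanded in this record and are left as-is. The names `SeqcFile` and `parse_seqc_filename` are hypothetical.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional

# Assumed meaning of the platform codes (inferred from the summary, not stated in the record).
PLATFORM_CODES = {"ILM": "Illumina", "LIF": "Life Technologies", "ROC": "Roche"}

FILENAME_RE = re.compile(r"^(GSE\d+)_SEQC_([A-Z]{3})_([A-Z]{3})(?:_(\w+))?\.txt\.gz$")


class SeqcFile(NamedTuple):
    accession: str
    platform: str
    site: str
    suffix: Optional[str]


def parse_seqc_filename(name: str) -> SeqcFile:
    """Split a supplementary file name into accession, platform code, site code and suffix."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return SeqcFile(*match.groups())


names = [
    "GSE47774_SEQC_ILM_AGR.txt.gz",
    "GSE47774_SEQC_ILM_NYG_all_counts.txt.gz",
    "GSE47774_SEQC_LIF_LIV.txt.gz",
    "GSE47774_SEQC_ROC_MGP.txt.gz",
]
parsed = [parse_seqc_filename(n) for n in names]
per_platform = Counter(PLATFORM_CODES.get(p.platform, p.platform) for p in parsed)
print(per_platform)
```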
SRA Run Selector
Raw data are available in SRA
Processed data are available on Series record
