|
|
GEO help: Mouse over screen elements for information. |
|
Status |
Public on Jan 27, 2015 |
Title |
Alternatively processed and compiled RNA-Sequencing and clinical data for thousands of samples from The Cancer Genome Atlas |
Organism |
Homo sapiens |
Experiment type |
Expression profiling by high throughput sequencing
|
Summary |
We reprocessed RNA-Seq data for 9264 tumor samples and 741 normal samples across 24 cancer types from The Cancer Genome Atlas with "Rsubread". Rsubread is an open source R package that has shown high concordance with other existing methods of alignment and summarization, but is simple to use and takes significantly less time to process data. Additionally, we provide clinical variables publicly available as of May 20, 2015 for the tumor samples where the TCGA ids are matched.
|
|
|
Overall design |
Using Rsubread version 1.14.2 R package, we aligned the fastq files downloaded from Cghub. First, the reads were aligned with align() function to the UCSC hg19 reference genome. Second, featureCounts() function was used to summarize the gene level expression values as integer number. Third, these summarized gene values were normalized to FPKM and TPM values. In addition, we downloaded clinical data for all tumor samples. We collated this data into a matrix and matched the ids reported in RNA-Seq data. A supplemental file showing the cancer type of each sample is included for the users. Please note that [1] All 9264 tumor samples have been combined to create the processed matrix files for tumor samples [2] All 741 normal samples have been combined to create the processed matrix files for normal samples [3] The TCGA_24_CancerType_Samples.txt and TCGA_24_Normal_CancerType_Samples.txt files list each sample tumor type for tumor samples and normal samples respectively. [4] 548 clinical variables for each sample are provided in the TCGA_24_548_Clinical_Variables_9264_Samples.txt [5] Raw data mRNA sequence can be downloaded from CGHub (https://cghub.ucsc.edu/) with an access key and processed with pipeline available from https://github.com/srp33/TCGA_RNASeq_Clinical.
|
|
|
Contributor(s) |
Rahman M, Piccolo SR |
Citation(s) |
26209429 |
|
Submission date |
Nov 03, 2014 |
Last update date |
Feb 21, 2019 |
Contact name |
Mumtahena Rahman |
E-mail(s) |
moom.rahman@utah.edu
|
Organization name |
University of Utah
|
Department |
Biomedical Informatics
|
Lab |
Andrea Bild's Lab
|
Street address |
20 South 2030 East, Rm 420
|
City |
Salt Lake City |
State/province |
UT |
ZIP/Postal code |
84121 |
Country |
USA |
|
|
Platforms (1) |
GPL9052 |
Illumina Genome Analyzer (Homo sapiens) |
|
Samples (2) |
|
Relations |
BioProject |
PRJNA266377 |
Supplementary file |
Size |
Download |
File type/resource |
GSE62944_01_27_15_TCGA_20_420_Clinical_Variables_7706_Samples.txt.gz |
517.5 Kb |
(ftp)(http) |
TXT |
GSE62944_01_27_15_TCGA_20_CancerType_Samples.txt.gz |
55.8 Kb |
(ftp)(http) |
TXT |
GSE62944_06_01_15_TCGA_24_548_Clinical_Variables_9264_Samples.txt.gz |
1.2 Mb |
(ftp)(http) |
TXT |
GSE62944_06_01_15_TCGA_24_CancerType_Samples.txt.gz |
71.6 Kb |
(ftp)(http) |
TXT |
GSE62944_06_01_15_TCGA_24_Normal_CancerType_Samples.txt.gz |
5.5 Kb |
(ftp)(http) |
TXT |
GSE62944_RAW.tar |
5.9 Gb |
(http)(custom) |
TAR (of TXT) |
Raw data not provided for this record |
Processed data provided as supplementary file |
|
|
|
|
|