Series GSE62944
Status Public on Jan 27, 2015
Title Alternatively processed and compiled RNA-Sequencing and clinical data for thousands of samples from The Cancer Genome Atlas
Organism Homo sapiens
Experiment type Expression profiling by high throughput sequencing
Summary We reprocessed RNA-Seq data for 9264 tumor samples and 741 normal samples across 24 cancer types from The Cancer Genome Atlas (TCGA) with "Rsubread". Rsubread is an open-source R package that has shown high concordance with other existing alignment and summarization methods while being simpler to use and substantially faster. Additionally, we provide the clinical variables publicly available as of May 20, 2015 for the tumor samples whose TCGA IDs could be matched.
Overall design Using version 1.14.2 of the Rsubread R package, we aligned the FASTQ files downloaded from CGHub. First, reads were aligned to the UCSC hg19 reference genome with the align() function. Second, the featureCounts() function was used to summarize gene-level expression as integer read counts. Third, these gene-level counts were normalized to FPKM and TPM values. In addition, we downloaded clinical data for all tumor samples, collated them into a matrix, and matched them to the sample IDs reported in the RNA-Seq data. A supplemental file listing the cancer type of each sample is also provided.
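The FPKM and TPM normalizations in the third step follow standard definitions: FPKM_g = count_g × 10^9 / (length_g × N), where N is the library size, and TPM_g = 10^6 × (count_g / length_g) / Σ_j (count_j / length_j). The Python sketch below implements both under assumptions the record does not state: N is taken as the sum of gene-level counts (the original pipeline may have used total mapped reads), gene lengths in base pairs are supplied by the caller, and the gene names and numbers are hypothetical. It illustrates the formulas only; the authors' processing was done in R with Rsubread.

```python
from typing import Dict


def to_fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Fragments per kilobase of gene length per million counted fragments."""
    # Assumption: library size = sum of gene-level counts in this sample.
    library_size = sum(counts.values())
    return {
        gene: count * 1e9 / (lengths_bp[gene] * library_size)
        for gene, count in counts.items()
    }


def to_tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to one million."""
    # Step 1: counts per kilobase of gene length.
    rpk = {gene: count / (lengths_bp[gene] / 1e3) for gene, count in counts.items()}
    # Step 2: rescale so the values for this sample sum to 1e6.
    scale = sum(rpk.values())
    return {gene: value * 1e6 / scale for gene, value in rpk.items()}


# Hypothetical gene-level counts and gene lengths (bp) for one sample.
counts = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 0}
lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 500}
print(to_fpkm(counts, lengths))
# → {'GENE_A': 250000.0, 'GENE_B': 250000.0, 'GENE_C': 0.0}
print(to_tpm(counts, lengths))
# → {'GENE_A': 500000.0, 'GENE_B': 500000.0, 'GENE_C': 0.0}
```

Both expressed genes get identical values here because their counts are proportional to their lengths. In general, a sample's TPM values equal its FPKM values rescaled so they sum to one million.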
Please note:
[1] All 9264 tumor samples have been combined to create the processed matrix files for tumor samples.
[2] All 741 normal samples have been combined to create the processed matrix files for normal samples.
[3] The TCGA_24_CancerType_Samples.txt and TCGA_24_Normal_CancerType_Samples.txt files list the cancer type of each tumor sample and each normal sample, respectively.
[4] The TCGA_24_548_Clinical_Variables_9264_Samples.txt file provides 548 clinical variables for each sample.
[5] Raw mRNA sequencing data can be downloaded from CGHub (with an access key) and processed with the pipeline available from
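Matching clinical records to RNA-Seq samples relies on TCGA barcodes. By TCGA convention, the first three hyphen-separated fields (e.g. TCGA-XX-XXXX) identify the participant, and the two digits at the start of the fourth field give the sample type (01-09 tumor, 10-19 normal). The Python sketch below assumes the clinical table is keyed by participant ID and that the expression matrices carry full sample barcodes; neither layout is documented in this record, the function names are hypothetical, and the barcodes are made up for illustration. It is not the authors' matching procedure.

```python
from typing import Dict, List


def patient_id(barcode: str) -> str:
    """Participant-level ID: the first three hyphen-separated fields."""
    return "-".join(barcode.split("-")[:3])


def sample_type(barcode: str) -> str:
    """Classify a barcode by its two-digit TCGA sample-type code."""
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"  # other codes, e.g. 20-29 for control samples


def match_clinical(
    sample_barcodes: List[str], clinical: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """Attach clinical records (assumed keyed by participant ID) to tumor samples."""
    matched = {}
    for barcode in sample_barcodes:
        if sample_type(barcode) != "tumor":
            continue  # clinical data are provided for tumor samples only
        record = clinical.get(patient_id(barcode))
        if record is not None:
            matched[barcode] = record
    return matched


# Hypothetical barcodes and clinical table, for illustration only.
samples = [
    "TCGA-AA-0001-01A-11R-A000-07",  # tumor; participant has a clinical record
    "TCGA-AA-0001-11A-01R-A000-07",  # normal; skipped
    "TCGA-BB-0002-01A-11R-A000-07",  # tumor; no clinical record
]
clinical = {"TCGA-AA-0001": {"example_variable": "value"}}
print(match_clinical(samples, clinical))
# → {'TCGA-AA-0001-01A-11R-A000-07': {'example_variable': 'value'}}
```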
Contributor(s) Rahman M, Piccolo SR
Citation(s) 26209429
Submission date Nov 03, 2014
Last update date Feb 21, 2019
Contact name Mumtahena Rahman
Organization name University of Utah
Department Biomedical Informatics
Lab Andrea Bild's Lab
Street address 20 South 2030 East, Rm 420
City Salt Lake City
State/province UT
ZIP/Postal code 84121
Country USA
Platforms (1)
GPL9052 Illumina Genome Analyzer (Homo sapiens)
Samples (2)
GSM1536837 Patient tumor samples
GSM1697009 Patient normal samples
BioProject PRJNA266377

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size File type/resource
GSE62944_01_27_15_TCGA_20_420_Clinical_Variables_7706_Samples.txt.gz 517.5 Kb TXT
GSE62944_01_27_15_TCGA_20_CancerType_Samples.txt.gz 55.8 Kb TXT
GSE62944_06_01_15_TCGA_24_548_Clinical_Variables_9264_Samples.txt.gz 1.2 Mb TXT
GSE62944_06_01_15_TCGA_24_CancerType_Samples.txt.gz 71.6 Kb TXT
GSE62944_06_01_15_TCGA_24_Normal_CancerType_Samples.txt.gz 5.5 Kb TXT
GSE62944_RAW.tar 5.9 Gb TAR (of TXT)
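The supplementary listing contains two releases of the clinical and cancer-type files. Judging from the names alone, the leading date appears to be MM_DD_YY (01_27_15 coincides with the public release date, and 27 cannot be a month), followed by the number of cancer types, clinical variables, and samples. The Python sketch below parses clinical file names under that assumption and selects the most recent release; the regular expression and function names are hypothetical and written only for the names in this listing.

```python
import re
from datetime import date
from typing import List, Optional, Tuple

# Matches names such as
# GSE62944_06_01_15_TCGA_24_548_Clinical_Variables_9264_Samples.txt.gz
# Assumption: the date prefix is MM_DD_YY.
CLINICAL_RE = re.compile(
    r"^GSE62944_(\d{2})_(\d{2})_(\d{2})_TCGA_(\d+)_(\d+)"
    r"_Clinical_Variables_(\d+)_Samples\.txt\.gz$"
)


def parse_clinical_name(name: str) -> Optional[dict]:
    """Extract release date and counts from a clinical file name, or None."""
    m = CLINICAL_RE.match(name)
    if m is None:
        return None
    month, day, year, n_types, n_vars, n_samples = (int(x) for x in m.groups())
    return {
        "release": date(2000 + year, month, day),
        "cancer_types": n_types,
        "variables": n_vars,
        "samples": n_samples,
    }


def latest_clinical(names: List[str]) -> Optional[Tuple[str, dict]]:
    """Return (name, parsed info) for the newest clinical file, if any."""
    parsed = []
    for name in names:
        info = parse_clinical_name(name)
        if info is not None:
            parsed.append((name, info))
    if not parsed:
        return None
    return max(parsed, key=lambda item: item[1]["release"])


listing = [
    "GSE62944_01_27_15_TCGA_20_420_Clinical_Variables_7706_Samples.txt.gz",
    "GSE62944_01_27_15_TCGA_20_CancerType_Samples.txt.gz",
    "GSE62944_06_01_15_TCGA_24_548_Clinical_Variables_9264_Samples.txt.gz",
    "GSE62944_06_01_15_TCGA_24_CancerType_Samples.txt.gz",
]
name, info = latest_clinical(listing)
print(name, info["release"], info["samples"])
# → GSE62944_06_01_15_TCGA_24_548_Clinical_Variables_9264_Samples.txt.gz 2015-06-01 9264
```

Cancer-type files do not match the pattern and are ignored, so the helper only compares clinical releases.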
Raw data not provided for this record
Processed data provided as supplementary file
