Substudies
phs000854.v2.p8 : Genome-wide Analysis of Noncoding Regulatory Mutations in Cancer

Study Description

The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), which are both part of the National Institutes of Health, U.S. Department of Health and Human Services.

TCGA projects are organized by cancer type or subtype. A current list of the cancer types selected for study in TCGA is available online.

Data from TCGA (e.g., gene expression, copy number variation, and clinical information) are available via the Genomic Data Commons (GDC).

Data from TCGA projects are organized into two tiers: Open Access and Controlled Access.

  • The Open Access data tier contains data that cannot be attributed to an individual research participant. It does not require user certification. Data in the Open Access tier are available in the TCGA Data Portal.
  • The Controlled Access data tier contains individual-level genotype data that are unique to an individual. Access to data in this tier requires user certification through dbGaP Authorized Access.
  • Controlled Access data types consist of the following:
    • Individual germline variant data (SNP .cel files)
    • Primary sequence data (.bam files), which are available at GDC
    • Clinical free text fields
    • Exon Array files (for Glioblastoma and Ovarian projects only)
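
As an illustration only, the Controlled Access list above can be written as a simple lookup. This is a minimal sketch: the label strings and the function name `is_listed_controlled` are hypothetical and are not part of any TCGA, GDC, or dbGaP API. A label that is not in the list is not thereby confirmed to be Open Access.

```python
# Hypothetical labels mirroring the Controlled Access data types listed above.
CONTROLLED_ACCESS_TYPES = {
    "individual germline variant data",  # SNP .cel files
    "primary sequence data",             # .bam files
    "clinical free text fields",
    "exon array files",                  # Glioblastoma and Ovarian projects only
}


def is_listed_controlled(data_type: str) -> bool:
    """Return True if the label matches one of the listed Controlled Access types."""
    return data_type.strip().lower() in CONTROLLED_ACCESS_TYPES


if __name__ == "__main__":
    for label in ("Primary sequence data", "Somatic mutation descriptions"):
        print(label, "->", is_listed_controlled(label))
```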

NOTE: TCGA strives to release most data in the open access tier. Individual genotype or sequence files are prominent exceptions. Commonly requested files such as descriptions of somatic mutations or clinical data are open access.

All data associated with TCGA tumor-specific publications can be accessed at https://tcga-data.nci.nih.gov/docs/publications/.

The TCGA study is used in the following dbGaP substudies. To view genotypes and other molecular data collected in these substudies, select a substudy from the list below or from the "Substudies" section of the top-level TCGA study page (phs000178).

  • phs000854 Genome-wide Analysis of Noncoding Regulatory Mutations in Cancer

  • Study Type: Tumor vs. Matched-Normal
  • Number of study subjects that have individual-level data available through Authorized Access:
Authorized Access
Publicly Available Data (Public ftp)

Connect to the public download site. The site contains release notes and manifests. The site also contains data dictionaries, variable summaries, documents, and truncated analyses, whenever available.

Study Inclusion/Exclusion Criteria

TCGA utilizes a strict set of criteria for inclusion into the study due to the rigorous and comprehensive nature of the work being performed. Tumor samples and matched source of germline DNA are curated and processed by the Biospecimen Core Resource, a centralized site that reviews sample data and processes all samples to ensure consistent pathology assessment and generation of molecular analytes (DNA and RNA).

TCGA is focusing on primary untreated tumors that were snap frozen upon collection. All tumors must have a matched normal sample from the same patient. In many cases, the matched normal is a sample of the patient's blood.

Once at the BCR, all samples are subjected to a quality control protocol before they are accepted for full analysis in the TCGA pipeline. Each sample is reviewed by a pathologist to confirm the diagnosis and to verify that the sample meets the inclusion criteria. Specifically, TCGA requires that samples contain at least 60% tumor nuclei and less than 20% necrotic tissue. Once the sample passes the pathology review, nucleic acids are isolated and genotyping is performed so that each tumor sample is properly associated with the correct normal tissue. An important goal in establishing this central resource is to ensure that molecular analytes (i.e., DNA and RNA) extracted from tissue samples are of consistent and high quality. Next, these analytes undergo a molecular quality control process and are then distributed to TCGA Cancer Genome Characterization Centers and Genome Sequencing Centers for genomic analysis.
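
The two numeric pathology thresholds stated above can be expressed as a small check. This is a sketch under stated assumptions, not the BCR's actual procedure: the `PathologyReview` record, its field names, and the function `passes_pathology_thresholds` are hypothetical, and the check covers only the two percentages, not the pathologist's confirmation of diagnosis or the later molecular quality control.

```python
from dataclasses import dataclass

# Thresholds taken from the text: at least 60% tumor nuclei, less than 20% necrosis.
MIN_TUMOR_NUCLEI_PCT = 60.0
MAX_NECROSIS_PCT = 20.0


@dataclass
class PathologyReview:
    """Hypothetical record of the percentages noted during pathology review."""
    sample_id: str
    tumor_nuclei_pct: float
    necrosis_pct: float


def passes_pathology_thresholds(review: PathologyReview) -> bool:
    """Return True if the sample meets both numeric criteria."""
    enough_tumor = review.tumor_nuclei_pct >= MIN_TUMOR_NUCLEI_PCT  # inclusive bound
    low_necrosis = review.necrosis_pct < MAX_NECROSIS_PCT           # strict bound
    return enough_tumor and low_necrosis


if __name__ == "__main__":
    example = PathologyReview("sample-A", tumor_nuclei_pct=75.0, necrosis_pct=5.0)
    print(example.sample_id, passes_pathology_thresholds(example))
```

Note that the bounds differ: 60% tumor nuclei is acceptable ("at least"), while 20% necrosis is not ("less than").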

All samples in TCGA have been collected and utilized following strict policies and guidelines for the protection of human subjects, informed consent and IRB review of protocols.

Molecular Data
Type                     Source              Platform             Number of Oligos/SNPs  SNP Batch Id  Comment
Whole Genome Genotyping  Affymetrix          AFFY_6.0             934940                 52074
Whole Genome Sequencing  Illumina            Genome Analyzer IIX  N/A                    N/A
Whole Genome Sequencing  Illumina            HiSeq 2000           N/A                    N/A           Indexed, 76bp, paired-end run
Whole Exome Sequencing   Applied Biosystems  SOLiD                N/A                    N/A
Whole Exome Sequencing   Roche               454 GS FLX Titanium  N/A                    N/A
RNA Sequencing           Illumina            HiSeq 2000           N/A                    N/A           Indexed, 76bp, paired-end run
RNA Sequencing           Illumina            Genome Analyzer IIX  N/A                    N/A

Study History

For an updated history of the program, please see: http://cancergenome.nih.gov/abouttcga/overview/history.
