Study Description

The Office of Cancer Genomics at the National Cancer Institute is sponsoring a series of studies as part of the Cancer Genome Characterization Initiative (CGCI) to assess novel emerging sequencing technologies in cancer. The CGCI program includes comprehensive characterization of the genetic aberrations found in different pediatric and/or adult tumors.

CGCI is currently characterizing diffuse large B-cell lymphoma (DLBCL) and medulloblastoma (MB), with additional cancers to be characterized in the future. All data from these projects will be released into publicly accessible databases, with a majority of data in an open-access tier. A subset of data will be available only through a controlled-access tier due to patient privacy concerns.

Project Descriptions by Disease are as follows:

  • Diffuse Large B-Cell Lymphoma (DLBCL) - In combination, the lymphoid cancers (non-Hodgkin lymphoma (NHL), Hodgkin lymphoma, myeloma and chronic lymphocytic leukemia) constitute the fourth most common malignancy in both men and women in North America. Lymphomas typically have characteristic abnormal chromosomes, including translocations, indicating the relevance of mutations to how NHLs develop and behave. This project uses lymphoid neoplasms as the test platform to demonstrate that detailed mutational analysis associated with a specific well-characterized set of neoplasms can provide a candidate list of mutation sites specific to and common across lymphoma types. The resulting data set will facilitate studies of clinical behaviour, response to treatment, patient outcome and survival, and target pathways for therapeutic agents.

    CGCI investigators are probing genomic alterations more deeply than has been previously possible by using state-of-the-art whole transcriptome shotgun sequencing (WTSS) and whole genome shotgun sequencing (WGSS) coupled with leading-edge bioinformatics, data management and analysis approaches. Specifically, next-generation sequencing technologies are used to survey diffuse large B-cell lymphoma (DLBCL) for somatic mutations. Fresh-frozen biopsy material and constitutional DNA are assembled from patients with DLBCL who are uniformly staged, treated and followed in British Columbia, Canada. To date, the project has obtained 92 DLBCL samples of sufficient material, quality and consent for entry into the WTSS pipeline for library construction and sequencing. These 92 samples are a mixture of two major subtypes of DLBCL: germinal center B-cell (GCB) and activated B-cell (ABC).

  • Medulloblastoma (MB) - Medulloblastoma is the most common malignant brain tumor found in children. To identify the genetic alterations in MB, investigators sought copy number alterations using high-density microarrays and sequenced all known protein-coding genes and miRNA genes using Sanger sequencing in a set of 22 pediatric MB samples and one matched normal blood sample. All tumor samples were obtained at the time of original surgery (pre-treatment) except for one sample, which was obtained at the time of MB recurrence.

    Protein-encoding transcripts to be targeted for sequencing were derived from transcripts present in the Ensembl (file date 8/27/2008), RefSeq (file date 1/18/2009), and CCDS (file date 2/02/2009) databases and downloaded from the UCSC Genome Bioinformatics site. The protein-encoding transcripts were supplemented with microRNA transcripts downloaded from the Sanger miRBase Sequence Database (Release 13.0) to yield a combined set of transcripts representing 24,893 genes (24,178 protein-encoding and 715 microRNA). The regions of interest (ROIs) targeted for sequencing comprised the entire transcribed portion of the microRNA exons and, for the protein-encoding exons, the protein-encoding portion plus 4 bases of flanking sequence. The Illumina Infinium II Whole Genome Genotyping Assay on the BeadChip platform was used to analyze the same set of tumor samples at 1,199,187 (1M-Duo) SNP loci in order to detect copy number alterations.
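
As a rough illustration of the region-of-interest definition above, the following Python sketch clips each exon to the protein-encoding portion of its transcript and pads the result with flanking bases, while returning microRNA exons whole. The function name, the tuple layout, the half-open coordinate convention and the reading of "4 bases of flanking sequence" as 4 bases on each side are all assumptions made for illustration; none of them is taken from the CGCI pipeline.

```python
FLANK = 4  # flanking bases; applying them on each side is an assumption


def regions_of_interest(exons, cds_start=None, cds_end=None, flank=FLANK):
    """Return ROIs as a list of (start, end) half-open intervals.

    exons: list of (start, end) half-open exon intervals for one transcript.
    cds_start, cds_end: bounds of the protein-encoding portion. When either
    is None, the transcript is treated as a microRNA and every exon is
    returned unchanged (the entire transcribed portion).
    """
    if cds_start is None or cds_end is None:
        return [(start, end) for start, end in exons]

    rois = []
    for start, end in exons:
        # Clip the exon to the protein-encoding portion.
        coding_start = max(start, cds_start)
        coding_end = min(end, cds_end)
        if coding_start >= coding_end:
            continue  # exon lies entirely in untranslated sequence
        # Pad the coding portion with flanking bases, not going below 0.
        rois.append((max(0, coding_start - flank), coding_end + flank))
    return rois


# Hypothetical transcript: three exons, coding portion from 150 to 350.
example_exons = [(100, 200), (300, 400), (500, 600)]
protein_rois = regions_of_interest(example_exons, cds_start=150, cds_end=350)
mirna_rois = regions_of_interest([(10, 90)])
```

In this hypothetical example the third exon is dropped because it contains no protein-encoding sequence, and the microRNA exon is returned without padding.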

The projects currently involved in CGCI will provide a range of data, including whole-genome, transcriptomic and mutational analyses of the tumor types being studied. This page will be amended as additional projects and characterization platforms are added to the CGCI portfolio.

Authorized Access
Publicly Available Data (Public ftp)

Connect to the public download site. The site contains release notes and manifests. If available, the site also contains data dictionaries, variable summaries, documents, and truncated analyses.

Study Inclusion/Exclusion Criteria

All adult specimens and all clinical and laboratory data gathered for this project meet the strict set of criteria established by The Cancer Genome Atlas (TCGA). In particular, the following specific criteria will be met.

  1. Focus on primary untreated tumors that were snap frozen upon tissue resection.
  2. All samples are collected and utilized following strict human subjects protection guidelines, informed consent and IRB reviewed protocols.
  3. Whenever possible, clinical data are gathered prospectively and stored in continuously updated electronic format using a standard relational database (MS Access) employing caDSR compliant terminology and from which the data can be easily exported.

Additional information on specimen inclusion and exclusion criteria for the specific tumor types investigated as part of CGCI can be found on the CGCI website and within referenced publications for this initiative.

Molecular Data
Type | Source | Platform | Number of Oligos/SNPs | SNP Batch Id | Comment
Whole Genome Sequencing | Illumina | Genome Analyzer II | N/A | N/A |

Study History

Cancer is a genetic disease. Alterations at the DNA level drive the cellular changes that are hallmarks of cancer, including aberrant cell division and survival. Historically, genetic causes of cancer were studied by analysis of one or a few genes at a time. More recently, however, novel high-throughput technologies have provided unprecedented capabilities to examine the cancer genome. These technologies allow systematic characterization of genetic and epigenetic alterations, enabling investigators to identify the underlying genetic changes found in cancer. The CGCI incorporates multiple approaches for genomic characterization, including exome sequencing and transcriptome analysis using next-generation sequencing. To encourage collaboration and leverage the collective knowledge and innovation of the entire cancer research community, all data collected will be publicly available through databases supported by the National Institutes of Health and National Cancer Institute.

For DLBCL, sample acquisition, library construction and sequencing of all 92 cases for transcriptome analysis began July 1, 2008 and are now complete. Validation of candidate mutations is ongoing. WTSS datasets for the first 30 DLBCL cases were uploaded to the SRA on December 15, 2010 (these datasets were included in the referenced Nature Genetics publication). WGSS for a subset of these cases (40 in total) is currently in progress as part of an evaluation of the most recent advances in sequencing technologies.
