Study Description

The accurate identification of structural variations using whole-genome DNA sequencing data generated by next-generation sequencing technology is extremely difficult. To address this challenge, we have developed CREST, an algorithm that uses sequencing reads with partial alignments to the reference human genome (so-called soft-clipped reads) to directly map the breakpoints of somatic structural variations. We applied CREST to paired tumor/normal whole-genome sequencing data from five cases of T-lineage acute lymphoblastic leukemia (T-ALL). A total of 110 somatic structural variants were identified, >80% of which were validated by genomic PCR and Sanger sequencing. The validated structural variants included 31 inter-chromosomal translocations, 19 intra-chromosomal translocations, one inversion, 22 deletions and 16 insertions. A comparison of the results generated with CREST to those obtained using the traditional paired-end discordant mapping methods demonstrates that CREST has much higher sensitivity and specificity. In addition, application of CREST to publicly available whole-genome sequencing data from the human melanoma cell line COLO-829 identified 50 novel structural variations not detected using the standard methods, 20 of which were selected for validation, with a 90% success rate. These data demonstrate that direct mapping of soft-clipped reads offers an improved method for detecting structural variants at nucleotide-level resolution.
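
To illustrate what "soft-clipped reads" mark in an alignment, the following is a minimal Python sketch, not CREST's implementation. It assumes reads are given as (chromosome, 1-based SAM POS, CIGAR string) tuples and reports reference positions where several reads are soft-clipped at the same base. The function names and the `min_support` threshold are hypothetical, chosen only for this example.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def soft_clip_breakpoints(pos, cigar):
    """Return (side, reference_position) pairs for soft-clipped read ends.

    pos is the 1-based leftmost aligned position (SAM POS field).
    The reported position is the aligned base adjacent to the clip.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can flank soft clips; drop them before checking the ends.
    core = [(n, op) for n, op in ops if op != "H"]
    breakpoints = []
    if core and core[0][1] == "S":
        breakpoints.append(("left", pos))
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    if len(core) > 1 and core[-1][1] == "S":
        breakpoints.append(("right", pos + ref_len - 1))
    return breakpoints


def candidate_breakpoints(reads, min_support=2):
    """Count soft-clip positions and keep those seen in >= min_support reads.

    min_support is a hypothetical threshold for this illustration only.
    """
    counts = Counter()
    for chrom, pos, cigar in reads:
        for side, bp in soft_clip_breakpoints(pos, cigar):
            counts[(chrom, bp, side)] += 1
    return {key: n for key, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    reads = [
        ("chr1", 100, "30S70M"),
        ("chr1", 100, "20S80M"),
        ("chr1", 41, "60M40S"),
        ("chr2", 500, "100M"),
    ]
    print(candidate_breakpoints(reads))
```

The sketch covers only the step of locating clip positions; resolving what the clipped sequence maps to, and distinguishing somatic from germline events using the matched normal sample, is outside its scope.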

  • Study Types: Whole Genome Sequencing, Tumor vs. Matched-Normal
  • Number of study subjects that have individual level data available through Authorized Access: 12

Authorized Access
Publicly Available Data (Public ftp)

Connect to the public download site. The site contains release notes and manifests. If available, the site also contains data dictionaries, variable summaries, documents, and truncated analyses.

Study Inclusion/Exclusion Criteria

Cases of early T cell progenitor acute lymphoblastic leukemia (ETP T-ALL) were included. All cases in this cohort underwent whole genome sequencing of tumor and normal DNA. The definition of ETP T-ALL can be found in Lancet Oncol. 2009 Feb;10(2):147-56. PMID: 19147408. Cases were selected if they had appropriate consent for genetic studies and suitable material for sequencing (high-purity tumor populations and available normal DNA obtained at disease remission).

Molecular Data

Type                     Source      Platform         Number of Oligos/SNPs  SNP Batch Id  Comment
Whole Genome Genotyping  Affymetrix  AFFY_6.0         934940                 52074
Whole Genome Genotyping  Affymetrix  Mapping250K_Nsp  262264                 33767         Affymetrix 500K Set comprises Mapping250K_Nsp and Mapping250K_Sty Arrays
Whole Genome Genotyping  Affymetrix  Mapping250K_Sty  238304                 33766         Affymetrix 500K Set comprises Mapping250K_Nsp and Mapping250K_Sty Arrays

Study History

This study comprises whole genome sequence data from twelve cases of ETP T-ALL, together with sequence from matched normal samples. Five samples were used to develop and validate a novel algorithm, CREST, for the identification of structural variations from whole genome sequence data.

Study Attribution
  • Principal Investigator
    • Charles Mullighan, MD. Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
  • Institute
    • St. Jude Children's Research Hospital, Memphis, TN, USA