Study Description


The accurate identification of structural variations using whole-genome DNA sequencing data generated by next-generation sequencing technology is extremely difficult. To address this challenge, we have developed CREST, an algorithm that uses sequencing reads with partial alignments to the reference human genome (so-called soft-clipped reads) to directly map the breakpoints of somatic structural variations. We applied CREST to paired tumor/normal whole-genome sequencing data from five cases of T-lineage acute lymphoblastic leukemia (T-ALL). A total of 110 somatic structural variants were identified, >80% of which were validated by genomic PCR and Sanger sequencing. The validated structural variants included 31 inter-chromosomal translocations, 19 intra-chromosomal translocations, one inversion, 22 deletions and 16 insertions. A comparison of the results generated with CREST to those obtained using traditional paired-end discordant mapping methods demonstrates that CREST has much higher sensitivity and specificity. In addition, application of CREST to publicly available whole-genome sequencing data from the human melanoma cell line COLO-829 identified 50 novel structural variations not detected using the standard methods; 20 of these were selected for validation, with a 90% success rate. These data demonstrate that direct mapping of soft-clipped reads offers an improved method for detecting structural variants at nucleotide-level resolution.
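
The idea of using soft-clipped reads as breakpoint evidence can be illustrated with a minimal Python sketch. This is not the CREST implementation; it only assumes that alignments follow the SAM convention (1-based leftmost position plus a CIGAR string in which `S` marks soft-clipped bases). The function names `soft_clip_breakpoints` and `candidate_breakpoints`, the `min_support` threshold, and the example reads are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM convention).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = frozenset("MDN=X")


def soft_clip_breakpoints(pos, cigar):
    """Return (side, position) pairs implied by soft clips in one alignment.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    A left clip points at the first aligned base; a right clip points at
    the last aligned base.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips carry no read sequence, so ignore them when locating soft clips.
    core = [(n, op) for n, op in ops if op != "H"]
    found = []
    if len(core) < 2:
        return found
    if core[0][1] == "S":
        found.append(("left", pos))
    if core[-1][1] == "S":
        ref_len = sum(n for n, op in core if op in REF_CONSUMING)
        found.append(("right", pos + ref_len - 1))
    return found


def candidate_breakpoints(reads, min_support=2):
    """Count soft-clip support per (chrom, side, position) and keep
    positions backed by at least min_support reads (illustrative threshold)."""
    support = Counter()
    for chrom, pos, cigar in reads:
        for side, bp in soft_clip_breakpoints(pos, cigar):
            support[(chrom, side, bp)] += 1
    return sorted((key, n) for key, n in support.items() if n >= min_support)


# Hypothetical alignments: (chromosome, POS, CIGAR).
reads = [
    ("chr1", 100, "60M40S"),
    ("chr1", 120, "40M60S"),
    ("chr1", 160, "35S65M"),
    ("chr1", 300, "100M"),
]
print(candidate_breakpoints(reads))
```

In this toy input, two reads are clipped on the right at the same reference position, so that position is the only one reported at the default threshold. A real caller would also need to assemble the clipped sequences and map them back to the genome to find the partner breakpoint, which this sketch does not attempt.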


Early T-cell precursor acute lymphoblastic leukaemia (ETP ALL) is an aggressive malignancy of unknown genetic basis. We performed whole-genome sequencing of 12 ETP ALL cases and assessed the frequency of the identified somatic mutations in 94 T-cell acute lymphoblastic leukaemia cases. ETP ALL was characterized by activating mutations in genes regulating cytokine receptor and RAS signalling (67% of cases; NRAS, KRAS, FLT3, IL7R, JAK3, JAK1, SH2B3 and BRAF), inactivating lesions disrupting haematopoietic development (58%; GATA3, ETV6, RUNX1, IKZF1 and EP300) and histone-modifying genes (48%; EZH2, EED, SUZ12, SETD2 and EP300). We also identified new targets of recurrent mutation including DNM2, ECT2L and RELN. The mutational spectrum is similar to myeloid tumours, and moreover, the global transcriptional profile of ETP ALL was similar to that of normal and myeloid leukaemia haematopoietic stem cells. These findings suggest that addition of myeloid-directed therapies might improve the poor outcome of ETP ALL.

  • Study Types: Tumor vs. Matched-Normal, Case Set
  • Number of study subjects that have individual level data available through Authorized Access: 106

Authorized Access
Publicly Available Data (Public ftp)

Connect to the public download site. The site contains release notes and manifests. If available, the site also contains data dictionaries, variable summaries, documents, and truncated analyses.

Study Inclusion/Exclusion Criteria

The cohort consists of cases of early T-cell progenitor acute lymphoblastic leukemia (ETP T-ALL) for which whole-genome sequencing of tumor and normal DNA has been performed. The definition of ETP T-ALL can be found in Lancet Oncol. 2009 Feb;10(2):147-56. PMID: 19147408. Selected cases had appropriate consent for genetic studies and suitable material for sequencing (high-purity tumor populations and available normal DNA obtained at disease remission).

Study History

The study comprises whole-genome sequence data from twelve cases of ETP T-ALL and from their matched normal samples. Five samples were used to develop and validate CREST, a novel algorithm for identifying structural variations from whole-genome sequence data.
