Genomic landscape of TP53-mutated myeloid malignancies

TP53-mutated myeloid malignancies are most frequently associated with complex cytogenetics. The presence of complex and extensive structural variants complicates detailed genomic analysis by conventional clinical techniques. We performed whole genome sequencing of 42 AML/MDS cases with paired normal tissue to characterize the genomic landscape of TP53-mutated myeloid malignancies. The vast majority of cases had multi-hit involvement at the TP53 genetic locus (94%), as well as aneuploidy and chromothripsis. Chromosomal patterns of aneuploidy differed significantly from those of TP53-mutated cancers arising in other tissues. Recurrent structural variants affected regions that include ETV6 on chr12p, RUNX1 on chr21, and NF1 on chr17q. Most notably, ETV6 transcript expression was low in TP53-mutated myeloid malignancies both with and without structural rearrangements involving chromosome 12p. Telomeric content was increased in TP53-mutated AML/MDS compared with other AML subtypes, and telomeric content was detected adjacent to interstitial regions of chromosomes. The genomic landscape of TP53-mutated myeloid malignancies reveals recurrent structural variants affecting key hematopoietic transcription factors and telomeric repeats that are generally not detected by panel sequencing or conventional cytogenetic analyses.

Following an initial run of the pipeline, purity estimates from Purple, confirmed by manual review (Supp Table 1), were used as input parameters to the GRIPSS filtering and downstream tools. SVs were retained only if their FILTER status was PASS and their length exceeded 50 bp. Linx was used for clustering complex variants. Fractional absolute copy number calls from Purple were rounded to the nearest integer value for classification as 'gain' or 'loss'. Copy-neutral loss-of-heterozygosity was called using the MinorAlleleCopyNumber estimate provided by Purple. All mutations affecting TP53 were manually reviewed.
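The rounding-based classification described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the diploid baseline of 2, the half-up tie-breaking rule, and the specific copy-neutral LOH rule (total copy number at baseline with a minor allele copy number that rounds to 0) are all assumptions introduced for illustration; the text states only that copy number was rounded to the nearest integer and that MinorAlleleCopyNumber was used.

```python
import math


def round_half_up(value):
    """Round to the nearest integer, with exact halves rounded up (assumed tie rule)."""
    return math.floor(value + 0.5)


def classify_copy_number(absolute_cn, baseline=2):
    """Label a segment 'gain', 'loss', or 'neutral' from its rounded copy number.

    baseline=2 (diploid autosome) is an assumption; the text does not state it.
    """
    rounded = round_half_up(absolute_cn)
    if rounded > baseline:
        return "gain"
    if rounded < baseline:
        return "loss"
    return "neutral"


def is_copy_neutral_loh(absolute_cn, minor_allele_cn, baseline=2):
    """Hypothetical CN-LOH rule: total at baseline and minor allele rounding to 0."""
    return (
        round_half_up(absolute_cn) == baseline
        and round_half_up(minor_allele_cn) == 0
    )
```

Under these assumptions, a segment with fractional copy number 2.6 would be labelled a gain and one with 1.4 a loss. Sex chromosomes and samples with non-diploid baseline ploidy would need a different `baseline` value.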

Chromothripsis detection
Chromothripsis detection was performed using ShatterSeek (https://github.com/parklab/ShatterSeek, commit 4b8b41011ecfe6d1496e906e5d9ec7d65467d476) with default parameters. Inputs were the filtered SVs (DEL, DUP, INV, and BND types only) and the copy number outputs from Purple. Filtering for high-confidence chromothripsis regions was performed as recommended in the ShatterSeek documentation, and was followed by manual review to exclude false-positive calls.

Estimation of telomere content
Telomere content was estimated using TelomereHunter (v1.1.0) [7] in tumor-normal mode, using default parameters, and TelSeq (v0.0.1) [8] separately for tumor and paired normal samples. Estimates of telomere content from the two methods were highly correlated (R² = 0.88), so we report only the TelomereHunter results. Analyses of TVR (telomere variant repeats) in singleton context (i.e., flanked by at least 3 t-type telomeric hexamers on either side) were based on the per-sample tumor/normal ratio of normalized singleton read counts, as provided by TelomereHunter.
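As an illustration of the agreement check between the two estimators, the following Python sketch computes a coefficient of determination for paired per-sample estimates. It assumes that the reported R² is the squared Pearson correlation, which the text does not state explicitly. The function name and inputs are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: this is the R-squared definition behind the reported value.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return (sxy * sxy) / (sxx * syy)
```

Here `x` and `y` would hold, for the same samples in the same order, the telomere content estimates from each tool.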

Identification of intrachromosomal telomeric insertions
Identification of intrachromosomal insertions of telomeric repeats was performed following the approach previously described [9]. Using the telomeric reads identified by TelomereHunter (i.e., with at least six t-type, c-type, g-type or j-type hexameric repeats), we identified read pairs in which only one member of the pair was classified as telomeric. We then identified candidate insertion regions as 1 kb windows containing 3 or more of these 'orphaned' telomeric reads in the tumor and none in the paired normal sample, excluding assembly gaps and the terminal cytoband of each chromosome. Within these candidate regions, we identified soft-clipped reads with mapping quality>30, excluding duplicates, secondary, and supplementary reads, where at least one t-type, c-type, g-type or j-type hexamer was present in the soft-clipped region. We retained all positions that were the site of clipping in 4 or more reads from the tumor, and then filtered out sites within segmental duplications or simple repeats, sites with soft-clipped telomeric repeats in the paired normal, and sites identified in more than 2 samples. Finally, all candidate insertions were manually reviewed to exclude false positives.
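The two counting steps of this procedure (candidate windows and read-supported clipping sites) can be sketched in Python as follows. This is an illustrative outline rather than the published implementation [9]. It assumes fixed, non-overlapping 1 kb windows and that each orphaned read is represented by a single mapped (chromosome, position) pair, neither of which is specified in the text. The function names are hypothetical, and the exclusion of assembly gaps, terminal cytobands, repeats, and recurrent sites is omitted.

```python
from collections import Counter


def candidate_windows(tumor_positions, normal_positions, window=1000, min_tumor_reads=3):
    """Return (chrom, start, end) windows with enough orphaned telomeric reads
    in the tumor and none in the paired normal.

    Each input is an iterable of (chrom, position) tuples.
    Assumption: windows are fixed and non-overlapping.
    """
    def count_per_window(positions):
        return Counter((chrom, pos // window) for chrom, pos in positions)

    tumor_counts = count_per_window(tumor_positions)
    normal_counts = count_per_window(normal_positions)
    return sorted(
        (chrom, index * window, (index + 1) * window)
        for (chrom, index), n in tumor_counts.items()
        if n >= min_tumor_reads and normal_counts[(chrom, index)] == 0
    )


def supported_clip_sites(clip_positions, min_reads=4):
    """Return clipping sites shared by at least min_reads soft-clipped tumor reads."""
    counts = Counter(clip_positions)
    return sorted(site for site, n in counts.items() if n >= min_reads)
```

In this sketch, `clip_positions` would contain one (chromosome, position) entry per soft-clipped read that already passed the mapping quality, duplicate, and telomeric hexamer criteria.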

Copy Number Analysis of BeatAML cohort
Copy number analysis in the BeatAML cohort [10] was performed using all primary AML cases and all available normal controls. In instances where more than one sample per primary AML case was provided, we chose one sample at random, with preference for bone marrow samples when available. Copy number analysis was performed using cnvkit (v0.9.8) [11] according to the cnvkit authors' recommended workflow, with masking of assembly gaps and centromeric regions and using the full set of normal samples as a panel of normals. To account for observed systematic noise due to, e.g., differences in GC content between adjacent genomic bins, we re-centered the log2 copy number ratio in each bin by subtracting the median log2 ratio across all tumor samples, and then performed a second round of copy number segmentation.
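The per-bin re-centering step can be illustrated with a short Python sketch. It assumes that every tumor sample has a log2 ratio for the same ordered set of bins and that no values are missing. The function name and the input layout are hypothetical and do not reflect cnvkit's file formats or API.

```python
from statistics import median


def recenter_log2(log2_by_sample):
    """Subtract, in each bin, the median log2 ratio across all tumor samples.

    log2_by_sample maps sample name -> list of per-bin log2 ratios,
    with bins in the same order for every sample (an assumption).
    """
    samples = list(log2_by_sample)
    n_bins = len(log2_by_sample[samples[0]])
    if any(len(log2_by_sample[s]) != n_bins for s in samples):
        raise ValueError("all samples must cover the same bins")
    # One median per bin, taken across samples.
    bin_medians = [
        median(log2_by_sample[s][i] for s in samples) for i in range(n_bins)
    ]
    return {
        s: [value - m for value, m in zip(log2_by_sample[s], bin_medians)]
        for s in samples
    }
```

After re-centering, a bin that is systematically shifted in most tumors has a median of zero across the cohort, so only sample-specific deviations remain for the second round of segmentation. A side effect is that a true alteration present in more than half of the samples at a given bin would also be centered away.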