Characterization of aberrant splicing in pediatric central nervous system tumors reveals CLK1 as a candidate oncogenic dependency

Summary Pediatric brain cancer is the leading cause of disease-related mortality in children, and many aggressive tumors still lack effective treatment strategies. Despite extensive studies characterizing these tumor genomes, alternative transcriptional splicing patterns remain underexplored. Here, we systematically characterized aberrant alternative splicing across pediatric brain tumors, identifying pediatric high-grade gliomas (HGGs) among the most heterogeneous. Through integration with UniProt Knowledgebase annotations, we identified 12,145 splice events in 5,424 genes, leading to functional changes in protein activation, folding, and localization. We discovered that the master splicing factor and cell-cycle modulator, CDC-like kinase 1 (CLK1), is aberrantly spliced in HGGs to include exon 4, resulting in a gain of two phosphorylation sites and subsequent activation of CLK1. Inhibition of CLK1 with Cirtuvivint in the pediatric HGG KNS-42 cell line significantly decreased both cell viability and proliferation in a dose-dependent manner. Morpholino-mediated depletion of CLK1 exon 4 splicing reduced RNA expression, protein abundance, and cell viability. Notably, KNS-42 cells treated with the CLK1 exon 4 morpholino demonstrated differential expression 78 genes and differential splicing with loss or gain of a functional site in 193 genes annotated as oncogene or tumor suppressor genes (TSGs). These genes were enriched for cancer-associated pathways, with 20 identified as significant gene dependencies in pediatric HGGs. Our findings highlight a dependency of pediatric HGGs on CLK1 and its roles contributing to tumor splicing heterogeneity through transcriptional dysregulation of splicing factors and transcriptional modulation of oncogenes. Overall, aberrant splicing in HGGs and other pediatric brain tumors represents a potentially targetable oncogenic pathway contributing to tumor growth and maintenance.


Introduction
Pediatric brain cancer is the number one cause of disease-related death in children 1 .
Furthermore, pediatric high-grade gliomas (HGGs) are largely resistant to chemotherapy and are often surgically unresectable, making them exceptionally challenging 2 .Despite decades of clinical trials in pediatrics, patients with certain tumor types such as diffuse intrinsic pontine glioma (DIPG) or diffuse midline glioma (DMG) will succumb to their disease, with a median overall survival of 9.73 months in patients undergoing surgical biopsy 3 .Even with optimal multimodal therapy, median overall survival for non-DMG patients with a HGG is 14-20 months 4 .
Within the last decade, surgical biopsies and post-mortem tissue collection has enabled genomic profiling and somatic characterization of pediatric HGGs.While somatic drivers such as mutations, fusions, and copy number amplifications and/or deletions have been wellcharacterized, the splicing landscape of pediatric HGGs remains underexplored.Two reports have shown rare, but mutually exclusive, alterations in spliceosome-related factors such as SF3B1 and SF3B2 in pediatric HGGs.These alterations dysregulated additional cellular processes involved in maintenance of DNA replication, genome integrity, or transcriptional fidelity 5,6 .More recently, Siddaway, et.al. describe alternative aberrant splicing as a novel mechanism of oncogenic pathway activation in pediatric HGGs.In this study, cancer drivers, including members of the RAS/MAPK pathway, were differentially spliced leading to the activation of the RAS/MAPK pathway and worse glioblastoma patient survival 7 .Alternative splicing plays critical roles in numerous fundamental biological processes, for example, increasing proteomic diversity, stabilizing or destabilizing mRNA transcripts, lowering protein steady-state levels, and influencing critical protein functionality such as enzymatic activity, protein folding, and localization 8,9 .Splicing plays a vital role in generating tissue-specific transcriptomes.The mammalian brain expresses the most complex and conserved alternative splicing programs in relation to other tissues and their disruption can result in a variety of neurological diseases and disorders 10,11 .The splicing process is regulated by a balance of multiple trans-acting RNA-binding proteins (RBP), such as the Serine-rich Splicing Factors (SRSFs), that bind to splicing regulatory elements across preliminary mRNA transcripts.
Disturbing that balance can be detrimental to cellular functionality, and has been associated with oncogenic transformation 12,13 .
Here, we perform a large-scale analysis of genome-wide splicing across pediatric CNS tumors and show widespread dysregulation of pre-mRNA alternative splicing across pediatric CNS tumors.Our findings reveal that alternative splicing can result in the removal or addition of known functional sites predicted to have significant downstream effects on various cellular functions potentially leading to cell viability and cell-cycle dysregulation.We identify and experimentally characterize CLK1 as a candidate oncogenic dependency in pediatric HGGs and propose that targeting of aberrant splicing and/or its resulting downstream proteins may offer additional therapeutic avenues for precision cancer therapy.

Pediatric brain tumors display heterogeneous global patterns of aberrant splicing
We utilized replicate Multivariate Analysis of Transcript splicing (rMATs) 14,15 to quantify transcriptome-wide alternative splicing events across primary pediatric brain tumors (N = 1,434) from the Open Pediatric Cancer (OpenPedCan) project 16 .These broad tumor types shown in  S1.We examined four types of splicing events (single exon: SE, alternative 5' splice site: A5SS, alternative 3' splice site: A3SS, and retained intron: RI) and observed that SE splicing is overwhelmingly the most frequently observed (Figure 1B) for both increased inclusion and skipping events, consistent with a previous report in pediatric HGGs 7 .Importantly, we did not observe an effect of RNA library on percent spliced in (PSI) predictions and found that the sample aliquots sequenced with a poly-A or stranded library had strong positive correlations for genome-wide PSI values (Pearson's R = 0.98, p-value < 2.2e-16, Figure S1A-B).
Due to the diverse biological drivers of these CNS cancers and their molecular subtypes, we hypothesized that we might observe histology-or subtype-specific splicing patterns.Indeed, our assessment of recurrent exon skipping (ES) or exon inclusion (EI) events (N ≥ 2) revealed histology-specific splicing events in some tumor types, shared events across some histologies, as well as lack of recurrent events in other histologies when focusing on the top 40 sets that can comprise one histology or multiple histologies (Fig 1C, Figure S1B).Within these histologies, we found that MB, LGG, and HGG tumors exhibited the highest number of recurrent, unique skipping events (MB = 2,644, LGG = 1,860, other HGG = 1,610, DMG = 687) and inclusion events (MB = 1,235, LGG = 814, other HGG = 677).To ensure this observation was not skewed by the number of patient tumors per histology grouping, we normalized these events by the number of patients.We observed that MB and HGG tumors had the highest total number of unique, recurrent single exon events per patient (Fig 1D).We plotted the top 40 sets with a complete list of unique events per histology reported in Table S2A-B.Taken together, these findings suggest that there are both tumor type-specific and shared splicing patterns in pediatric brain tumors.
We further investigated single exon-associated events and devised a metric called the splicing burden index (SBI) to compute the proportion of differential alternative splicing (AS) events in each sample compared to all other tumor samples (Figure 1E, see Methods).This metric allows for transcriptome-wide assessment of differential AS within a tumor sample.The median SBI across this cohort was 0.029 (or 2.9%).LGG tumors had the lowest median SBI (0.0150, 1.5%), while germ cell tumors had the highest (0.0802, 8.02%).Tumors with a low splicing burden variance (< 1st quantile variance across all tumors, or 0.0015561) include other tumors, other CNS embryonal tumors, meningiomas, choroid plexus tumors and schwannomas, while more heterogeneous tumors (splicing burden variance > 0.0022047) include DIPG or DMG, MB, mesenchymal tumors, neurofibroma plexiform, and germ cell tumors.We performed a similar analysis on the other splicing cases (A5SS, A3SS, and RI) and observed that LGGs and schwannomas continued to exhibit the lowest median SBI across splice types, while other tumors and germ cell tumors (GCTs) maintained the highest median SBI.However, the ordering of tumors shifted in some instances, suggesting that certain types of splice events may be more prevalent in some histologies compared to others.For example, DIPG or DMG tumors changed from having a low median SBI for SE events to a higher median SBI for other event types, suggesting an increased splicing burden for RI, A5SS, and A3SS splicing changes (Figures S1D-F).We next assessed the proportion of low and high SBI tumors in the cohort by histology.
The majority (> 70%, N = 156/213) of LGGs consisted of tumors with low SBI, while the majority of germ cell tumors (N = 11/12) had a high SBI (Figure 1F).Moreover, greater than 25% of tumors within the following histologies had a high SBI: other HGG, MB, ATRT, CPG, other tumors, non-neoplastic tumors, GCT, mesenchymal tumors, and neurofibroma plexiform while greater than 25% of tumors within the following histologies had a low SBI: LGG, GNT, DIPG or DMG, and CPG.These results further highlight the heterogeneous splicing landscape across pediatric CNS tumors.
We next hypothesized that tumors with a low tumor mutation burden (TMB) might have a higher splicing burden index as an alternate mechanism driving tumorigenesis.Interestingly, we did not find an overall correlative relationship between TMB and SBI when analyzing the full CNS tumor cohort together (all tumors: Pearson's R = 0.037, p-value = 0.19 and with hypermutant tumors removed: R = 0.018, p-value = 0.53, Figure S1G-H), but when assessing the relationship by histology (Figure S1I), we found significant negative correlations between TMB and SBI in LGG (Pearson's R = -0.15,p-value = 0.0059), DIPG or DMG (Pearson's R = -0.24,p-value = 0.025), and Schwannomas (Pearson's R = -0.77,p-value = 0.00018).As an alternate approach, we asked whether TMB was different between tumors with high and low SBI within histologies (Figure 1F).We found a significant inverse relationship between SBI and TMB in only CPTs, LGGs, GNTs, and schwannomas (Wilcoxon p-value < 0.05), indicating that aberrant splicing may serve as an alternative reservoir for uncovering oncogenic mechanisms in mutationally silent tumors.

Splicing drives novel biological clusters and splicing burden differentiates key splicing factors in pediatric high-grade gliomas
To determine whether CNS tumors share transcriptional splicing biology, we performed consensus clustering of PSI values across all primary CNS tumors using the Partition Around Medoids algorithm with Canberra distance metric, which revealed 12 clusters spanning histologies (Figure 2A).A full list of parameters and statistics are detailed in Table S2C.
Clusters 1 and 12 were predominantly composed of MB (Figure 2B) with both containing all four MB subtypes: SHH, WNT, Group 3, and Group 4 (Figure S2C), suggesting unique splicedriven biological underpinnings even within molecularly-defined subtypes.Although Cluster 11   was dominated by EPNs (Figure 2B), it comprised all EPN molecular subtypes (Figure S2C).
While Cluster 4 contained the majority of LGGs in the cohort, LGGs spanned nine of 12 clusters (Figure S2C) and clusters were not subtype-specific.Notably, HGGs, including DIPG or DMG, exhibited the highest degree of splicing heterogeneity, spanning across all 12 clusters (Figure 2B and S2C).Taken together, the transcriptional mechanisms underlying these clusters' formation are not solely driven by molecular subtype.A full list of samples with associated cluster membership information is outlined in Table S2D.We further assessed cluster membership of tumors with high or low SBI and found that tumors with high SBI were present in ten clusters, while the low SBI tumors grouped almost entirely with Cluster 4, followed by Clusters 3, 5, and 11 (Figure 2C).We aimed to understand the biology driving cluster formation and used gene set variation analysis (GSVA) to identify enriched cancer-associated signaling pathways represented by splice events in each of the 12 clusters.Figure 2D displays the top pathways differentially-regulated between pairs of clusters (Bonferroni-adjusted p-value < 0.05).
Strikingly, the spliceosome pathway was significantly up-or down-regulated in all cluster groups when performing pairwise comparisons to all other groups (Bonferroni-adjusted p-value < 1.28 e- 46 -1.01 e-4 ).Further, we show a significant positive correlation (Figure 2E, Pearson's R = 0.55, p-value < 2.2e-16) between GSVA spliceosome pathway scores and SBI, both validating the use of the SBI metric to measure splicing activity and suggesting that splicing factors themselves are mis-spliced, likely contributing to the transcriptome-wide differential splicing we are observing within pediatric brain tumors.Other pathways that were enriched for certain clusters included DNA repair, mitotic spindle, and KRAS signaling.For example, Cluster 1 was dominated by MB subtype Group 4 (Figure S2C), which also showed significant dysregulation in the KRAS signaling pathway, corroborating early characterization of these tumor types 17 .
These results reveal that although we observe transcriptome-wide splicing variation in all tumors, distinct pathways and genes are targeted in each cluster.Taken together, we show that each cluster may have unique transcriptional underpinnings influencing distinct pathways which may, in turn, contribute to tumorigenesis.
Given their cluster and SBI heterogeneity and high number of unique, recurrent splice events, we narrowed our focus to all pediatric HGGs.Since we sought to understand the mechanisms underlying the widespread and pervasive splicing, we first assessed somatic alterations in splicing factor-encoding genes or splicing regulators in our cohort.Interestingly, a recent splicing study found that 34% of pediatric HGGs had a somatic alteration, mutation and/or CNV, in a gene in the HUGO spliceosome complex 7 .Since few of these genes overlap with the splicing factors and their canonical regulators assessed in our study, we combined our gene lists and investigated somatic alterations in these genes.We removed hypermutant tumors (≥ 10 Mut/Mb), filtered for putative deleterious mutations (defined by SIFT and/or PolyPhen), and found that 128 HGG tumors from 195 patients (65%) harbored at least one somatic SNVs/InDel, fusion, or CNV in a gene in the spliceosome complex or that regulates alternative splicing (Table S3A, Supplemental Figure S2D).However, the mutation frequencies in each gene only ranged from 1-8%, and there was no significant enrichment in any gene based on SBI status.It has been previously shown that in the absence of splicing factor gene mutations, RNA expression changes in these genes can cause downstream splicing changes to promote tumor formation [18][19][20] .Thus, we performed differential gene expression (DE)   analysis between high vs low SBI HGGs for known splicing factors and related genes 21 (Figure 2F) and found 44.3% (N = 77/174) to be significantly differentially expressed (adjusted p-value < 0.05 and log 2 -fold change > |2|, Table S2).Specifically, 57% (16/28) genes encoding the serine/arginine-rich splicing factor (SRSF) and heterogeneous nuclear ribonucleoproteins (hnRNP) families of trans-acting splicing factors known to directly influence exon-associated splicing were significantly DE between high vs low SBI HGGs.(Figure 2G).Since changes in gene expression may not necessarily result in corresponding changes at the protein level, this finding prompted us to investigate proteomic alterations in these splicing factors.We integrated gene expression and proteogenomic (N = 188) data from pediatric brain tumors obtained from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) 22 .We observed that mRNA and protein expression for these splicing factors are tightly correlated across pediatric CNS tumors (Figure 2H), supporting previous reports that differential mRNAs are better correlates to protein levels 23 .As such, mRNA levels of these splicing factor genes can be used as surrogate measurements for protein abundance and function.

Recurrent splicing aberrations alter known proteomic functional sites, including the gain of phosphorylation binding sites in splicing regulator CLK1
In order to further elucidate the aberrant splicing landscape of pediatric HGGs (N = 343), we developed a robust and adaptable workflow to prioritize recurrent single exon events with predicted functional impact (Figure 3A).We first applied a threshold of ≥ 10 junction read counts to identify expressed splice events.Next, we prioritized histology-specific recurrent (N ≥ 2) events that were differentially spliced (ΔPSI z-score > |2|) in a sample compared to the whole cohort.We identified a total of 56,550 recurrent differential splicing events in HGGs.
Subsequently, we annotated these events for overlap with known Uniprot functional sites and prioritized those leading to gain or loss of a functional site affecting disulfide bonding, localization signaling, amino acid modifications, or others.The Uniprot annotated "other" category is user-defined, but includes sites for ion-binding, calcium binding, PDZ-binding-motif and more 24 .This reduced the number of prioritized splice events to 12,145 events in 5,424 genes with a putative functional effect.Among these predicted functional spliced sites illustrated in Figure 3B, the majority favored increased exon inclusion (N = 12,337), with smaller subsets favoring increased exon skipping (N = 2,380) or displaying a mixed pattern (N = 662).Shown in  S4.The remaining splicing events were associated with un-annotated sites, untranslated, and/or noncoding regions.To identify potentially targetable events, we selected functional splice events in kinases and performed over-representation pathway analysis which revealed MAPK, ERBB, and PI3K-AKT MTOR as the top cancer-related pathways significantly over-represented (Bonferroni-adjusted p-value < 0.05, Figure S3B-C).
Strikingly, the gene encoding protein kinase CDC Like Kinase 1 (CLK1), an oncogenic factor and known master modulator of alternative splicing 12 , was amongst this subset of differentially spliced kinase genes in HGGs (Figure 3D).The majority of tumors showed very high levels of CLK1 exon 4 inclusion (Figure 3E) with mean PSI of 0.7657 (or 76.57%) and thus the inclusion event was not differential in most HGGs.In contrast, 12 tumors demonstrated significant skipping (decreased inclusion) of exon 4 (ΔPSI <= -2 z-scores of mean PSI), thus driving the differential splicing observed for CLK1 exon 4 (Figure 3D).Additionally, we observed CLK1 exon 4 associated transcript expression heterogeneity across non-tumor brain controls from the Genotype Tissue Expression Project (Figure S3C).CLK1 regulates the SR (Serine aRginine) family of splicing factor proteins through hyper-phosphorylation of the SR-rich peptide regions of SR proteins to induce cooperative RNA binding and increased activity [25][26][27] .Moreover, the differential splicing of CLK1 resulted in differences of exon usage across HGGs (Figure 3E-F).CLK1 exon 4 contains two catalytic sites, Thr138 and Ser140 (Figure 3G), and these have been described previously to be associated with increased protein abundance 26 .CLK1 exon 4 inclusion was significantly positively correlated with expression of total CLK1 mRNA (Pearson's R = 0.29, p-value = 4.1e-5, Figure 3H), supporting the hypothesis that inclusion of these phosphorylation sites in exon 4 increases canonical CLK1 expression.Further, CLK1 expression is significantly positively correlated with expression of Serine/Arginine-rich proteinspecific kinase 1 (SRPK1), a kinase that cooperates with SR-bound CLK1 to facilitate SR phosphorylation, U1 exchange of CLK1, and subsequent splicing 28 (Figure 3I, Pearson's R = 0.69, p-value < 2.2e-16).High CLK1 exon 4 inclusion was not unique to HGGs though they were the most heterogeneous.Indeed, we observed widespread high median CLK1 exon 4 inclusion levels across pediatric brain tumors, suggesting these tumors contain active CLK1 (Figure 3J).

CLK1 is an oncogenic dependency in pediatric HGGs
We sought to further examine the role of CLK1 as a potential oncogene in HGGs.We investigated the cancer Dependency Map (DepMap) portal and database and found that CNS and brain tumor cell lines with high expression of the exon 4 included transcript of CLK1 (≥ third quantile mRNA expression of ENST00000321356) have significantly higher CRISPR dependency (lower scores) compared to CLK1 low expressing cell lines (≤ first quantile) (Wilcoxon p = 0.034, Figure 4A).This observation was significant and unique only to cell lines derived from CNS tumor and myeloid malignancies (Figure S4A), suggesting tissue-and tumorspecific regulation of CLK1.Across all DepMap profiled cell lines, we found that the pediatric brain tumor cell line KNS-42 had a strong dependency on CLK1 (low CRISPR dependency score) (Figure 4B) and chose it for further in vitro testing.We next tested the impact of CLK1 inhibition in KNS-42 cells using the pan-Dyrk/Clk inhibitor Cirtuvivint (SM08502) 29 .Using the IncuCyte to monitor real-time proliferation, we observed a significant reduction in cell growth at multiple concentrations over a 6-day period (Figure 4C).Additionally, we observed a dosedependent decrease in cell viability using CellTiter-Glo at three days (Figure 4D) and six days (Figure S4B) post-treatment of 0.5, 1, 5, and 10 μM Cirtuvivint.
Based on these findings and to rule out off-target effects of the pan-inhibitor, we selected KNS-42 along with two additional cell lines from our pediatric brain tumor cohort with high CLK1 exon 4 PSI (7316-1763 and 7316-1769) to experimentally validate the exon 4 splice event identified from short-read RNA-Seq.We performed long-read RNA-seq using Oxford Nanopore Technologies (ONT) and validated the presence of two major full-length CLK1 mRNA isoforms that either included or skipped exon 4 across these three patient-derived cell lines (Figure 4E).
We therefore postulated that the gain of CLK1 phosphorylation sites on exon 4 increases mRNA and subsequent protein production in HGGs.To directly test this hypothesis, we modulated CLK1 exon 4 splicing using targeted morpholino oligomers (see Methods), in which we forced exon 4 skipping in the KNS-42 cell line.We performed qRT-PCR and observed a near total loss of the CLK1 exon 4 inclusion transcript at both 5 and 10 μM of exon 4 targeted morpholino, evidenced by reduced expression of the exon 3-4 junction.Likewise, at these same concentrations, we observed increased CLK1 exon 4 skipping using primers targeting the exon 3-5 junction (Figure 4F).Importantly, forced CLK1 exon 4 skipping resulted in ablation of CLK1 protein at 5 and 10 μM (Figure 4G), corroborating previous work that CLK1 exon 4 is required for full-length and catalytically active CLK1 [30][31][32] .Next, we assessed the functional impact of CLK1 exon 4 splicing using CellTiter-Glo and confirmed that cells with high CLK1 exon 4 skipping (CLK1 exon 4 targeting morpholino) exhibited significantly decreased viability compared to those with CLK1 exon 4 inclusion (non-targeting morpholino) at 24, 72, and 96 hours (p ≤ 0.01, within-time Student's t-test, Figure 4H).Taken together, we demonstrate that CLK1 is a dependency in pediatric HGGs required for cellular growth and viability and CLK1 mRNA and protein is maintained through increased exon 4 inclusion.
To identify bona-fide CLK1 targets mediated by exon 4 splicing, we performed RNA-seq from KNS-42 cells treated with morpholino oligomers (N = 3 controls, N = 3 targeted to skip exon 4).We performed differential gene expression (DE) analysis and identified 296 genes with differential expression (193 upregulated, 103 downregulated) between the treated and untreated populations (Figure 4I, Table S5A).Next, we quantified differential alternative splicing (Methods, Table S5B) and applied the same downstream computational workflow from Figure S3A to prioritize splice events affecting functional sites.We identified a total of 2,006 unique differential splicing (DS) events within 1,467 genes predicted to alter functional sites (SE = 1,905, A5SS = 196, A3SS = 272, and RI = 388, Figure 4J and Table S5C-F).These dysregulated genes included TSGs and oncogenes involved in RNA-binding, epigenetics, transcription factors, and kinases (Figures 4K-L and Table S5G).These genes were overrepresented in G2M checkpoint, mitotic spindle, and nucleotide excision repair pathways (Figure 4M).To further investigate the impact on DNA repair and other pathways, we performed gene-set enrichment analyses of DNA repair and cancer signaling pathways on these DS oncogenes and TSGs and found that depletion of CLK1 leads to upregulation of TNFA, PI3K/AKT/MTOR, IL6/JAK/STAT3, and apoptosis pathway expression and downregulation of multiple DNA repair pathways (Figure S4C-F).Moreover, of the cancer genes with putative functional consequences driven by CLK1 splicing, we discovered that 2.3% (N = 6) had dysregulation at the level of both splicing and expression (Figure 4N), indicating these may impact the tumor's proteome.The DE genes were significantly over-represented (Bonferroniadjusted p-value <0.05) for KRAS upregulation and drug metabolism pathways (Figure S5A) while DE genes were significantly over-represented (Figure S5B) in G2M checkpoint, mitotic spindle, and nucleotide excision repair pathways, suggesting a potential for these events to impact cellular functions, contribute to the cancer disease state, and/or play a role in regulatory mechanisms of gene expression.
Finally, we asked whether CLK1 splicing affects any of the essential oncogenes defined by the pediatric gene dependency maps of the Childhood Cancer Model Atlas 33 .We observed 15 of these genes also exhibit significant gene dependencies (GD) in established pediatric HGG cell lines (Figure 4O-P, Table S5B), including CDK4, FGFR1, FGFR2, EZH2, RAF1, and SRC.
For instance, the expression levels of mRNAs encoding proto-oncogene SRC 34,35 are higher in cells with high CLK1 exon 4 (non-targeting morpholino), indicating that CLK1 may enhance or promote SRC expression.The differential splicing cases are more complex as they affect multiple transcripts, but taken together, these data suggest that transcript-level changes mediated through CLK1 could be contributing to some of these dependencies, particularly given the association of aberrant splicing with cancer progression [36][37][38] .

Discussion
Pediatric brain cancer remains the leading cause of disease-related mortality in children, and HGGs present formidable challenges due to their resistance to chemotherapy and surgical limitations.In this study, we conducted a large-scale analysis of aberrant alternative splicing across pediatric CNS tumors, revealing widespread dysregulation of pre-mRNA alternative splicing.We developed an analytical framework to prioritize and predict the consequences of splicing events.Our study demarcates intricate splicing patterns across various tumor types and introduces the splicing burden index (SBI) as a novel metric to quantify differential splicing events at the sample level without requiring a normal control.We discovered significant negative correlations between tumor mutation burden (TMB) and SBI in CPT, LGG, GNT, and schwannoma, suggesting that aberrant splicing may serve as a compensatory mechanism for tumorigenesis in these mutationally silent tumors.Further analysis of skipped exon events unveiled novel biological clusters driven by splicing variations across histologies and molecular subtypes, highlighting extensive splicing heterogeneity in pediatric HGGs.Of note, molecular subtypes were not cluster-specific and specifically for MB, this differs from a previous report which showed that subgroups WNT, SHH, Group 3, and Group 4 can be clustered using splicing information 39 .This is due to fundamental differences in clustering procedure and study goals.We performed unsupervised clustering to identify novel groupings, whereas Dubuc and Morrissy, et.al performed supervised clustering following identification of differentially expressed splice events among the four subgroups in order to classify groups using splice events 39 .Utilizing the UniProt Knowledgebase, we identified splice variants in HGGs that alter functional sites, potentially impacting protein functions such as activation, folding, and localization.
A key finding was differential splicing of CDC-like kinase 1 (CLK1), a critical splicing factor and cell-cycle modulator in pediatric HGGs.This splicing event led to the inclusion of phosphorylation sites in exon 4, promoting increased protein abundance.Experimental modulation of CLK1, either through inhibition or morpholino-directed exon 4 depletion in the KNS-42 cell line, resulted in significantly reduced cell proliferation and/or viability.Splicing modulation to deplete exon 4 ablated CLK1 RNA and protein levels, altogether supporting CLK1 as a gene dependency in pediatric HGGs.Additionally, we identified transcriptional dysregulation of essential cancer genes mediated by aberrant CLK1 splicing.CLK1 is currently being targeted therapeutically with the Pan-Clk/Dyrk Inhibitor Cirtuvivint (SM08502) in heme malignancies 29 and non-CNS solid tumors such as castrate-resistant prostate cancer, colorectal cancer, and non-small cell lung cancer [40][41][42] .Here, our study suggests that CLK1 may also represent a therapeutic vulnerability in CNS malignancies including pediatric HGGs.This investigation enhances our understanding of the splicing landscape in pediatric brain tumors and proposes that aberrant splicing may be a viable target for therapeutic intervention.Further, we openly share the splicing data for all pediatric CNS tumors and believe this can be a valuable resource for the oncology community.Our approach to characterizing splicing aberrations and their functional consequences paves the way for future research into mRNA splicing-based mechanisms of tumorigenesis, the identification and development of therapies targeting aberrant splice events, and may even guide splicing-based diagnostics, all of which have the potential to improve the therapeutic landscape for pediatric brain cancers.

Limitations of the study
In this study, splicing quantifications were primarily performed using short-read RNA-Seq technology, which limits the interpretation of the full spectrum of splicing variation, particularly larger multi-exon transcripts or with genes that contain a high number of transcripts.While there are proteomics data for over 200 matched pediatric brain tumors publicly available, we were limited in sample size for DIPG or DMG and other HGGs, so it will be important to validate our findings in larger datasets as they become available.Additionally, the lack of pediatric normal tissue RNA-Seq necessitated use of non-tumor controls restricted to adult samples from GTEx, and may not necessarily represent the splicing landscape of pediatric tissues.Further, GTEx does not contain tissue of origin for all pediatric brain tumors (eg: pons for DIPG or brainstem for DMG), further limiting the comparison.Additionally, within histologies (eg: LGG), the primary site of the tumor can vary widely depending on diagnosis and it would be ideal to match each tumor to a normal one by one.Although this is not yet possible with the normal RNA-Seq available, the creation of the upcoming developmental GTEx will be critical in the future.We mitigated these normal tissue limitations through the use of the SBI metric, in which we compared each tumor to every other tumor in the cohort.Finally, this RNA-Seq cohort contained samples with many different library preparation strategies (poly-A, stranded, exome capture, stranded poly-A), which ultimately leads to batch effects in certain expression analyses.Using two samples sequenced by different library preparation strategies, we demonstrated that SBI was largely unaffected by library type (Figure S1A-B).Therefore, to mitigate batch effects with gene expression measurements, we used the entire pooled cohort for splicing analyses (PSI, SBI) but used only stranded samples in other analyses involving gene expression values such as correlations, differential expression, and/or over-representation analyses.Overall, we employed robust statistical techniques, cross-validated our findings with external datasets, and utilized orthogonal approaches and experimental methods where possible.

Primary data analyses
Somatic primary workflows were implemented by the Kids First Data Resource Center as described in the Open Pediatric Brain Tumor Atlas (OpenPBTA) 43 and OpenPedCan 16 projects.The code for these workflows, including RNA-seq quantification, fusion identification, RNA splicing, and SNV, INDEL, CNV, SV calling, can be found at https://github.com/d3bcenter/OpenPedCan-workflows.Sample-level data can be found through the Kids First Portal at https://kidsfirstdrc.org/.

Splicing identification and quantification
To detect alternative splicing, we ran rMATS turbo (v.4.1.0) 14with GENCODE v39 GFF annotations, as described by the Kids First RNA-Seq workflow (https://github.com/d3bcenter/OpenPedCan-workflows).We filtered for alternative splicing events with ≥ 10 junction read counts.These results were then used for all downstream processing throughout the manuscript.

Merged primary and summary data
Merged primary matrices and summary files utilized in this manuscript were derived from are openly accessible via the download script in the https://github.com/d3b-center/pbta-splicingrepository.To compare RNA-Seq from CLK1 exon 4 morpholino-treated cells vs control morpholino-treated cells, we ran rMATs with three biological replicates for each condition `--b1 -b2`.This paired mode analysis calculated ΔPSI, p-values, and FDR statistics for each splice event.

Splicing burden index (SBI) calculation
The following describes the SBI calculation used in the manuscript.
Let X be the list of all samples, where X represents the j-th item in the i-th sample.
Let n be the number of items in each sample.
Let SE be the splice event of interest.
Let SE be the number of splice events in the i-th sample.
Let mean SE be the mean of the splice event across all samples.
Let σ SE be the standard deviation of the splice event across all samples.
Let SBI be the proportion of splice events that have z-scores > |2| out of the total number of splice events in a particular sample.
Then the equation for SBI is: where i = 1 to len(X) and j = 1 to n We compared PSI values of each primary tumor against all other tumors in the cohort.
We first computed mean and standard deviation metrics for each alternative splicing event observed in at least one sample.Then for each sample in each group or histology, we identified the proportion of genes that underwent aberrant splicing as defined by a z-score > |2| across the entire transcriptome that undergoes alternative splicing.We first preprocessed the splicing PSI matrix, restricting it to one splice event per gene by choosing the splice event with the maximum PSI value in a given gene.To reduce the dimensionality of the input matrix, we applied a feature selection using Hartigans' dip test 44 .This test identifies dips in the distribution of input features and selects features that have a bi-or multi-modal distribution across the input samples.These "dips" in the distribution may correspond to differences within underlying clinical variables of interest.A total of 6999 features passed the test for multi-modality (p-value < 0.05) and were used for downstream clustering.

Consensus
Next, we applied all combinations of the following clustering algorithms (PAM, K-means, and Hierarchical) and distance methods (Pearson, Spearman, Euclidean, Manhattan, Binary, Maximum, Canberra, and Minkowski) available in the R package ConsensusClusterPlus 45 .For each combination, we evaluated a minimum k value of 2 and a maximum k value of 17.This resulted in a total of 272 clustering solutions corresponding to the different input combinations.
To identify the optimal clustering solution, we first evaluated the cluster performance using the R package fpc 46 .Using a given input data matrix and clustering solution, the function `fpc::clusterstats` computes the metrics silhouette width, entropy, purity, and Dunn index that represent separation between different clusters and closeness of data points within a cluster.
The silhouette score defines the compactness of individual clusters (intra-cluster distance) and separation amongst clusters (inter-cluster distance) to measure an overall representative score.
The entropy and purity evaluate the stability of the cluster.The higher the purity, the more stable the cluster is and the smaller the entropy, the better the clustering performance.The Dunn index is the ratio of the smallest inter-cluster distance and the largest intra-cluster distance.A higher Dunn Index will indicate compact, well-separated clusters, while a lower index will indicate less compact or less well-separated clusters.We used the R package COINr to assign weights to each metric and compute a composite score representing the overall "cluster quality".The composite score was calculated by assigning a directional and weighted scoring mechanism.
Ranks were assigned to each evaluated combination, with the highest composite score or cluster quality being assigned a rank of 1, 2, etc.For our dataset, the highest cluster quality, i.e.
top ranking method, was the combination of PAM clustering algorithm with Canberra distance measure and a value of 12.

Clustering-based differential expression or pathway enrichment
We identified differentially expressed genes per cluster of interest and conducted preranked pathway enrichment using limma 47 , fgsea 48 , and GSVA 49 on those genes.We interrogated KEGG spliceosome and HALLMARK cancer pathways.We visualized these clusters using the R package pheatmap 50 labeling rows with histology and calculated cluster information.

Differential expression and visualization
Differential expression was performed based on a model using the negative binomial distribution, a method employed by the R package DeSeq2 51 .Those differential genes that had a p-value < 0.05 were deemed as significantly up or down-regulated.Volcano plots were generated by the EnhancedVolcano R package.Bar plots were generated using the R package ggplot2 52 .Note: differential expression analyses were limited to stranded-only RNA-seq samples in order to limit batch effects.

Identification of recurrent functional differential splicing variants in pediatric HGGs
To identify differential or aberrant alternative splicing events, we assessed the percent spliced in (PSI) value of each splice event relative to the median PSI value of splice event across all samples.Splicing events with a ΔPSI exceeding |2| z-scores from the median PSI value were classified as differential or aberrant.For these events, we computed average ΔPSIs and generated bed files for each mis-spliced exon event.We then obtained bed files of known functional annotations as defined by Uniprot release 2024_03 24 from UCSC Genome Browser web server.We ran bedtools v2.30 53 to find the overlap between mis-spliced exons and functional features using the command `bedtools intersect -wo -a`.We then plotted summary data by functional category (disulfide bonding sites, localization signals, amino acid modifications, and other).

CLK1 exon 4 related visualizations and correlations
To visualize the CLK1 exon 4 splice event, we utilized the R package ggsashimi 54 .We correlated CLK1 exon 4 PSI values with CLK1-201 or CLK1 TPM.We computed Pearson correlation coefficients and p-values of this plot using the R package ggpubr 55 .High CLK1 exon 4 inclusion tumors were defined as those with PSI values above the 75th percentile, while low SBI samples were those with PSI values below the 25th percentile comparing across all samples.

Protein visualizations
Protein visualizations were obtained from the PhosphoSitePlus web portal 56 , emphasizing protein domains, residue numbers, and sites of phosphorylation binding.

Upset R and Volcano plots
To visualize the intersections of multiple sets, we employed the UpSetR 57 plot in R. The input data consisted of differential and recurrent splicing events, if it was > 2 z-scores from the meanPSI and 2% of the histology-specific cohort.Volcano plots were generated by the EnhancedVolcano R package.

Splicing burden index and tumor mutation burden correlations
We identified samples with available data for both SBI (RNA-Seq) and WGS or WXS tumor mutation burden (TMB) from OpenPedCan 16 .Using the R package ggscatter, we performed a Pearson correlation analysis to examine the relationship between SBI and TMB.To ensure robustness, we repeated this analysis after excluding hyper-mutated samples (defined as those with TMB ≥ 10).Subsequently, we compared the distribution of TMB between high SBI and low SBI tumor samples using the Wilcoxon rank-sum test.High SBI samples were defined as those with SBI values above the 75th percentile, while low SBI samples were those with SBI values below the 25th percentile.The analyses were conducted across all samples and further stratified according to `plot_group`, as specified in the histologies clinical file.

Pathway over-representation analysis (ORA) and gene set variation analysis (GSVA)
We conducted over-representation analysis (ORA) using the R package clusterProfiler 58 and pathway data from the msigdbr package 59 , including "CP:KEGG", "CP:BIOCARTA", "CP:HALLMARK", and "TFT:GTRD."After inputting the genes of interest (eg.differentially spliced), we applied a p-value cutoff of 0.05 and used the Benjamini-Hochberg (BH) method for p-value adjustment.For visualization of the over-represented pathways, we employed the `enrichplot::dotplot()` function, displaying the gene ratio and the count of genes in each pathway.
To perform Gene set variation analysis (GSVA) we utilized the R packages `GSVA` and `msidbr`.Expression data for our samples, sourced from OpenPedCan v13 16 , were used to compute gene-set enrichment scores.Genes with zero variance were excluded from the analysis.We then assessed enrichment in Hallmark, KEGG, and custom pathways from Knijnenburg et al 60 .Gaussian-distributed scores were calculated using gsvaParam function in R.
The results were visualized using heatmaps of GSVA scores, generated with the R packages ComplexHeatmap and circlize.

Oxford Nanopore Technologies (ONT) Targeted Long-Read RNA-Sequencing
We designed primers to bind to exons present in all isoforms of CLK1 to ensure full coverage of all alternative splicing events.5 ng of cDNA were amplified with LongAmp Taq 2X Master Mix (M0287S, New England Biolabs) for 25 cycles.The resulting amplicons were subjected to amplicon-seq (SQKNBD112.24,ONT) library preparation, loaded into a Spot-ON flow cell R9 Version (FLO-MIN112, ONT), and sequenced in a MinION Mk1C device (ONT) until at least 1,000 reads per sample were obtained.Results were aligned using Minimap2 version 2.24-r1122 and visualized in IGV version 2.12.3.

DepMap and CRISPR dependency analyses
Datasets comprising gene transcript expression, cell line information, and CRISPR dependency scores were downloaded from DepMap (version 24Q2).The expression of CLK1 ENST00000321356 (exon 4 containing transcript) was categorized into high and low TPM expression, defined by values above the 75th quantile and below the 25th quantile, respectively.
CRISPR dependency scores were plotted on the y-axis, and Wilcoxon tests were conducted to compare high versus low TPM expression groups.These were stratified for each cell line type.
Additionally, CRISPR dependency scores for all CNS/brain cell lines were plotted, with KNS-42 highlighted in red.For the Childhood Cancer Model Atlas CRISPR dependency analyses, we acquired data from the Childhood Cancer Model Atlas 33 .We plotted CRISPR dependency scores (z) on the y-axis for each gene in CBTN pediatric HGG cell lines, either as median scores or stratified by individual patients with genes of interest highlighted.

Proteogenomic analysis
Pediatric proteomics, phosphoproteomics, and RNA data were obtained from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) via the ProTrack: Pediatric Brain Tumor opensource web portal.Data and z-scores were computed using the methods described by Petralia et al. 22 .Correlation plots of mRNA expression and proteomics were generated using these computed z-scores.Additional processed proteogenomic data utilized for correlation analyses were obtained from OpenPedCan v15 release 16 .

Morpholino Treatments
A Vivo-Morpholino ACTCTTCTGGAAACGTCAAGTGGGC (Gene Tools, LLC) targeting the intron 3-exon 4 splice junction was used to skip exon 4 in CLK1.Cells were treated with 1, 5, and 10 μM concentrations of CLK1 morpholino and 10 μM of Control morpholino.48 hours post-treatment, cells were harvested for PCR and immunoblots.

RNA Extraction and Quantitative Real-time PCR (qRT-PCR)
Total RNA was isolated and treated with DNAse using the Maxwell RSC simplyRNA Cells kit (Promega, AS1390) with the Maxwell RSC48 Instrument (Promega) per the manufacturer's instructions.Next, 2 μg of RNA were reverse-transcribed using SuperScript IV (Invitrogen, 18090010).Primers used for CLK1 mRNA transcript quantification are listed in TableS5H.qRT-PCR was performed using PowerSYBR Green PCR Master Mix (Invitrogen, 4367659) on an Applied Biosystems Viia7 machine.The amplification was performed using the following settings: denaturation at 95°C for 10 min, followed by 40 cycles of denaturation at 95°C for 15 s and annealing at 60°C for 1 min.The comparative cycle threshold (CT) method was applied to quantify the expression levels of CLK1.The fold change of gene expression was calculated by the equation 2ΔΔCT, with HPRT (Thermo Fisher, 4453320, assay ID: Hs02800695_m1) used as the housekeeping gene.

Protein Extraction
Cultured

Detection of Proteins Using Immunoblot Analysis
70 μg of total protein were mixed with 5X SDS loading dye (Biorad, 161-0374) and resolved on 10% SDS polyacrylamide gel.The protein was transferred onto a PVDF membrane (Immobilin-P, Millipore, IPVH00010) and probed with α-CLK1 mouse monoclonal primary antibody (Santa Cruz, sc-515897) and HRP conjugated secondary antibody (Cell Signaling Technology, 7076S).Bands were detected using enhanced chemiluminescence (Millipore, WBKLS0500) and captured by a Chemiluminescence imager (GE Healthcare).βactin was used as the loading control and probed with α-β-actin rabbit monoclonal antibody (Cell Signaling Technology, 12262S).

Cell Viability Assay
Cell viability was measured using the CellTitre-Glo (CTG) luminescent cell viability assay (Promega, G7570).Cells were seeded in white 96-well flat-bottom plates at a density of 24,000 cells per well and treated the following day with either 7.5 μM control or CLK1 exon 4 targeted morpholino.Luminescence was measured using a Biotek Synergy 2 plate reader at 24, 48, 72, and 96 hours.
For growth kinetics, 10,000 (3 day assay) or 6,000 (6 day assay) cells were plated per well into a 96-well plate (Greiner Bio-One, 655098) in a 200 uL total volume per well.Plates were placed into an Incucyte SX5 device and scanned every 2 hours for several days to measure growth via a mask designed uniquely for this cell type.At the end point of the assay, cell viability was analyzed with CellTiter Glo 2.0 reagent (Promega, G9242) by replacing half the media with reagent and reading on a Promega GloMax device.
Cirtuvivint (MedChem Express, HY-137435) was resuspended in 100% DMSO (Sigma, D2650-5X5ML) to 1 mM and stored in aliquots at -80 C. Dosing was optimized via serial dilution at a range of 20 uM to 0.02 uM against a vehicle control equivalent to the highest dosing of drug.Cells were plated and at 24 hours, 100 uL of media were removed from each well and replaced with drug media for a final dose range of 0.01, 0.05, 0.5, 0.1, 1, 5, and 10 uM.Cells were untouched for 3 days total while growth was monitored via Incucyte.

Figure 1 :
Figure 1: Pediatric brain tumors display heterogeneous global patterns of aberrant

Figure 2 :
Figure 2: Splicing drives novel biological clusters and splicing burden differentiates key

Figure 3 :
Figure 3: Recurrent splicing aberrations alter known proteomic functional sites in

Figure S2 .
Figure S2.(A) Stacked barplot showing tumor histology membership in each cluster stratified

Figure S3 .
Figure S3.(A)Over-representation analysis of mis-spliced kinase genes that result in gain/loss

Figure
Figure S4.(A) Boxplot of dependency scores stratified by high vs low CLK1 exon 4 containing

Figure
Figure S5.(A) Venn diagram showing the overlap of the total number of DS and DE genes.(B)

( 1 )
Exon inclusion related differential splicing events.(2) Exon skipping related differential splicing events.SpliceID includes gene name, misspliced exon start and end coordinates, upstream and downstream exon start and end coordinates.(3) Optimal clustering metrics and scoring method information.(4) Cluster membership for each sample.