Heterogeneity of the cancer cell line metabolic landscape

Abstract The unravelling of the complexity of cellular metabolism is in its infancy. Cancer‐associated genetic alterations may result in changes to cellular metabolism that aid in understanding phenotypic changes, reveal detectable metabolic signatures, or elucidate vulnerabilities to particular drugs. To understand cancer‐associated metabolic transformation, we performed untargeted metabolite analysis of 173 different cancer cell lines from 11 different tissues under constant conditions for 1,099 different species using mass spectrometry (MS). We correlate known cancer‐associated mutations and gene expression programs with metabolic signatures, generating novel associations of known metabolic pathways with known cancer drivers. We show that metabolic activity correlates with drug sensitivity and use metabolic activity to predict drug response and synergy. Finally, we study the metabolic heterogeneity of cancer mutations across tissues, and find that genes exhibit a range of context specific, and more general metabolic control.


Introduction
Metabolic pathways within a cell are responsible for an extremely complex and regulated set of reactions responsible for generation of energy and production of macromolecules (de Berardinis & Chandel, 2016;Pavlova & Thompson, 2016). Metabolic alterations are common to all cancers, often caused by genetic alterations to transcription factors such as HIF and NRF2 that regulate the expression of enzymes (Ward & Thompson, 2012;Nagarajan et al, 2016). Perturbation of the metabolic network can result in increased proliferative capacity, resistance to hypoxia (al Tameemi et al, 2019), and can promote metastatic transformation of the disease (Bergers & Fendt, 2021). Additionally, there is a growing number of cancers thought to be driven by the mutational inactivation or alteration of a metabolic enzyme such as succinate dehydrogenase (SDH; Bardella et al, 2011;Thompson, 2009), fumarate hydratase (FH; Tomlinson et al, 2002) or isocitrate dehydrogenase (IDH; Waitkus et al, 2018), and there is an emerging and growing understanding of the impact of metabolic rewiring on the resistant and sensitivity of a cancer to common drugs (Zaal & Berkers, 2018;Desbats et al, 2020). The heterogeneity of metabolic alterations in cancer is poorly understood, with studies often limited to small numbers of samples, to a limited number of metabolites, or without well-controlled media conditions (Wright Muelas et al, 2018).
Mass spectrometry allows the unbiased detection and profiling of metabolites in a studied sample, resulting in an understanding of the steady-state levels of various untargeted metabolites in a highly replicable and unbiased manner. We collected unbiased mass spectrometry (MS) steady-state data for metabolites in a large collection of human cancer cell lines.
High-throughput metabolomics data of cancer have been generated before Ortmayr et al, 2019), but these have been limited in their assigned metabolites and scope, and in the targets of their analysis. Here, we profile the steady-state metabolite levels of 1,234 samples from 173 cell lines in 11 different tissues. We measure the levels of 1,099 molecules, including lipids, from cell lines cultured under well-controlled conditions to study the associations of metabolic pathways with a variety of clinically important features such as oncogenic mutations. We integrate our data with publicly available genomic , expression and drug sensitivity data (Yang et al, 2013;Iorio et al, 2016) to provide a highly comprehensive metabolic understanding of cancer. We correlate drug sensitivity with metabolic pathways, showing that the activity of specific metabolic pathways can be indicative of response to commonly indicated anticancer therapeutics.
Finally, we define metabolic signatures associated with changes to commonly occurring cancer genes and study the tissue specificity of metabolites in these signatures, finding that cancer driver mutations exhibit a range of tissue specificities. Furthermore, we show that mutations in TP53 generate distinct changes to metabolic pathways depending on mutation class, highlighting the complexity of metabolic regulation in cancer, and revealing metabolite groups that show promise as potential biomarkers, or indicators of drug sensitivity in particular cancers.

Metabolic profiling of cancer cell lines
To study the levels of metabolites in a range of cancers, we cultured 173 different cell lines from 11 different tissues under well-controlled media conditions, and performed untargeted mass spectrometry (Appendix Fig S1A). We performed three biological and two technical replicates of each cell line (Dataset EV1). To quality control samples, we calculated the Euclidean distance between all pairs of samples, and used this to calculate the receiver operator characteristic (ROC) as a metric of reproducibility (Appendix Fig S1B). We corrected for sample mass (Appendix Fig S1C) and sample confluency (Appendix Fig S1D), removed injections with low mass and removed metabolites significantly associated with batch and plate effects (Appendix Fig S1E; for full details see Materials and Methods). Metabolites were assigned to peaks using HMDB v3.019, considering [M-H+], resulting in 3,474 potential metabolites for the entire data (Dataset EV2). Our resultant dataset contains 1,099 peaks for 1,234 samples, consisting of 173 unique cell lines from 11 tissues (Dataset EV3). Area under curve (AUC) for the resultant dataset is 0.9975 for technical replicates and 0.9206 for biological replicates (Appendix Fig S1F). We compared the distributions of our dataset to those published previously Ortmayr et al, 2019;Appendix Fig S2), and find that our data overall match previous reported data in terms of variance from the mean, standard deviation of each metabolite and unimodality. We additionally performed a statistical comparison of our data against Li et al, and find that for each metabolite tested our data correlates significantly more than the null distribution (Appendix Fig S2).
Having confirmed our data are similar to previously reported metabolomics data, we next z-scored the data and performed principal component analysis to check for correlation of principal components with experimental conditions, such as cell volume as has been observed previously (Ortmayr et al, 2019). We find that there are no large biases in the principal components (Appendix Fig S3A), and that the first two principal components do not correlate with small numbers of metabolites that could associate with non-tissue of origin phenomena (Appendix Fig S3B), indicated by the lack of extreme outlier metabolites in the first two principal components. Hierarchical clustering of the data reveals a heterogeneity that groups biological replicates and cancers of similar origin together, but does not separate the data purely by tissue, we find no statistical clustering when calculating the significance of tissue-level order (P = 0.43, See Materials and Methods; Appendix Fig S3C).
To further verify the reproducibility and visualize the structure of the data, we performed clustering analysis. We clustered each sample using T-map (Probst & Reymond, 2020), which constructs a hierarchical network diagram of samples connected by similarity (Fig 1A and B). Branches in the network generally represent cancer types, with biological replicates clustering together. Whilst overall the clustering does not separate the cell lines by tissue type, we do find specific instances where cell lines of the same tissue cluster together, similar to previous reports . Haematopoietic cell lines cluster into one part of the plot (Inset, bottom left), with AML, infant and adult TLL and CML all being grouped together. Furthermore, there is significant heterogeneity in cancers from the same tissue, for example, we find that lung cancers spread throughout the network, with the same cancer subtype not necessarily clustering together. One lung cancer branch (Inset top left) contains five subtypes of lung cancer (minimally invasive adenocarcinoma, adenocarcinoma, carcinoid, adenosquamous and squamous cell carcinoma), which all contain a similar metabolic profile, and predominantly contain non-silent mutations to NRF2 pathway genes (NFKB1, NFKB2, RELA, RELB, IKBKB, IKBKE and IKBKG; Appendix Fig S4).
To assess pathway activity in the dataset, we calculated the variance of all SMPDB core metabolic pathways (Frolkis et al, 2009;Jewison et al, 2014) across the dataset ( Fig 1C). We classify pathways as increased or decreased if the average rank of pathway members within a disease subtype changes significantly from the bulk rank. Pathways with the most variance across the dataset are generally those with a known association with cancer, including the Warburg effect, purine metabolism and glutamate metabolism. Hierarchical clustering of the samples separates them into two broad partitions, with a set of highly metabolically active cells characterized by increased purine metabolism and citric acid cycle activity, and a different set of cells generally less metabolically active, but with an increased fatty acid biosynthesis phenotype. Cancers with the largest number of increased pathways in comparison with the average are cancers of the colon and lung, possibly indicative of epithelial cancers being more proliferative.
Finally, we assessed correlations between metabolic pathways and the reported doubling times of cell cultures assessed ( Fig 1D). We took cells with a reported doubling time in the NCI-DTP (Monga & Sausville, 2002) database, and generated a regression model to correlate the rank change of each pathway in each cell type with doubling time. SMPDB pathways associated with a shorter doubling time include oxidation of fatty acids and glutamate metabolism, processes known to have a correlation with cancer growth.

Metabolic associations with cancer-driving genetic alterations
To study the metabolic effects of common oncogenic genetic alterations, we looked for associations between metabolites and known cancer-associated mutations. To validate the dataset, we looked to mutations in known cancer-driving metabolic enzymes IDH and SDH. IDH enzymes (Appendix Fig S5A) are responsible for the conversion of isocitrate to alpha-ketoglutarate, and their mutation in cancer is associated with an oncogenic build-up of 2hydroxyglutarate due to a change in catalytic activity (Losman & Kaelin, 2013). The most commonly observed oncogenic mutation in ▸ Figure 1. Metabolic profiling of cancer cell lines.
A T-map representation of metabolic profiling data for 1,298 samples of cell lines from 11 different tissues showing clustering by cell types. B Proportion of tissues in the dataset. C Pathway variability for most significantly different SMPDB core metabolic pathways across major tissue types. D Correlation of SMPDB core metabolic pathways with reported cell doubling time. IDH1 is at the hotspot R132, which directly changes the interaction of the binding site with Isocitrate. Whilst none of the cell lines studied contain the R132 mutation, we find two other mutations to IDH1 across our dataset, affecting G97 and S261. G97D is associated with a significant accumulation of 2-hydroxyglutarate compared to the bulk data (Fig 2A), similar to that of R132 mutations observed in Li et al (2019). Studying the IDH1 structure (PDB ID 5YFN), we find that G97 is also involved in a direct interaction with isocitrate, and therefore will also likely damage enzymatic activity ( Fig 2B); additionally, G97 has been previously implicated in human cancers and is known to accumulate 2-hydroxyglutarate (Losman & Kaelin, 2013). S261 conversely is on a distal region of protein not at all adjacent to the binding site, supporting our data that shows it is not associated with a change in catalytic activity. Additionally, we find mutations in the SDH enzyme family within our dataset (Appendix Fig S5B), particularly SDH2: R33H and R18G. Succinate dehydrogenase is involved in conversion of succinate to fumarate, and we find that R33H is associated with a shift in the succinate:fumarate ratio expected in cancer (Bardella et al, 2011), where R18G is not (Fig 2C), these mutations have not been characterized, and therefore we predict that R33H impairs catalytic activity, whilst R18G does not. These results demonstrate that known relationships between mutations in metabolic enzymes and their metabolic targets are captured in our analysis, and that we can predict the impact of not previously studied mutations.
To expand our analysis in an unbiased way, we controlled for tissue label to reduce lineage dependent effects and built regression models for every combination of all 1,099 metabolites and 723 cancer genes from the COSMIC cancer census (Sondka et al, 2018;Dataset EV4). Mutations were associated with each metabolite to find statistically significant correlations independent of tissue of origin. We find that groups of genetic mutations can be associated with increased or decreased levels of groups of individual metabolites ( Fig 2D). Clustering reveals groups of similarly associated genes and metabolites, for example, we find a major group of increased metabolites involved in starch and sucrose metabolism, pentose and glucoronate interconversions and galactose metabolism associated with genes enriched for proliferation, pathways in cancer and PI3K signalling ( Fig 2E).
Genes associated with the largest numbers of metabolites include KRAS, which has over 30 metabolites that are highly reduced when it is mutated, and KCNJ5 and CCND3, which are correlated with the increase of over 60 metabolites each when mutated (Appendix Fig S5C). Finally, some genes show a huge correlation with a small number of metabolites, for example, both NUTM2B and H3F3A show extremely large (> 50) T-statistics for < 10 metabolites (Appendix Fig S5D).
We confirm that the data accurately reproduce known relationships between mutations and pathways by studying the enrichment of metabolites in lung cell lines mutated for KEAP1, NRF2 and STK11 (Appendix Fig S6), three genes highly involved in disease progression. KEAP1 and NRF2 are known to alter glutathione metabolism (Panieri et al, 2020), and STK11 drives flux through glycolysis and the TCA cycle (Faubert et al, 2014). We confirm that KEAP1 and NRF2 mutation associated metabolites are enriched for glutathione metabolism, and STK11 mutation associated metabolites are enriched for the TCA cycle, as expecteddemonstrating the ability of the data to accurately reproduce known pathway associations.
We next studied mutations to known cancer genes such as KRAS ( Fig 2F and Appendix Fig S7A), and reveal concerted patterns of metabolic changes associated with mutation. Metabolic pathways associated with KRAS mutations include aminoacyl-tRNA biosynthesis, amino acid metabolism (Pupo et al, 2019) and notably a pathway suspected to be essential for cancer survival: glutamine and glutamate metabolism (Cluntun et al, 2017). We further generated the correlations between different KRAS mutations compared to KRAS WT cell lines (Dataset EV5) and find that glutamate metabolism is primarily enriched in G12A, G12C and G12D mutants (Appendix Fig S8). We additionally generate metabolic pathway associations for common cancer drivers TP53, PIK3CA and APC (Appendix Fig S7B). Finally, ranking cancer driver mutations by their correlation with glutamine ( Fig 2G) reveals that KRAS and ERBB4 have the strongest negative correlation (indicating that mutation is associated with a reduction of glutamine), and PRCC, JUN and PIK3CA mutations rank highly with an accumulation of glutamine.
Overall we confirm that mutations in known cancer drivers accurately reproduce expected associations with both individual metabolites such as IDH1 mutations and 2-hydroxyglutarate, and pathways such as KEAP1 mutations and glutathione metabolism. We further predict metabolic associations for mutations in all presently known cancer drivers.

Transcriptional signatures associate with metabolic pathways
To study the activity of common transcriptional programmes in cancer and their correlation with metabolite levels, we analysed the RNAseq data available for 152 of the cell lines studied in the Cancer Cell Line Encyclopedia (CCLE). For each cell line that had available gene expression data, we calculated the progeny pathway scores (Schubert et al, 2018), and correlated activation score with the levels of metabolites in every SMPDB core metabolic pathway ( Fig 3A and Dataset EV6). We find a number of significant G T-statistic ranking for glutamine, highlighting genes whose mutations associate highly with measured levels of glutamine. A Heatmap of P-value of association between SMPDB core metabolic pathways and progeny defined transcriptional programmes (P represents corrected iterative hypergeometric test P-value). B log10 (P-value)*correlation direction against rank order for all SMPDB core metabolic pathways and their association with NRF2, showing glutathione metabolism as a clear outlier. C log10 glutamate levels against NRF2 relative activity. R represents Pearson correlation coefficient. D Network of correlated metabolites and genes across all RNA expression and metabolic data. Metabolites and genes are linked if their expression correlates with a Pearson correlation coefficient > 0.8. Plotted are all strongly connected components with 10 or more members. E Enrichment of metabolites (left -Metaboanalyst) and genes (right -Gprofiler) in cluster 1. (hypergeometric test FDR corrected P-value < 0.001) associations between SMPDB pathways and transcriptional programmes. Notably TP53 activity significantly correlates with purine metabolism and the Warburg Effect, both pathways previously associated with all human cancers (Gaude & Frezza, 2016), as expected from a universal cancer driver gene. Transcriptional and metabolic pathways with the most significant correlations were fatty acid biosynthesis correlating with PI3K, TGFB, hypoxia and androgen expression programmes.
To expand this analysis, we compared the calculated activity of 106 reliable transcriptional regulators (Garcia-Alonso et al, 2019) with the activity of all SMPDB core metabolic pathways (Appendix Fig S9 and Dataset EV7). We find significant correlations between the activity of numerous transcription factors and metabolic pathways associated with cancer, correlating pathways with their most significant transcriptional regulators (Appendix Fig S10A-D) we find that RXRA most significantly associates with glycolysis (Desvergne, 2007), as well as MYB and FOXA1. NFIC most significantly correlates with both the TCA cycle and Warburg effect, REST1 with the pentose phosphate pathway. To expand this analysis to a specific example, we looked at the activity of antioxidant responsive gene NRF2. NRF2 upregulation in cancer increases glutathione synthesis, and increases consumption of intracellular glutamate (Okazaki et al, 2020;Panieri et al, 2020). The most significantly associated metabolic pathway with NRF2 is glutathione metabolism (Fig 3B), as expected, and we additionally detect a significant and negative (P-value < 0.001, R = À0.34) correlation between NRF2 activity and glutamate availability (Fig 3C), highlighting that our data capture the known relationships between specific transcriptional regulators and metabolic pathways, and that the RNA expression data correlate with the mutational analysis performed previously. Analysis of HIF1A also reveals a significant association with fatty acid biosynthesis and steroidogenesis (Appendix Fig S10E), and corresponding reduction of coenzyme A (Appendix Fig S10F) expected from HIF1A control of fatty acid metabolism (Huang et al, 2014;Seo et al, 2020).
To look for sets of correlated metabolites and genes, we performed unbiased network analysis of the combined gene expression and metabolite data ( Fig 3D). For all samples in which we had RNA expression data, we generated a cross-correlation value for every pair of genes, metabolites and gene-metabolites. Linking pairs with a Pearson correlation coefficient > 0.8, we generate a series of strongly connected components, representing coordinated sets of genes and metabolites across the dataset. Many sets include both genes and metabolites, and enrich for cancer-associated metabolic and transcriptional programmes. The largest cluster, designated cluster 1 enriches for metabolites associated with purine metabolism, a hallmark of cancer, and genes associated with neuronal development and synaptic activation, possibly highlighting the proliferative and metabolic changes occurring uniquely to neuronal cancers (Fig 3E). We further define three more clusters: a cluster associated with the pentose phosphate pathway and transcriptional pathways linked to metastasis and epithelial to mesenchymal transformation (EMT; cluster 2 - Fig 3F), glutathione metabolism and pathways in the haematopoietic lineage (cluster 3 -Appendix Fig S10G), and immune cell-specific genes and glycerophospholipid metabolism (cluster 4 -Appendix Fig S10H).
Together these findings highlight the functional importance and associations of both cancer-associated transcription programmes, and cancer-associated mutational transformation with metabolic pathways. The coupling of these levels of data can enrich our understanding of the coordinated changes occurring in cancer.

Metabolic pathway activity correlates with drug resistance and sensitivity
To study the correlations between metabolite levels and drug sensitivity, we obtained drug sensitivity data for all of cell lines studied from the cancer × gene project (Yang et al, 2013). We correlate the IC50 for every drug in each cell line with the activity of metabolic pathways (Dataset EV8), assuming that for many drugs, metabolic activity may influence how sensitive a cell is. We group metabolic pathways and drug mechanisms of action to look for concerted correlations between drug susceptibility and metabolism. For drug sensitivity (Fig 4A), we find that pathways increased in association with increased sensitivity to a drug are generally split between metabolism of the three major molecules; amino acids, fatty acids and sugars. Interestingly, different mechanisms of action have different metabolic pathways associated with sensitivity. PI3K pathway inhibitors are associated with an increase in amino acid metabolism pathways, whereas apoptosis regulation associated drug sensitivity correlates most with an increase in carbohydrate metabolism. Studying drug insensitivity (Appendix Fig S11A), we find that an increase in carbohydrate metabolism pathways correlates with insensitivity to cell cycle inhibiting drugs.
Focussing on specific drugs, we looked for pathways that correlate with resistance or sensitivity. For rapamycin (Fig 4B), the highest correlate with insensitivity is an increase of glutamate metabolisma known resistance mechanism (Csibi et al, 2013;Tanaka et al, 2015), as well as increased activity of the citric acid cycle, and an increase in the Warburg effect. Interestingly, however, we find that an increase in fatty acid biosynthesis is correlated sensitivity to rapamycin. Similarly, for cisplatin (Appendix Fig S11B), we find that an increase in phospholipid biosynthesis correlates with insensitivity as previously explored (Lee et al, 2018), whilst steroid biosynthesis and glutathione metabolism correlate with sensitivity ▸ Figure 4. Metabolic activity correlates with drug sensitivity and resistance.
A Sankey plot of sensitivity associations (Pearson-rho P-value < 0.05) between metabolic pathway classes (left) and drug classes (right). B Àlog10(P-value)*correlation direction (iterative hypergeometric test) for associations between rapamycin IC50 and SMPDB core metabolic pathways, highlighting key pathways associated with resistance (positive) and sensitivity (negative) to rapamycin. C Spectrum of relative resistance to glycolysis and fatty acid biosynthesis for all drugs studied, highlighting drugs with most extreme anticorrelation between the two pathways. D Correlations between metabolic resistance associations for the most anticorrelated drugs. Metabolic pathway correlations against drug sensitivities are compared for all drugs. Red squares represent known drug synergies from DrugCombDB. Drug synergies tend to coincide with anticorrelated metabolic sensitivities.  Fig S11C), we find a sensitivity associated with increased glycine and serine metabolism.
Observing that glycolytic activity correlates with known insensitivity to rapamycin, we surmised that drugs with the opposite trenda sensitivity correlated with glycolytic activity, may represent a therapeutic pairing that guarantees sensitivity to one drug. We also observe that fatty acid biosynthesis sensitivity is generally anticorrelated to glycolysis sensitivity ( Fig 4C). Drugs fall into a spectrum whereby increased sensitivity correlated with one pathway generally also correlates with insensitivity to increased activity of the other pathway. The two extremes of this spectrum may represent drug pairings for which there is always a sensitivity (i.e. a cell cannot be insensitive to both drugs at once), and two combinations that are anticorrelated support this view, both salubrinal and bortezomib are known to be synergistic with rapamycin (O'Sullivan et al, 2006;Wang et al, 2012;Zhao et al, 2016). Going further, we calculated the correlations between all pathway sensitivities for all drugs (Dataset EV9), surmising that drugs whose IC50s correlate with opposite metabolic pathways may represent synergistic or "always-effective" combinations. Correlating all drug sensitivity pairs (Fig 4D) highlights that many drugs exhibit opposing metabolic sensitivities. Overlaying these correlations with known synergies from the DrugCombDB (Liu et al, 2020), we find that many known synergistic drug combinations are also anticorrelated in their metabolic pathway associations. Having shown that metabolic sensitivity can correlate with known drug synergies, we highlight a number of drug combinations for which our analysis suggests that a cell will always have some degree of sensitivity.
These data link metabolic activity explicitly with drug resistance. We validate that our data predict the resistance profiles of drugs compared to metabolic pathways, and suggest combinations of drugs that have opposite metabolic sensitivity profiles.

Metabolic shifts associated with cancer-causing genetic alterations exhibit a range of heterogeneities
Next, we looked to study the heterogeneity of metabolic changes associated with mutations to common cancer-causing genes. We surmised that mutations to the same gene will have different effects on metabolism dependent on the tissue context in which the mutation occurs. We first studied the mutational distribution in the cell lines studied (top 50 most mutated genes are shown in Fig 5A). Most of the common mutations are distributed across multiple tissues, and so show at least some level of penetrance in more than one of the 11 tissues studied. The top mutated genes are those common to cancers, including TP53, KRAS, MAP3K1 and PIK3CA.
To look at heterogeneity of the metabolic changes associated with each mutation, we first calculated tissue-specific associations between mutations in each of the top 50 mutated genes and every individual metabolite similar to the unbiased analysis in Fig 2 but building a regression model only on a single tissue. After generating the tissue-specific metabolic associations, we calculated the Spearman's rank correlation coefficient between the associations for pairs of the three largest tissue subsets in the datasetbladder, breast and lung cancers (Fig 5B). We find that the top 50 mutated genes display a spectrum of heterogeneity between tissues. Some genes such as KRAS, HRAS display a high correlation between disparate tissues, indicating a conserved set of metabolic alterations despite the different contexts; one the other hand, mutations to TP53 (Appendix Fig S12), CCNE1 and other common cancer drivers appear to have almost no correlation across tissues, and so likely alter metabolism in a distinct way, possibly more associated with mutation type than tissue context.
To further unpick the pathways involved in tissue specificity or agnosticism, we performed differential expression to find metabolites that are significantly different between mutant and WT KRAS ( Fig 5C) and TP53 (Fig 5D). Enrichment analysis highlights pathways that are conserved across all tissues, for KRAS mutations this includes glycerophospholipid metabolism, biosynthesis of unsaturated fatty acids and amino sugar and nucleotide sugar metabolism. For TP53, core enriched pathways in all tissues are sphingolipid metabolism, and the pentose phosphate pathway. We also highlight pathways specific to one type of tissuefor KRAS mutant lung cancers, we find that pentose and glucoronate interconversions and linolenic acid metabolism are significantly enriched, and for TP53 mutant bladder cancer, we find an enrichment for core metabolic pathways involved in the citric acid cycle and glutamine metabolism.
This highlights that the heterogeneity mutations to the same cancer driver genes can have across different tissues, with some genes exhibiting effects that are agnostic to tissue, whilst others have a more specific metabolic effect depending on the tissue they are present in.

TP53 metabolic signatures correlate with downstream genes and exhibit mutation-type differences
To further study the metabolic implication of TP53 mutations, we next studied the overlap between mutations to TP53, and to five downstream targets involved in different TP53-associated control: TSC1 and cell growth, XPC and DNA damage response, FAS and apoptosis, MDM2 and feedback and CDKN1A and cell cycle control (Feng et al, 2005;Fischer, 2017; Fig 6A). We performed enrichment analysis of metabolites differently measured between TP53 mutated and WT cells, then performed the same enrichment with metabolites only differently abundant due to both TP53 and each downstream gene in order to unpick whether TP53 pathways can be associated with a downstream target. Each downstream gene shares between 13% (XPC) and 29% (CDKN1A) of the total metabolites differently measured for TP53 mutations, and performing enrichment analysis highlights which TP53 pathway is most associated with each downstream gene (Fig 6B). We find that cell growth metabolic pathways, such as TCA cycle and pyruvate metabolism, are predominantly associated with the cell cycle protein CDKN1A, whereas glycerophosphate metabolism is mostly enriched with apoptotic regulator FAS. Overall, we show that metabolic pathways associated with TP53 mutation can be associated with known downstream effector genes, thus we can separate individual metabolic signatures for each of TP53s wide range of functions.
Having determined that metabolic changes associated with TP53 mutation do not appear to correlate strongly between tissues, we studied the impact of different types of mutations to TP53 on metabolite levels. Mutations in TP53 broadly fall into two categories, those directly impacting the DNA binding interface and altering the contact between the protein and DNA (DNA contact mutations), and those altering the wider structure of the protein itself and changing its stability and affinity for binding partners (conformation mutations; Freed-Pastor & Prives, 2012;Baugh et al, 2018). Previous work has highlighted that these two categories of mutations confer differing transcriptional programmes, cellular phenotypes and interestingly also drug sensitivities (Turrell et al, 2017;Monti et al, 2020), we surmised that there may also be metabolic changes associated specifically with each mutation, as well as a conserved set of metabolites that are changed in response to any TP53 mutation. We studied our data for cells with mutations in TP53 that are validated as DNA contact altering, or conformation altering ( Fig 6C).
We looked for sets of coordinated metabolites across three conditions (WT for TP53, DNA contact mutant and conformation mutations in TP53). We assigned KEGG terms, and expression changes between the WT and two mutant types were calculated (Fig 6D). We find sets of metabolites that are consistently altered in both cases, notably purine metabolism and phospholipid biosynthesisboth hallmarks of cancer metabolic change and cell growth, are significantly altered in both types of TP53 mutation compared to WT. Interestingly, we also find metabolic alterations specific to individual mutation types, notably sphingolipid metabolism and amino sugar metabolism appear selectively altered in conformation mutants, whilst pathways associated with amino acid metabolism and the citric acid cycle, cell growth pathways, are primarily altered in DNA contact mutations. This heterogeneity opens up the possibility that signalling differences recognized between different mutant types in TP53 may also result in characteristic metabolic shifts, opening up a further research avenue of drug sensitivity and prognostic value for metabolism in TP53 mutant cancer.

Discussion
There have been significant efforts to understand the interplay of metabolic rewiring in cancer, including work on large datasets that attempt to integrate metabolomics from multiple tissues Ortmayr et al, 2019). Metabolic alterations occur in all cancers, and widening our understanding of the interplay between changes to metabolite levels and genetic alterations that drive cancer will be key to a better understanding of the disease in the future. To further study the complex and poorly understood relationship between metabolic rewiring in diverse tissues, we performed an unbiased analysis of 1,099 metabolite levels in 173 different cancer cell lines distributed across 11 different tissues cultured in the same conditions. The depth of this study allows the unbiased correlation of metabolic pathways with other large datasets studying mutations, RNA expression and drug sensitivities to link specific metabolic pathways to known cancer properties. This study suggests that cancer-driving mutations have a range of metabolic heterogeneities across tissues that have implications for the detection and therapeutic targeting of cancer, in particular when studying the metabolic effects of TP53 mutation we highlight that consideration must be taken for both the tissue context, and the mutational type.
We correlate mutations to cancer-causing genes with individual metabolite levels, and show an association between common metabolic pathways and transcriptional programmes that are known to drive cancer; furthermore, we find that metabolite levels are indicative of the drug sensitivity of numerous cancer cell lines, opening up an opportunity for assessment of a cancer therapeutic response through analysis of tumour metabolites. Finally, we find that tissue context and mutation type can alter the metabolic signature associated with some genetic alterations, further expanding our understanding of the ever increasing heterogeneity of human cancers.
Due to the constant experimental conditions and large number of biological and technical replicates, coupled with the large number of metabolites studied, our study offers a significant and reliable resource for further unpicking the roles of metabolism in cancer. And that these data are broader and deeper than previously published work allows a greater understanding of the heterogeneity of cancer cell metabolism. Previous data have either been limited by collection of a small number of cell lines, or through measurements including a smaller number of potential metabolites, and as such the ability of these studies to analyse metabolite pathways is limited either by the number of metabolites that can be assigned to a pathway, or the number of cell lines that can be compared. Here, we are able to analyse the activity of many metabolic pathways across a broad range of cancer cells, and can additionally correlate the activation states of these pathways with other features such as mutations and transcription factor activity.
Whist the data generated for this study is rich, it is collected at steady state, and so does not incorporate the dynamic changes of metabolism in response to external stimuli; additionally, cell culture is recognized to be substantially different to human cancers in vivo, and thus future work will need to address both the dynamic behaviour and limitations of cell culture-based study of human cancers. These data, however, offer a unique insight into the relationship between protein function and cellular metabolic state. Future work using CRISPRbased screens will allow the specific relationships between individual genes and metabolism to be compared directly to drug impact allowing a deeper understanding of both drug mechanisms and off target effects.

Processing and data normalization
We used primary data published simultaneously in Cherkaoui et al (2022). We used the area under curve of the receiver operator characteristic (ROC) based on the Euclidean distance between biological replicates and non-biological replicates as a metric for reproducibility (Appendix Fig S1B).
To normalize for injection sequence artefacts, we applied Lowess smoothing to each ion independently (Appendix Fig S1C), normalizing the sum of the ion intensities in each injection. We also removed injections with low ion intensities (defined as those with a log2 (sum of ion intensity) of < 25). This resulted in the total removal of 112 injections.
We also normalized for the confluences of the plate. We found an observable correlation between total ion intensity per sample and measured confluency prior to injection. We applied Lowess smoothing to correct for confluency correlation (Appendix Fig S1D).
To remove metabolites that had a significant correlation with the batch or plate in each experiment, we calculated the P-value using ANOVA for a plate and batch effect for each metabolite (Appendix Fig S1E). The best AUC improvement was found with the removal of 127 peaks. We further found a bias induced by the high number of repeats of MCF-7 cell lines (but not MDAMB231)identified by principal component analysis and TSNE (not shown), and as such, MCF-7 cells were also removed from the analysis.
The resultant final dataset contains 173 cell lines, with 1,234 total injections, and 1,099 total peaks. We find that the AUC values improve from 0.9928 technical reproducibility to 0.9975, and 0.7934 biological reproducibility to 0.9206 (Appendix Fig S1F). Data were then averaged (mean) over the biological repeats before analysis was performed.

Permutation statistics
For generating P-values of clustering or similarity between datasets, we generated null distributions to generate a P-value from. For clustering, we generated a score for how clustered the tissue labels in the heatmap were, based on how many same-tissues were adjacent to each other. For the dataset comparisons, we generated the Pearson correlation between the same metabolite and cell lines in the two datasets (this manuscript and Li et al).
We then permuted the cell lines in each case one million times, and used the scores generated from these random distributions to generate a null distribution. We then measured how many times a score equal to or greater than the observation occurred to generate a P-value.

Pathway activity analysis
Pathway activity was calculated using the rank change of metabolites in a pathway of interest. SMPDB (Jewison et al, 2014) pathways were downloaded, and metabolites for each pathway collected. Metabolites for all the samples were ordered by average expression, and the equivalent ordering was performed for each cell line. Pathway activity was calculated through comparing the rank of metabolites in a pathway between the cell line of interest and the bulk data. P-values were calculated through a hypergeometric test for rank change, pathways where 40% of all cell lines had a significant association (P < 0.05) were considered for plotting, and pathways where the total pathway rank sum change was greater than 350 or lower than À350 were coloured.

Pathway enrichment analysis
Enrichment analysis was performed using Metaboanalyst 4.0 (Chong et al, 2018). We used the pathway enrichment tool by inputting metabolite HMDB IDs, using the hypergeometric enrichment method against the KEGG database (Kanehisa et al, 2017). For RNA pathway enrichment, we used Gprofiler (Raudvere et al, 2019) through their python library API.

Metabolite/mutation correlation
We calculated the correlations between mutations and metabolites using ordinary least squares (OLS) regressions. We calculated the correlations for every metabolite and mutations in every gene in the COSMIC cancer gene census (Sondka et al, 2018). For each metabolite/mutation pairing, samples were partitioned into those wild type for the gene of interest, and those with a non-silent mutation in the gene (samples for which no mutation data were available were excluded), data were downloaded from the Cancer Cell Line Encyclopedia . A tissue label was included for all 11 tissues of origin, which was one-hot encoded and passed to the model. The metabolite expression was Z-scored and a regression model built for every metabolite/mutation pairing. We opted to use the T-score for further analysis, because they allow further discrimination between extreme values (whereas a P-value will stop at a value of 0). We used the statsmodels and sklearn libraries in python to calculate regressions (Seabold & Perktold, 2010;Pedregosa et al, 2011). We tested the performance of the method on known mutation/metabolite pairing IDH1 and 2-hydroxyglutarate.

RNA correlation
RNA expression data were downloaded from the Cancer Cell Line Encyclopedia . We calculated the PROGENY (Schubert et al, 2018) and DOROTHEA (Garcia-Alonso et al, 2018) pathway scores for each cell line using their respective R libraries. For DOROTHEA, we chose to only calculate the transcription factor activity for the two most reliable pathway categories (A and B). RNA pathways were correlated with each SMDB metabolic pathway by calculating the hypergeometric P-value for the ranks of each pathway member when ordered by their absolute correlation with the RNA signature score. P-values were corrected using the Benjamini and Hochberg FDR method.

Drug sensitivity correlation
Drug sensitivity data were downloaded from the GDSC (Yang et al, 2013). Correlations between drug sensitivity and pathway activity were calculated by generating the hypergeometric test Pvalue (multiple test corrected using the Benjamini and Hochberg method) for the ranks of pathway metabolites when ordered by their absolute correlation against the sensitivity score. Synergies were obtained from DrugCombDB (Liu et al, 2020). We defined a combination as synergistic if its normalized zero interaction potency (ZIP) score was greater than 0.

TP53 mutation analysis
TP53 mutational pathway trees were generated using the Metabodiff R library (Mock et al, 2018) and plotted with phytools (Revell, 2012). We used the workflow adapted from the Weighted Gene Co-Expression Analysis (WGCNA; Langfelder & Horvath, 2008) with a multiple test corrected P-value cut-off of 0.05.

Data availability
This manuscript uses a dataset derived from Cherkaoui et al (2022). We provide our data after the normalization described in the materials and methods. Data and the results of all analysis is available in the supporting information. All codes used in to perform analysis and generate figures are available here: https://github.com/ shorthouse-mrc/CellLine_metabolomics_analysis. We have developed a dashboard allowing the exploration of the data here: https:// cancer-metabolomics.azurewebsites.net, and code for the display of this dashboard is available here: https://github.com/shorthousemrc/CellLine_Metabolomics.
Expanded View for this article is available online.