Predicting T‐cell quality during manufacturing through an artificial intelligence‐based integrative multiomics analytical platform

Abstract Large‐scale, reproducible manufacturing of therapeutic cells with consistently high quality is vital for translation to clinically effective and widely accessible cell therapies. However, the biological and logistical complexity of manufacturing a living product, including challenges associated with its inherent variability and uncertainties of process parameters, currently makes it difficult to achieve predictable cell‐product quality. Using a degradable microscaffold‐based T‐cell process, we developed an artificial intelligence (AI)‐driven experimental‐computational platform to identify a set of critical process parameters and critical quality attributes from heterogeneous, high‐dimensional, time‐dependent multiomics data, measurable during early stages of manufacturing and predictive of end‐of‐manufacturing product quality. Sequential, design‐of‐experiment‐based studies, coupled with an agnostic machine‐learning framework, were used to extract feature combinations from early in‐culture media assessment that were highly predictive of the end‐product CD4/CD8 ratio and total live CD4+ and CD8+ naïve and central memory T cells (CD62L+CCR7+). Our results demonstrate a broadly applicable platform tool to predict end‐product quality and composition from early time point in‐process measurements during therapeutic cell manufacturing.

Autologous chimeric antigen receptor (CAR) T-cell therapies (Yescarta™, Kymriah™, Tecartus™, and Breyanzi®) have received approval from the U.S. Food and Drug Administration to treat certain B-cell malignancies. Despite these successes, T-cell-based immunotherapies are constrained by complex manufacturing processes that are time-intensive, expensive, and difficult to scale 3,4 and by a lack of methods and tools to predict end-product quality during manufacturing. Quality assessment is performed only at the end of manufacturing, which takes many days. Identification of early putative critical quality attributes (CQAs) and the associated critical process parameters (CPPs) that can be measured nondestructively during culture and can predict end-product attributes early in the manufacturing timeline could be transformative for the cell therapy field.
Translating laboratory-scale T-cell expansion experiments into a large-scale manufacturing process is hindered by the incomplete understanding of cell properties and how they are affected by process variables, lack of detailed characterization, and high variability of materials during manufacturing. 5 These challenges of manufacturing a "living product" are further magnified since current chemistry, manufacturing, and control, analytics, regulations, and product specifications are designed for conventional chemical and biopharmaceutical manufacturing systems. 6 This underscores the need to develop innovative tools, methods, and standards to ensure appropriate quality controls, and new strategies involving quality by design and good manufacturing practices for cell-based therapies. [7][8][9] The intricate manufacturing process for T cells and other cell therapies must be deeply assessed and appropriately controlled to ensure scalability, predictability, and a high-quality manufacturing process at the most reasonable cost. A key step for reaching this goal is to identify putative CQAs and CPPs early in the manufacturing process that can predict the quality of the manufactured cell-therapy product. We hypothesized that rigorous characterization of process parameters, along with longitudinal measurements of cell-secreted cytokines, chemokines, and metabolites from the culture media early during manufacturing, will allow us to develop an artificial intelligence (AI)-based mathematical-computational framework for the identification of multivariate parameters that are predictive of the end-of-manufacturing product phenotypes.
Characterization studies of approved autologous anti-CD19 CAR-T cell therapies have recently revealed initial sets of candidate quality attributes, that is, percent transduction, vector copy number, and interferon-γ production for axicabtagene ciloleucel (Yescarta™), 10 while CAR expression and release of interferon-γ are a few of those identified for tisagenlecleucel (Kymriah™). 11 Many of these attributes are calculated as endpoint responses, and thus a deeper understanding of how the cell growth process is affected by starting conditions and performance during manufacturing is essential. Hence, CQAs that enable early monitoring through real-time process measurements such as multiomics cell characterization can overcome current challenges in assessing product consistency.
Yet, the computational complexity of dealing with the heterogeneity and multivariate nature of multiomics measurements to characterize T-cell quality, that is, high-definition phenotyping of naïve and memory subsets, remains a challenge.
Generally, T cells with a lower differentiation state such as naïve and stem cell or central memory cells have been shown to provide superior anti-tumor potency, presumably due to their higher potential to replicate, migrate, and engraft, leading to a long-term, durable response. [12][13][14][15] Likewise, CD4 T cells are important to antitumor potency due to their cytokine release properties and ability to resist exhaustion. 16,17 Our group has developed a novel degradable microscaffold (DMS)-based method using porous microcarriers functionalized with anti-CD3 and anti-CD28 mAbs for use in T-cell expansion cultures. We showed that compared to commercially available microbeads (Miltenyi), DMSs generated a higher number of migratory naïve (T N ) and central memory (T CM ) (CCR7 + CD62L + ) T cells and CD4 + T cells across multiple donors. 18 We used this manufacturing process as an exemplar to develop an experimental-computational AI-based tool to predict product quality from early process measurements. This two-phase approach consists of (1) the optimization of process parameters through experimental designs, and (2) the extraction of early predictive signatures of T-cell quality by multiomics integration using regression models. This agnostic computational approach provides a platform to discover early predictive CQAs and CPPs to ensure consistent product quality that can be widely applicable for other cellular therapies.

[...] (Figure S1). The extraction of early predictive CPPs and CQAs for the expansion of T N + T CM cells during ex vivo culture was performed in two phases: (1) optimization of process parameters and (2) integration of multiomics for predictive modeling (Figure 1).

FIGURE 1 Two-phase approach to extract early predictive critical process parameters (CPPs) and critical quality attributes (CQAs) for CD4 + / CD8 + T N + T CM cells. (a) Design-of-experiment (DOE) modeling and optimization of process parameters. (b) Experimental region studied and optimized for total live CD4 + T N + T CM cells. (c) Total live CD4 + T N + T CM cells across the overall study design (two experiments varying process parameters). (d) Integrative multiomics approach through (e) a machine learning consensus analysis to identify putative early predictive CPP and CQA candidates for both total live CD4 + and CD8 + T N + T CM cells

TABLE 1 LOO-R² prediction performance results for all machine learning (ML) models when evaluating process parameters, and features from cytokine and nuclear magnetic resonance (NMR) media analysis at day 6 or day 4

Notes: ML models' prediction performance is measured as the leave-one-out cross-validated R² (LOO-R²), while SR prediction performance is measured as the R² of the ensemble prediction, where the ensemble is composed of diverse models with constrained complexity. Predictors evaluated: (PP) Process parameters, (N) NMR, (S) Cytokines measured at day 4 or 6. The maximum R² within each ML method is shown in bold.

| Optimization of T N + T CM cells as a function of process parameters
Using symbolic regression (Data Modeler software from Evolved Analytics LLC), we examined the interactive effects of the DMS parameters on yield to simultaneously predict and optimize both CD4 + and CD8 + T N + T CM . A model ensemble predicted 4.2 × 10⁶ CD4 + T N + T CM cells at an optimum setting of 30 U/μl IL2, 2500 carriers/μl, and 100% functionalized mAbs (Supporting Figure S2). This result was consistent with the observed maximum value of 4.0 × 10⁶, highlighting that CD4 + T N + T CM yield was maximized at high levels of DMS parameters (Figure 1b). In contrast, the predicted optimum yield for CD8 + T N + T CM was 1.9 × 10⁷ cells at a setting of 30 U/μl IL2, 600 carriers/μl, and 100% functionalized mAbs (data not shown).
Although this combination was not experimentally tested, the closest measured record (30 U/μl IL2, 500 carriers/μl, 100% functionalized mAbs) achieved the predicted maximum yield. Hence, the CD8 + T N + T CM yield was maximized at high IL2 concentration and functionalized mAbs percentage but low DMS concentration.
The DOE analysis highlighted the potential for further optimization of total live CD4 + T N + T CM cells, as well as the potential to optimize the CD4 + to CD8 + T N + T CM cell ratio, at DMS levels greater than those originally evaluated (DOE). Therefore, to test and validate this, a second, adaptive design of experiment (ADOE) was designed to maximize the total live CD4 + T N + T CM cells. We expanded the parameter range, assessing IL2 concentration >30 U/μl and DMS concentration beyond the originally evaluated range (Table S1; Figure S2). Utilizing the ADOE data set, new response ensembles were generated, enabling more robust prediction over the expanded parameter space (↑IL2 and ↑DMS concentrations).
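The logic of placing new runs where an ensemble is least constrained by data can be illustrated with a toy sketch. This is not the DataModeler workflow used in the study: the three response functions and their coefficients below are invented for illustration, chosen only so that they agree inside the original range (IL2 up to 30 U/μl, DMS up to 2500 carriers/μl) and diverge outside it. Candidate settings are ranked by the spread of the ensemble predictions.

```python
from itertools import product
from statistics import pstdev

# Hypothetical ensemble members (invented for illustration, not fitted to study data).
# They coincide inside the original range and differ once IL2 > 30 or DMS > 2500.
def model_a(il2, dms):
    return 0.10 * il2 + 0.001 * dms

def model_b(il2, dms):
    return 0.10 * il2 + 0.001 * dms + 0.002 * max(il2 - 30, 0) ** 2

def model_c(il2, dms):
    return 0.10 * il2 + 0.001 * dms - 0.0000005 * max(dms - 2500, 0) ** 2

ENSEMBLE = (model_a, model_b, model_c)

def ensemble_spread(il2, dms):
    """Population standard deviation of the ensemble predictions at one setting."""
    return pstdev([m(il2, dms) for m in ENSEMBLE])

def rank_candidates(il2_levels, dms_levels, top=3):
    """Return the candidate settings with the largest ensemble disagreement."""
    candidates = list(product(il2_levels, dms_levels))
    candidates.sort(key=lambda s: ensemble_spread(*s), reverse=True)
    return candidates[:top]

if __name__ == "__main__":
    # Candidate grid: placeholder levels, some inside and some beyond the original range.
    picks = rank_candidates([10, 20, 30, 40], [500, 1500, 2500, 3500])
    for il2, dms in picks:
        print(il2, dms, round(ensemble_spread(il2, dms), 4))
```

In this sketch the highest-ranked candidates all lie outside the original range, which mirrors the rationale for extending the IL2 and DMS levels in the ADOE; a real application would use the fitted ensemble rather than hand-written functions.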

| Multiomic integrative analysis for early monitoring of T-cell manufacturing
Due to the heterogeneity of the multivariate data collected, and knowing that no single model structure is perfect for all applications, we implemented an agnostic modeling approach to better understand these T N + T CM responses. To achieve this, we performed a consensus analysis across multiple ML methods. SR models achieved the highest predictive performance (R² > 93%) when using multiomics predictors for all endpoint responses (Table 1). SR achieved R² > 98%, while GBM tree-based ensembles showed leave-one-out cross-validated R² (LOO-R²) > 95% for CD4 + and CD4 + /CD8 + T N + T CM responses. Similarly, the LASSO, PLSR, and SVM methods showed consistently high LOO-R² (92.9%, 99.7%, and 90.5%, respectively) for predicting the CD4 + /CD8 + T N + T CM ratio.
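For readers unfamiliar with the leave-one-out metric reported here, the following minimal sketch shows one common way to compute it: each sample is predicted by a model fitted to all remaining samples, and the squared prediction errors (PRESS) are compared with the total sum of squares. The exact formula used in the study is not stated, so this definition is an assumption, as is the single-predictor least-squares model, which stands in for the ML methods named above. The example values are synthetic placeholders.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_r2(xs, ys):
    """Leave-one-out cross-validated R^2, defined here as 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = mean(ys)
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_total

if __name__ == "__main__":
    # Synthetic placeholder values, not data from the study.
    feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    response = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
    print(round(loo_r2(feature, response), 3))
```

Because every prediction is made for a held-out sample, LOO-R² is typically lower than the R² of a model fitted to all samples, which makes it a more conservative measure for small DOE data sets.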
This integrative analysis of cytokine and NMR media measurements monitored at early stages of the T-cell process provided highly predictive feature combinations of end-product quality, particularly for total live T N + T CM CD4 + cells and the CD4 + /CD8 + ratio.

2.4 | Single-omics media analysis for early prediction
ML models using solely media cytokine profiles at day 6 reached similar or higher R² than those of the multiomics models (CD4 + T N + T CM : 71.4%-99.9%; CD4 + /CD8 + : 83.4%-99.7%). However, CD8 + T N + T CM still had variable LOO-R² (7.8%-93%). Overall, higher cytokine levels in the media were associated with higher CD4 + T N + T CM and, consequently, a higher ratio with CD8 + (Figure 4a). This behavior was evident, even beyond day 6, for TNFα, IL2R, IL17a, and IL4, which were frequently selected as predictive features across models (Figures 4b,c and S3g-i).
A more complex behavior was detected for CD8 + T N + T CM which cannot be explained by cytokine secretion alone (Figure 4d).
Models using only NMR media intensities on day 6 revealed an R² decrease of 8.8% and 11.1%, on average, compared with the [...]. Compared with NMR media analysis on day 6, NMR media analysis on day 4 gave slightly better predictions (Table 1). In these models, formate, lactate, and DMS concentration were highly ranked as predictors of both the CD4 + /CD8 + ratio and CD4 + T N + T CM (Figure S3a-f).
Some variable combinations also contained histidine, ethanol, dimethylamine, branched-chain amino acids (BCAAs), glucose, and glutamine (Table S3). Lower intensity values for BCAAs, dimethylamine, glucose, and glutamine were associated with higher CD4 + T N + T CM cell counts across the different media monitoring times (Figure S5a). Conversely, higher intensities of formate and lactate were associated with higher CD4 + T N + T CM and a higher ratio with CD8 + consistently across time (Figure 5a,b).
The initial screening of a few samples from a different experimental batch shows much lower values of T N + T CM responses but maintains NMR and cytokine media patterns similar to those of the DOE and ADOE experiments (lower intensities/secretion, lower T N + T CM response) in terms of the total live T N + T CM cells for CD4 + and CD8 + . However, the decay in total live T N + T CM cells for CD8 + is much more rapid than for CD4 + , which makes the ratio behave in a more complex manner (Figures S7 and S8).

| DISCUSSION
Understanding CPPs is critical to new product development and, especially in cell therapy development, can have life-saving implications. The challenges for effective modeling grow with the increasing complexity of processes due to high dimensionality and the potential for process interactions and nonlinear relationships. Another critical challenge is the limited amount of available data, mostly small DOE data sets. SR has the necessary capabilities to resolve the issues of process effects modeling and has been applied across multiple industries. 21 SR discovers mathematical expressions that fit a given sample and differs from conventional regression techniques in that a model structure is not defined a priori. 22 Hence, a key advantage of this methodology is that transparent, human-interpretable models can be generated from small and large data sets with no prior assumptions. 23,24 Since the model search process lets the data determine the model, diverse and competitive (e.g., accuracy and complexity) model structures are typically discovered. An ensemble of diverse models can be formed where its constituent models will tend to agree when constrained by observed data yet diverge in new regions. Collecting data in these regions helps to ensure that the target system is accurately modeled and that its optimum is accurately located. 23,24 Exploiting [...]

IL2R is secreted by activated T cells and binds to IL2, acting as a sink to dampen its effect on T cells. 25 Since IL2R was much greater than IL2 in solution, this might reduce the overall effect of IL2, which could be further investigated by blocking IL2R with an antibody. In T cells, TNF can increase IL2R, proliferation, and cytokine production. 25 It may also induce apoptosis depending on concentration and alter the CD4 + to CD8 + ratio. 26 Given that TNF has both a soluble and a membrane-bound form, it may either increase or decrease the CD4 + ratio and/or memory T cells depending on the ratio of membrane to soluble TNF. 27 Since only soluble TNF was measured, measurement of membrane TNF is needed to understand its impact on both the CD4 + ratio and memory T cells. Furthermore, IL13 is known to be critical for the Th2 response and therefore could be secreted if there are significant Th2 T cells already present in the starting population. 28 This cytokine has limited signaling in T cells and is thought to be more of an effector than a differentiation cytokine. 29 This feature might be emerging as relevant due to an initially large number of Th2 cells or because Th2 cells were preferentially expanded; indeed, IL4 is the canonical cytokine that induces Th2 cell differentiation and was observed to be an important variable (Figure 2b,c). The role of these cytokines could be investigated by quantifying the Th1/2/17 subsets both in the starting population and longitudinally. Similar to IL13, IL17 is an effector cytokine produced by Th17 cells 30 [...]

In addition to formate, lactate was found as a putative CQA of T N + T CM . Lactate is the end-product of aerobic glycolysis, characteristic of highly proliferating cells and activated T cells. 37,38 Glucose import and glycolytic genes are immediately upregulated in response to T-cell stimulation, leading to the generation of lactate. At earlier time points, this abundance suggests a more robust induction of glycolysis and higher overall T-cell proliferation. Interestingly, our models indicate that higher lactate predicts higher CD4 + , both in total and in proportion to CD8 + , seemingly contrary to previous studies showing that CD8 + T cells rely more on glycolysis for proliferation following activation. 39 It may be that glycolytic cells dominate in the culture at the early time points used for prediction, and higher lactate reflects more cells.
Ethanol patterns are difficult to interpret since its production in mammalian cells is still poorly understood. 40 [...]

While the spectral resolution is significantly reduced compared to a spectrum at high-field, there are still numerous features that can be attributed to unique metabolites, including those identified as highly predictive (Figure 5c,d). Although this is promising, there will be challenges to acquiring high-quality data in a closed bioreactor system, that is, cells/DMS-particles present in suspension, final media formulation dictated by the amount of spectral complexity/overlap, and accurate quantitation of features with high overlap from other signals.
However, a dedicated benchtop NMR coupled to a bioreactor could provide a simple system for real-time monitoring of CQAs.

| CONCLUSIONS
This two-phase approach enabled in-depth characterization and identification of potential CQAs and CPPs for T cells. More sampling is needed to explore aspects such as donor-to-donor variability or orthogonal behaviors from failed expansions. When available, such data can be incorporated into this workflow, which is enriched by a data-driven iterative design that fine-tunes model parameters as more data are fed back into it. The workflow thus provides a powerful framework to optimize a complex experimental space during the cell-manufacturing process and to facilitate the identification of CPPs and early predictive CQAs from multiomics, which can be used broadly in the cell therapy and regenerative medicine field to accurately predict end-of-manufacturing quality at early stages.
The workflow and methods developed here could eventually allow manufacturers to identify deviations and problems with a manufacturing batch early during the culture and potentially implement corrective in-process controls. This could provide a more thorough understanding of the process parameters and their influence on end-product quality, and allow manufacturers to reduce batch failures, and thus improve cost, reduce risk, and increase access to cell-based therapies.

| Microcarrier fabrication
DMSs were fabricated as previously described. 18 To vary the surface concentration of the antibodies, the anti-CD3/anti-CD28 mAb mixture was further combined with a biotinylated isotype control to reduce the overall fraction of targeted mAbs. All mAbs were low endotoxin azide-free (Biolegend custom, LEAF specification). The surface concentration of the antibodies was quantified as previously described using a bicinchoninic acid assay kit (Thermo Fisher 23227). 18 See Supplementary Methods.

[...] old volume) or based on a 300 mg/dl glucose threshold. The ADOE was done using the same feeding schedule as the initial DOE to maintain consistency for validation. Media glucose was measured using a ChemGlass glucometer to confirm cell growth and activation.

| Flow cytometry
At the end of culture, at least 1 × 10⁵ T cells from each run were washed with PBS once, resuspended in PBS, and stained with Zombie UV (Biolegend, 423107) for 30 min at room temperature in the dark at a 1:1000 dilution. Cells were spun and resuspended in FACS buffer (1X PBS, 2% bovine serum albumin, 5 mM EDTA) and were stained with antibodies according to Table S1 for 60 min in the dark at 4 °C.
Cells were then resuspended in fresh FACS buffer, after which they were run on a BD LSRFortessa. All staining was performed in a 96-well v-bottom plate. See Supplementary Methods.

| Cytokine measurements
Cytokines were measured using a custom ProcartaPlex Luminex kit (Thermo Fisher). The assay was performed using media samples taken [...]

This material was used for internal controls within each rack as well as metabolite annotation.

| NMR data collection and processing
NMR spectra were collected on a Bruker Avance III HD spectrometer at 600 MHz using a 5-mm TXI cryogenic probe and TopSpin software (Bruker BioSpin). One-dimensional spectra were collected on all samples using the noesypr1d pulse sequence under automation using ICON NMR software. Two-dimensional (2D) HSQC and TOCSY spectra were collected on internal pooled control samples for metabolite annotation. One-dimensional spectra were manually phased and baseline corrected in TopSpin. 2D spectra were processed in NMRpipe. 45 One-dimensional spectra were referenced, water/end regions removed, and normalized with the PQN algorithm 46 [...] were manually binned and integrated to obtain quantitative feature intensities across all samples (Figure S4). In addition to highly variable features, several other clearly resolved and easily identifiable features were selected (glucose, BCAA region, etc.). Some features were later discovered to belong to the same metabolite but were included in further analysis. Data are available in Dataset S1.
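The PQN (probabilistic quotient normalization) step mentioned above can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: it assumes the reference is the feature-wise median spectrum across samples (the reference actually used is not stated here), and it omits the preliminary total-area normalization that is often applied first. The function name `pqn_normalize` is hypothetical.

```python
from statistics import median

def pqn_normalize(spectra):
    """Probabilistic quotient normalization of equal-length spectra.

    Assumed reference: the feature-wise median spectrum across all samples.
    Each spectrum is divided by the median of its quotients against that
    reference, which corrects for sample-wide dilution effects.
    """
    n_features = len(spectra[0])
    reference = [median(s[j] for s in spectra) for j in range(n_features)]
    normalized = []
    for s in spectra:
        # Skip features where the reference is zero to avoid division by zero.
        quotients = [s[j] / reference[j] for j in range(n_features) if reference[j] > 0]
        factor = median(quotients)
        normalized.append([v / factor for v in s])
    return normalized

if __name__ == "__main__":
    # Synthetic placeholder spectra: the same profile at three dilutions.
    base = [1.0, 2.0, 3.0, 4.0]
    spectra = [base, [2.0 * v for v in base], [0.5 * v for v in base]]
    for row in pqn_normalize(spectra):
        print(row)
```

Using the median quotient rather than the total intensity makes the scaling factor robust to a few features that change strongly between samples, which is the usual motivation for PQN in media metabolomics.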

| Metabolite annotation
2D spectra collected on pooled samples were uploaded to the COLMARm web server, 47 where HSQC peaks were automatically matched to database peaks. HSQC matches were manually reviewed with additional 2D and proton spectra to confirm the match. Annotations were assigned a confidence score based upon the levels of spectral data supporting the match, as previously described. 48 Annotated metabolites were matched to previously selected features used for statistical analysis. Several low-abundance features selected for analysis did not have database matches and were not annotated.

| Low-field spectrum simulation
Using the list of annotated metabolites obtained above, an approximation of a representative experimental spectrum was generated using the GISSMO mixture simulation tool. 19

ACKNOWLEDGMENTS
The material is based upon work supported by the National Science

DATA SHARING AND DATA AVAILABILITY
The pre-processed set of the data used in this work is available in Supplementary Information (see Dataset S1