Metabolite reanalysis revealed potential biomarkers for COVID-19: a potential link with immune response

Aim: To understand the pathological progress of COVID-19 and to explore potential biomarkers. Background: The COVID-19 pandemic is ongoing. Published metabolomics research on COVID-19 indicates that the rich information in metabolomics data is worthy of further data mining. Methods: We applied bioinformatics technology to reanalyze the published metabolomics data of COVID-19. Results: Benzoate, β-alanine and 4-chlorobenzoic acid are reported for the first time as potential biomarkers to distinguish COVID-19 patients from healthy individuals; taurochenodeoxycholic acid 3-sulfate, glucuronate and N,N,N-trimethyl-alanylproline betaine (TMAP) were the top classifiers in the receiver operating characteristic curve of COVID-severe and COVID-nonsevere patients. Conclusion: These unique metabolites suggest an underlying immunoregulatory treatment strategy for COVID-19.

of the disease recovery process in patients with long-term symptoms. They analyzed 1H NMR spectroscopy data in human plasma and modeled them together with a variety of plasma cytokines and chemokines. The results [6] showed a unique pattern for SARS-CoV-2-infected cells with multiple levels. The immune response interacts with the plasma lipoprotein group to give a strong and unique immune metabolic phenotype of the disease. Meoni et al. [7] analyzed the metabolomics and lipidomics of COVID-19 patients and showed that COVID-19 patients had characteristic NMR-based metabolomic and lipidomic profiles. An exploratory study [8] measured 162 metabolites in the plasma of ICU patients (both COVID-19+ and COVID-19-) and analyzed the data using advanced machine learning. A unique COVID-19 plasma metabolome was discovered, which is mainly determined by the changes in kynurenine, arginine, sarcosine and LysoPCs. Moreover, creatinine alone or the creatinine/arginine ratio predicted ICU mortality with 100% accuracy in that study, suggesting that metabolites (kynurenine, arginine and creatinine) can be regarded as potential diagnostic and prognostic biomarkers for COVID-19. Guo et al. [9] compared metabolomic and proteomic profiles of serum samples obtained from COVID-19 patients with those of healthy volunteers and of symptomatic patients diagnosed with non-COVID-19 disease (disease controls). They discovered protein and metabolite dysregulation in the sera of severe COVID-19 patients, which may contribute to macrophage modulation. The application of bioinformatics to clinical medicine can improve the accuracy of diagnosis and treatment. We applied bioinformatics technology to reanalyze the previously published metabolomics data of COVID-19 patients.
Previous studies [9] on the characteristics of plasma metabolism in patients with COVID-19 have found that the proteins and metabolites in the serum of patients with severe COVID-19 are imbalanced, which may contribute to the regulation of macrophages. This integration of proteomics and metabolomics methods provides a view of the underlying pathology of disease progression, but the rich information in metabolomics may be overlooked. Therefore, it is worthy of more detailed research.
Herein, we conducted further statistical analysis on the data from the previously reported metabolomics study. Our analysis suggests that COVID-19 infection affected the cell signaling, nucleic acid metabolism and amino acid metabolism networks in COVID-19 patients. Metabolites including benzoic acid, phosphate and inosine are reported here for the first time to be significantly increased in sera from COVID-19 patients; these metabolites may promote immune response and inflammation development and contribute to damage of multiple tissues and organs such as lung, liver and kidney. The metabolomics profiles of COVID-19 patients were also distinct from those of the disease control group, offering clues for targeted treatment strategies for nonsevere and severe patients.

Data source
The data used in the study were retrieved from the CoronaMassKB database (dataset ID MSV000085507), which consisted of 25 samples from nonsevere patients diagnosed with COVID-19, 21 samples from severe patients diagnosed with COVID-19, 25 samples from healthy individuals and 25 samples from non-COVID-disease control patients. The detailed patient descriptions, including the sampling date and the metabolite data for each patient, can be found in the original paper [9] and are shown in Supplementary Tables 1-3. According to the source information, 65 COVID-19 patients were classified into four subgroups based on the Chinese Government Diagnosis and Treatment Guideline (Trial 5th version) [10]. Mild: mild symptoms without pneumonia; typical: fever or respiratory tract symptoms with pneumonia; severe: fulfilling any of the following three criteria: respiratory distress with respiratory rate ≥30 times/min; mean oxygen saturation ≤93% in resting state; arterial blood oxygen partial pressure/oxygen concentration ≤300 mmHg (1 mmHg = 0.133 kPa); critical: fulfilling any of the following three criteria: respiratory failure and requirement for mechanical ventilation, shock incidence or admission to ICU with other organ failure [9]. The mild and typical subgroups made up the nonsevere subgroup. The disease control group in the study consisted of 25 non-COVID-19 patients who had clinical characteristics similar to those of COVID-19 patients, including fever and/or cough, but tested negative for COVID-19 [9]. Causal analysis of the infection showed that four patients were infected by herpes simplex virus, one by varicella-zoster virus, one by respiratory syncytial virus, one by Klebsiella pneumoniae and Acinetobacter baumannii and one by Enterococcus faecium [9]. Some patients had other diseases, including cancer, cerebral hemorrhage or lymphoma [9]. No infection was detected in the other patients according to the respiratory tract virus antigen test [9].
The healthy group included serum samples from 28 healthy individuals [

Statistical analysis
We performed a two-sample t-test on the COVID-19 patient group and the healthy group to understand the pathology of the disease, followed by another two-sample t-test to distinguish the COVID-19 patient group from the disease control group; finally, we examined the difference between the severe and nonsevere subgroups to predict disease progression. Statistical analysis was performed on MetaboAnalyst (https://www.metaboanalyst.ca/), a metabolomics analysis platform that has integrated R scripts [11-23]. Metabolites with missing values in 50% or more samples were excluded, and the remaining missing values were estimated using k-nearest neighbors. Subsequently, the data were normalized by the sum of the total ion intensity, log transformed and scaled using Pareto scaling to facilitate the downstream hypothesis testing. In the subsequent two-sample t-test of each group, metabolites with a concentration change greater than twofold and a false discovery rate-adjusted p-value <0.05 were considered significantly altered. For hierarchical clustering analysis, the distance measure was Euclidean and the clustering algorithm was Ward. Classification using random forest and subsequent receiver operating characteristic (ROC) analysis was carried out using seven predictors and up to 500 trees. To analyze the disease-related metabolic pathways and cell signaling networks, pathway analysis was performed using Ingenuity Pathway Analysis (IPA), a cloud computing-based bioinformatics software utilizing integrated metabolomics analysis for the overrepresented metabolic pathways.
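The preprocessing steps named above (normalization by sum, log transformation and Pareto scaling) can be illustrated with a minimal Python sketch. It is not the MetaboAnalyst implementation: the function name `preprocess`, the samples-by-metabolites layout and the use of a plain base-10 logarithm on strictly positive intensities are assumptions made only for illustration.

```python
import math
from statistics import mean, stdev


def preprocess(intensities):
    """Sum-normalize, log-transform and Pareto-scale a data matrix.

    `intensities` is a list of samples (rows), each a list of strictly
    positive metabolite intensities (columns). This layout is hypothetical.
    """
    # 1. Normalize each sample by the sum of its intensities.
    normalized = [[v / sum(row) for v in row] for row in intensities]

    # 2. Log-transform (plain log10 is an assumption; platforms may use
    #    a generalized log that tolerates zeros).
    logged = [[math.log10(v) for v in row] for row in normalized]

    # 3. Pareto scaling: mean-center each metabolite and divide by the
    #    square root of its standard deviation.
    scaled_columns = []
    for column in zip(*logged):
        mu = mean(column)
        sd = stdev(column)
        denom = math.sqrt(sd) if sd > 0 else 1.0  # guard constant columns
        scaled_columns.append([(v - mu) / denom for v in column])

    # Transpose back to samples-by-metabolites.
    return [list(row) for row in zip(*scaled_columns)]


# Toy data: 3 samples x 3 metabolites (values are made up).
toy = [
    [100.0, 200.0, 700.0],
    [150.0, 250.0, 600.0],
    [90.0, 310.0, 600.0],
]
scaled = preprocess(toy)
```

Pareto scaling divides by the square root of the standard deviation rather than the standard deviation itself, so large fold changes keep more weight than under unit-variance scaling while the influence of very abundant metabolites is still reduced.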

Results
Metabolomics profiles of COVID-19 patients were significantly different from those of the healthy group
We first conducted the two-sample t-tests of metabolites between COVID-19 patients and healthy individuals. The analysis revealed that COVID-19 patients' plasma metabolomic profiles were distinctly different from those of healthy individuals (Figure 1A). Using the cutoff of adjusted p < 0.05 and fold change >2 (the same thereafter), 88 metabolites were identified as being significantly changed in COVID-19 patients. Twenty of the 88 significantly altered metabolites were mapped into metabolic pathways to identify overrepresented pathways, which are discussed in detail later. The concentration changes of the eight altered metabolites with the most significant adjusted p-values and fold changes are shown in Figure 1B. A heat map using the top 25 differentially produced metabolites demonstrated a clear difference between COVID-19 patients and healthy individuals (Figure 1C). The partial least squares discriminant analysis (PLS-DA) successfully separated the COVID-19 patients from healthy individuals (Figure 1D). ROC analysis based on random forest yielded an area under the curve (AUC) of 0.997 when the five most significant metabolites were used as classifiers (95% CI: 0.968-1) (Figure 1E), indicating the practicality of using metabolite biomarkers to differentiate COVID-19 patients from healthy individuals. On the basis of the mean decrease in the accuracy of the random forest classification, the top three biomarkers that can be used to differentiate COVID-19 patients from healthy individuals were β-alanine (q = 1.26 × 10^-21), o-cresol sulfate (q = 2.38 × 10^-9) and 4-methoxyphenol sulfate (q = 3.77 × 10^-9) (Figure 1F). Ingenuity pathway analysis indicated alteration of purine ribonucleotides degradation (p = 4.19 × 10^-4), purine nucleotides degradation II (aerobic) (p = 1.24 × 10^-3) and salvage pathways of pyrimidine (p = 1.24 × 10^-3).
Furthermore, network analysis of prioritized metabolites in the COVID-19 patients (Figure 2) suggested that the networks of cell-to-cell signaling and interaction (score = 30) and nucleic acid metabolism and amino acid metabolism (score = 24) played a very active regulatory role in COVID-19.
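The significance cutoff used throughout the Results (false discovery rate-adjusted p < 0.05 and fold change > 2) can be sketched in Python as follows. The sketch assumes that the adjustment is the Benjamini-Hochberg procedure and that "greater than twofold" applies in both directions (ratio above 2 or below 0.5); the function names, metabolite labels and numbers are hypothetical and are not taken from the dataset.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each q-value
    # is the minimum of p * m / rank over all equal-or-larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(names, p_values, fold_changes, q_cut=0.05, fc_cut=2.0):
    """Select metabolites passing both the q-value and fold-change cutoffs.

    `fold_changes` are ratios (group A / group B); both increases and
    decreases beyond `fc_cut` are kept (an assumption of this sketch).
    """
    q_values = benjamini_hochberg(p_values)
    return [
        name
        for name, q, fc in zip(names, q_values, fold_changes)
        if q < q_cut and (fc > fc_cut or fc < 1.0 / fc_cut)
    ]


# Hypothetical example with four metabolites.
hits = significant(
    ["a", "b", "c", "d"],
    [0.01, 0.04, 0.03, 0.005],
    [3.0, 1.5, 0.4, 2.5],
)
```

In this toy example, metabolite "b" has an adjusted p-value below 0.05 but is dropped because its fold change is under twofold, which shows why the two criteria are applied jointly.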
Metabolomics profiles of COVID-19 patients were distinct from those of the disease control group
A two-sample t-test between COVID-19 patients and non-COVID disease control samples was carried out to identify biomarkers that could be used to differentiate the two pathological conditions. The result suggested that COVID-19 patients' plasma metabolomics profiles were different from those of disease controls, but to a lesser extent than in the comparison between COVID-19 patients and healthy individuals (Figure 3A). On the basis of the criteria described earlier, 35 metabolites were identified as significantly changed, and the differential production of the eight metabolites with the most significant fold changes and adjusted p-values is shown in Figure 3B. Hierarchical clustering using the top 25 significant metabolites revealed a moderate distinction between the metabolomics profiles of the two groups (Figure 3C). Meanwhile, based on PLS-DA, the COVID-19 patients and disease control samples were largely distinguishable (Figure 3D), although less so than in the COVID-19 versus healthy analysis (Figure 4). Finally, an ROC curve based on a random forest using five metabolites as classifiers resulted in an AUC of 0.935 (95% CI: 0.836-1) (Figure 3E). In the random forest analysis, we listed the top 15 metabolites with marked mean decreases in accuracy. On the basis of the mean decrease in accuracy of the random forest classification, the top three biomarkers with significant adjusted p-values that could be used to differentiate COVID-19 patients from disease control patients were cysteine sulfinic acid (q = 7.118 × 10^-5), phosphocholine (q = 2.253 × 10^-10) and 3-sulfo-L-alanine (q = 0.00164) (Figure 3F).
Selected metabolite profiles could differentiate severe and nonsevere COVID-19 patients
To determine the practicality of using biomarkers to assess disease progression, we examined the metabolomes of severe and nonsevere patients in a volcano plot (Figure 5A). On the basis of the same criteria described earlier, 11 metabolites were noted as significantly altered, of which nine are shown in Figure 5B. PLS-DA showed that the severe and nonsevere cases could be separated by differentially regulated metabolomes (Figure 5D). The differences between the metabolomics profiles of severe and nonsevere COVID-19 patients were revealed in the heat map (Figure 5C). The differential metabolomics profile was also analyzed by ROC analysis (AUC 0.805; 95% CI: 0.625-0.974) using ten classifiers (Figure 5E), and a number of differential metabolites with significant adjusted p-values were identified, such as taurochenodeoxycholic acid 3-sulfate (q = 0.00479), 5α-pregnan-diol disulfate (q = 0.00762) and N,N,N-trimethyl-alanylproline betaine (TMAP) (q = 0.00302) (Figure 5F).
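To make the AUC values reported in the three comparisons above easier to interpret, the following minimal Python sketch computes an AUC directly from classifier scores as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as one half). It does not reproduce the random forest models or the confidence intervals of the analysis; the function name `roc_auc` and the scores are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve from two lists of classifier scores.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    positive-negative pairs.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive sample ranked above negative
            elif p == n:
                wins += 0.5   # ties contribute half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of "severe" for two small groups.
severe_scores = [0.9, 0.8, 0.4]
nonsevere_scores = [0.5, 0.3, 0.2]
auc = roc_auc(severe_scores, nonsevere_scores)
```

Under this reading, an AUC of 0.5 corresponds to scores that do not separate the groups at all and an AUC of 1 to perfect separation, which places the severe versus nonsevere result (0.805) between the two.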

Discussion
We report for the first time that the metabolites benzoate, β-alanine and 4-chlorobenzoic acid can be used as potential biomarkers to distinguish COVID-19 patients from healthy individuals, with an AUC of 0.997 (95% CI: 0.968-1). Further analysis of severe and nonsevere COVID-19 resulted in an ROC curve with an AUC of 0.805 (95% CI: 0.625-0.974), and the top classifiers were taurochenodeoxycholic acid 3-sulfate, glucuronate and N,N,N-trimethyl-alanylproline betaine (TMAP). Compared with the ROC analyses of healthy versus COVID-19 groups and of non-COVID-19 versus COVID-19 patients mentioned earlier, differentiating severe from nonsevere COVID-19 patients was more challenging but still generated a fair result [24]. In general, the ROC analysis established in our research can distinguish severe from nonsevere patients, which can provide an important basis for personalized and precise treatment. Rapid assays for these metabolites could be developed, and patient stratification is essential for future COVID-19 treatment drug trials. These metabolome results need to be verified in a larger COVID-19 population before they can be applied. Among the differentially produced metabolites found in patients with COVID-19, benzoic acid, phosphate, inosine and sucrose have been suggested to promote immune response and inflammation development [25][26][27]. (R)-3-hydroxybutyric acid was correlated to phagocytosis cell damage [28]. It was also reported that N-formyl phenylalanine chemotaxis can regulate the aggregation of human neutrophils [29].
The aforementioned metabolites are all inflammation-promoting factors and were all significantly increased in samples from COVID-19 patients, indicating an important role of inflammation in COVID-19 disease progression. There was a significant increase in cytosine and ribose, indicated by the overrepresentation of purine ribonucleotides degradation. The upregulation of purine nucleotides degradation II (aerobic) and other nucleic acid metabolism pathways is an indicator of inflammation [30]. On the other hand, β-alanine, tauroursodeoxycholic acid and taurocholic acid, which have been suggested to be involved in the negative feedback of inflammation and to play a role in immune regulation [31][32][33], were also elevated in COVID-19 patients. Other negative feedback modulators were decreased, such as lenticin, which is part of a protective mechanism that maintains the stability of the internal environment during damage [34]; 15(S)-HETE, a negative feedback regulator of the immune response [35]; and 2,7,8-trimethyl-2-(β-carboxyethyl)-6-hydroxychroman, a peroxy free radical scavenger that maintains stability [36]. Their dysregulation is a potential indicator that some negative feedback modulators were overconsumed in COVID-19 patients, which made control of the inflammatory response ineffective.
The recent study [8] on critically ill COVID-19 patients showed that there is a unique metabolome in the plasma of COVID-19 ICU patients, characterized by changes in kynurenine, arginine, sarcosine and LysoPCs. The authors proposed that dietary supplementation with tryptophan, arginine, sarcosine and LysoPCs as adjuvant therapy may improve COVID-19 outcome. Our study has similar results, such as increased kynurenine, suggesting that the immune response is overactivated. COVID-19 infection causes strong T-cell activation and a rise in IFN-γ, which in turn increases the degradation of tryptophan, so kynurenine also increases. Targeting metabolism markedly modulates proinflammatory cytokine release by peripheral blood mononuclear cells isolated from SARS-CoV-2-infected rhesus macaques ex vivo, according to the most recent study on fatal cytokine-release syndrome in COVID-19 [37]. The results suggest that patients' immune regulatory mechanism may be a potential therapeutic target for COVID-19. Clinical treatment also showed that suppressing inflammation can help alleviate disease symptoms [38]. These unique metabolites are potential diagnostic/prognostic biomarkers for future research, representing a variety of metabolites that affect immune function and that could be used for stratified evaluation of patients in clinical treatment.

Conclusion
Together, the analysis results revealed that COVID-19 infection affected patients' cell signaling, nucleic acid metabolism and amino acid metabolism networks. Our research on COVID-19 metabolome pathways contributes to the understanding of the role of immune regulatory pathways during viral infection, and these pathways may also serve as an important therapeutic target for more effective treatment of COVID-19.

Future perspective
Although vaccines are bringing hope to the fight against the COVID-19 virus, we have experienced a third wave of the pandemic involving variants of COVID-19. Therefore, exploring biomarkers related to COVID-19 disease has important long-term significance for appropriate personalized treatment and prevention of COVID-19. Studies have shown that with regard to the pathogenesis of COVID-19, in addition to direct virus invasion and damage to target tissues, the increase in inflammatory biomarkers such as C-reactive protein reflects the development of the disease. Moreover, the metabolomics research on COVID-19 suggests that some characteristic metabolites can be used as biomarkers related to COVID-19, such as SPC total and SPC total/GlycA, kynurenine, arginine and creatinine, forming a unique immune metabolic phenotype. However, due to the large number of molecules involved, in further research, mathematical models could be established by deep learning, taking into account various factors related to the disease to guide more accurate application of antiinflammatory and immunostimulatory activities, reducing the severity of the disease and preventing multiple organ failure and death. In addition, it also has guiding significance for diagnosis and treatment of other viral infections in the future.

Supplementary data
To view the supplementary data that accompany this paper please visit the journal website at: www.futuremedicine.com/doi/suppl/10.2217/fmb-2021-0047

Author contributions
All authors participated sufficiently in the work to take responsibility for the content and all those who qualify are listed. X Chen and Y Sun designed and supervised the project. X Chen, ML Gu and TD Li conducted metabolomic reanalysis. Data were interpreted and presented by all coauthors. X Chen wrote the manuscript with input from coauthors.

Acknowledgments
The authors thank the Tiannan Guo team for contributing the data from their metabolomic characterization of COVID-19 patient sera to this study.

Financial & competing interests disclosure
This work is supported by grants from Youth Clinical Medical Talents Training Funding of Shanghai (HYWJ201802). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.

Summary points
• Two-sample t-tests on COVID-19 patients and healthy participants to understand the pathology of the disease were conducted, followed by two-sample t-tests to distinguish COVID-19 patients from a disease control group and differentiate severe and nonsevere subgroups of COVID-19 patients.
• Pathway analysis was performed using Ingenuity Pathway Analysis (IPA), a cloud computing-based bioinformatics software with integrated metabolomics analysis for the overrepresented metabolic pathways.
• The metabolomics profiles of COVID-19 patients were significantly different from those of the healthy group. Receiver operating characteristic (ROC) analysis based on a random forest plot yielded an area under the curve (AUC) of 0.997 when the five most significant metabolites were used as classifiers (95% CI: 0.968-1), indicating the practicality of using metabolite biomarkers to differentiate COVID patients from healthy individuals. The top three biomarkers that can be used to differentiate COVID-19 patients from healthy individuals were β-alanine, o-cresol sulfate and 4-methoxyphenol sulfate.
• IPA indicated alteration of purine ribonucleotides degradation, purine nucleotides degradation II (aerobic) and salvage pathways of pyrimidine. Network analysis suggested that the networks of cell-to-cell signaling and interaction, nucleic acid metabolism and amino acid metabolism played active regulatory roles in COVID-19.
• The metabolomic profiles of COVID-19 patients were distinct from those of the disease control group. An ROC curve based on a random forest plot using five metabolites as classifiers resulted in an AUC of 0.935 (95% CI: 0.836-1). The top three biomarkers with significant adjusted p-values that can be used to differentiate COVID-19 patients from disease control patients were cysteine sulfinic acid, phosphocholine and 3-sulfo-L-alanine.
• The selected metabolite profiles could differentiate severe and nonsevere subgroups of COVID-19 patients. Partial least squares discriminant analysis showed that the severe and nonsevere cases could be separated by differentially regulated metabolomes. The differential metabolomics profile was also analyzed by ROC analysis (0.805, 95% CI 0.625-0.974) using ten classifiers, and a number of differential metabolites with significant adjusted p-values were identified, such as taurochenodeoxycholic acid 3-sulfate, 5α-pregnan-diol disulfate and N,N,N-trimethyl-alanylproline betaine (TMAP).
• β-alanine, o-cresol sulfate, 4-methoxyphenol sulfate, taurochenodeoxycholic acid 3-sulfate, 5α-pregnan-diol disulfate and N,N,N-trimethyl-alanylproline betaine (TMAP) could be biomarkers in COVID-19, which contributes to understanding the role of immune regulatory pathways during infection and could serve as an important therapeutic target for treatment of COVID-19.