A ten N6‐methyladenosine‐related long non‐coding RNAs signature predicts prognosis of triple‐negative breast cancer

Abstract Background Patients with triple‐negative breast cancer (TNBC) face a major challenge of the poor prognosis, and N6‐methyladenosine‐(m6A) mediated regulation in cancer has been proposed. Therefore, this study aimed to explore the prognostic roles of m6A‐related long non‐coding RNAs (LncRNAs) in TNBC. Methods Clinical information and expression data of TNBC samples were collected from TCGA and GEO databases. Pearson correlation, univariate, and multivariate Cox regression analysis were employed to identify independent prognostic m6A‐related LncRNAs to construct the prognostic score (PS) risk model. Receiver operating characteristic (ROC) curve was used to evaluate the performance of PS risk model. A competing endogenous RNA (ceRNA) network was established for the functional analysis on targeted mRNAs. Results We identified 10 independent prognostic m6A‐related LncRNAs (SAMD12‐AS1, BVES‐AS1, LINC00593, MIR205HG, LINC00571, ANKRD10‐IT1, CIRBP‐AS1, SUCLG2‐AS1, BLACAT1, and HOXB‐AS1) and established a PS risk model accordingly. Relevant results suggested that TNBC patients with lower PS had better overall survival status, and ROC curves proved that the PS model had better prognostic abilities with the AUC of 0.997 and 0.864 in TCGA and GSE76250 datasets, respectively. Recurrence and PS model status were defined as independent prognostic factors of TNBC. These ten LncRNAs were all differentially expressed in high‐risk TNBC compared with controls. The ceRNA network revealed the regulatory axes for nine key LncRNAs, and mRNAs in the network were identified to function in pathways of cell communication, signaling transduction and cancer. Conclusion Our findings proposed a ten‐m6A‐related LncRNAs as potential biomarkers to predict the prognostic risk of TNBC.


| INTRODUC TI ON
Triple-negative breast cancer (TNBC) is an aggressive subtype of breast cancer and is histochemically recognized by the negative expressions of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER-2). 1 TNBC contributes to about 10%-20% of breast cancer cases globally, with a disappointing survival prognosis. [2][3][4] Due to the limitation of advanced progression on effective targeted drugs, chemotherapy is one of the remaining options for a systematic anticancer treatment, 5 but the sensitivity to chemotherapy is an unavoidable difficulty.
Furthermore, compared with other subtypes of breast cancer, patients with TNBC have higher risks of death after distant metastasis and local recurrence. 6,7 Considering the poor prognosis of patients with TNBC and the biocomplexity of TNBC, the identification of multi-target predictors of prognostic response is in urgent.
N6-methyladenosine (m6A) is the methylation that occurs at adenosine N6, which is the most abundant mRNA internal modification in eukaryotic cells. 8 M6A methylation is dynamically reversible in mammalian cells, and its epigenetic modification is considered to regulate the self-renewal, differentiation, invasion, and apoptosis of tumor cells by mediating the expression of cancer-related genes. 9 M6A can be installed by the methyltransferase complex known as writers, be removed by demethylase known as erasers, and be recognized by binding functional proteins know as readers, and the crosstalk among these three regulators is believed to involve in cancer growth and progression. 10 Studies have proved that the overexpression of ALKBH5, an m6A eraser, could decrease the methylation of NANOG mRNA and increase the NANOG protein expression level, thus elevating the proportion of breast cancer stem cells. 11 Niu et al. also identified BNIP3 as an m6A-related anti-oncogene with negative correlation with FTO in expression level in breast cancer patients. 12 In addition to effects of m6A methylation on mRNAs, studies also found its functional regulation in non-coding RNAs. 13,14 Long non-coding RNAs (LncRNAs) and microRNAs (miRNAs) are the main components of non-coding RNAs involved in the regulation of genes at epigenetic, transcriptional, and post-transcriptional levels, and also play crucial roles in cancer development and progression. 15 Among them, LncRNAs act as competitive endogenous RNAs, can inhibit the function of miRNAs in tumor post-transcriptional regulatory network of TNBC. 16 A review also concluded that LncRNAs and m6A may play synergistic roles in cancer therapy. 17 For instance, knockout of METTL3 may reduce the m6A modification level of specific transcripts which results in the inactivation of LncRNA X chromosome, whereas METTL3 was up-regulated by LncRNA-HBXIP which was highly expressed in breast cancer. 18,19 Although the effect of m6A on cancer and the mutual regulation mechanism between m6A and LncRNAs have been extensively studied, no m6A-related LncRNAs have been identified to join in the prognosis of TNBC.
Therefore, this study ascertained to screen m6A-related LncRNAs in the expression level from TNBC samples and then to establish a prognostic risk model to evaluate the predictive abilities of candidate LncRNAs on TNBC prognosis. The procedures of this study were summarized and visualized in Figure 1. Our study will provide potential biomarkers for TNBC prognosis and help to improve treatment strategies for TNBC patients.  22 Ultimately, the cor test in R programming language 3.6.1 was applied to calculate the Pearson correlation coefficient (PCC) on the expression level of m6A-related genes and LncRNAs, thereby filtrating LncRNAs significantly associated with m6A-related genes with standards of |Pearson R| > 0.3 and p < 0.05.

| Screening for independent prognostic m6Arelated LncRNAs
Samples collected from TCGA were randomly classified into training set (n = 61) and validation set (n = 92) in a ratio of 4:6. Integrating with the clinical information, the univariate Cox regression analysis in R package "survival" version 2.41-1 was implemented to filter out prognostic m6Arelated LncRNAs, 23 and then, the multivariate Cox regression analysis was used to identify independent prognostic m6A-related LncRNAs.

| Prognostic score risk model based on independent prognostic m6A-related LncRNAs
A risk model based on the prognostic coefficient and expression level of independent prognostic m6A-related LncRNAs was established as follows: Then, the TNBC samples were grouping into high risk and low risk based on the median value of PS, and the Kaplan-Meier (KM) curve in R package was applied to evaluate the difference in survival status of patients between the two groups. 23 The approach of support vector machine (SVM) in 6.1 e1071 Version 1.6-8 was applied for conducting SVM classifier, thereby evaluating the prognostic performance of the PS model in TCGA validation set and GSE76250. 24 We then employed  Table S1. Ten independent prognostic m6A-related LncRNAs in- were further identified by the multivariate Cox regression analysis.
Their correlations with m6A-related mRNAs in the expression level were shown in Figure 2A. The information of these ten m6A-related prognostic LncRNAs was organized in Table 1 and was visualized in were associated with better overall survival (OS) of TNBC ( Figure 2C).

| Prognostic performance of risk prediction models
To verify the prognostic effect of candidate LncRNAs, we established risk models. The TCGA dataset was classified into training set

| Screening for independent prognostic factors
We continued to gather statistics on clinical characteristics of TNBC samples sourced from TCGA and performed univariate and multivariate Cox regression analysis accordingly ( Figure 4; Table 2). The univariate Cox regression analysis showed that

| Screening for DEGs between high-risk and low-risk groups
We used the Limma package to analysis DEGs between the highrisk and low-risk groups, and then obtained 814 DEGs at the cutoff TA B L E 1 Ten independent prognostic m6A-related LncRNAs     Figure 5A). These genes were proved to effectively distinguish samples between high-risk and low-risk groups ( Figure 5B).

| Construction of ceRNA network and functional analysis
A total of 67 miRNAs were determined to be related to TNBC in the HMDD database, then 69 interactions comprising 9 m6A-  (Table S2). Afterward, mRNAs involved in the ceRNA network were annotated and enriched by KEGG, and eight core genes and nine pathways associated with cell communication, signaling transduction, and human disease (cancer) were finally picked up ( Figure 6C).

| Expression validation
We further extracted the expression data of 10 m6A-related prognostic LncRNAs from TCGA and GSE76250, and then compared the expression levels between high-risk and low-risk groups. As shown in Figure 7A, the expressions of risk factors including LINC00571, CIRBP-AS1, and HOXB-AS1 with HR > 1 were significantly higher in the high-risk group than that in the low-risk group, and the other protectors with HR < 1 had significantly higher expression level in the low-risk group comparing with the high-risk group, which were logically consistent with the prediction. Furthermore, in the GSE76250 dataset, the expression trends of candidate LncRNAs  However, it is considerable that the limitation of sample size, the lack of clinical information of samples, and the absence of detection on methylation level of candidate LncRNAs are shortcomings in this study. Therefore, a larger sample size and the collection of solid tumor samples are required in the following studies to investigate the regulatory mechanism of candidate LncRNAs in TNBC.
Moreover, we will further explore the differences in the prognostic effect of m6A-related LncRNAs on TNBC in various populations for a more accurate prognosis treatment strategy.
Conclusively, by mining the data from public databases, we identified ten m6A-related prognostic LncRNA signatures, confirmed their predictive roles in prognostic risk for TNBC patients, and pointed out the potential mechanisms of candidate LncRNA-related ceRNA regulation. Our findings may help to improve the prognosis for patients with TNBC.

ACK N OWLED G EM ENTS
This research was funded by the National Natural Science Foundation of China (No. 11572200, No. 11502146, No. 81773043).

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
In this study, the data from TCGA database are available from https://gdc-portal.nci.nih.gov/, while the GSE76250 dataset can be found in NCBI-GEO (http://www.ncbi.nlm.nih.gov/geo/).