Comprehensive Approach to Distinguish Patients with Solid Tumors from Healthy Controls by Combining Androgen Receptor Mutation p.H875Y with Cell-Free DNA Methylation and Circulating miRNAs

Simple Summary: Blood-based tests for cancer detection are minimally invasive and could be useful for screening asymptomatic patients and high-risk populations. Since a single molecular biomarker is usually insufficient for an accurate diagnosis, we developed a multi-analyte liquid biopsy-based classification model to distinguish cancer patients from healthy subjects. The combination of cell-free DNA mutations, miRNAs, and cell-free DNA methylation markers improved the model’s performance. Moreover, we demonstrated that the androgen receptor mutation p.H875Y is not only relevant in prostate cancer but also has a strong predictive value for colorectal, bladder, and breast cancer. Our results, although preliminary, showed that a single liquid biopsy test could detect multiple cancer types simultaneously.
Abstract: Liquid biopsy-based tests are progressively emerging as an important tool for cancer diagnostics and management. Currently, researchers focus on a single biomarker type and one tumor entity. This study aimed to create a multi-analyte liquid biopsy test for the simultaneous detection of several solid cancers. For this purpose, we analyzed cell-free DNA (cfDNA) mutations and methylation, as well as circulating microRNAs (miRNAs), in plasma samples from 197 patients with cancer (20 bladder, 9 brain, 30 breast, 28 colorectal, 29 lung, 19 ovarian, 12 pancreas, 27 prostate, 23 stomach) and 15 healthy controls via real-time qPCR. The androgen receptor (AR) p.H875Y mutation was detected for the first time in bladder, lung, stomach, ovarian, brain, and pancreas cancer, altogether in 51.3% of all cancer samples and in none of the healthy controls. A discriminant function model comprising cfDNA mutations (COSM10758, COSM18561), cfDNA methylation markers (MLH1, MDR1, GATA5, SFN), and miRNAs (miR-17-5p, miR-20a-5p, miR-21-5p, miR-26a-5p, miR-27a-3p, miR-29c-3p, miR-92a-3p, miR-101-3p, miR-133a-3p, miR-148b-3p, miR-155-5p, miR-195-5p) could further classify healthy and tumor samples with 95.4% accuracy, 97.9% sensitivity, and 80% specificity. This multi-analyte liquid biopsy-based test may help improve the simultaneous detection of several cancer types and underlines the importance of combining genetic and epigenetic biomarkers.


Introduction
Cancer is mostly a manageable disease as long as it is diagnosed and treated before metastasis has begun. In most cases, higher-grade cancer evolves from lower-grade cancer. Thus, early tumor detection could increase the chances of successful treatment. In this way, carcinomas could be identified at an early stage when they can still be surgically removed or [...]
[...] female above the age of 25 without a previously treated cancer. Plasma was obtained from a total of 205 patients with stage I, II, or III cancer prior to cancer therapy and a control group (n = 15) of subjects with no evidence of malignancies. Nonetheless, 7 subjects of the control group had one of the following conditions at the sample collection point: pulmonary fibrosis, renal cyst, hemorrhoidal disease, dyspepsia, peritonsillar phlegmon, or endometrial polyp. Cancer types included in the study were liver, lung, pancreas, colorectal, prostate, ovarian, breast, stomach, bladder, and brain cancer. The clinical data of these study groups are summarized in Table 1.

Sample Collection and Liquid Biopsy
Peripheral venous blood samples were collected prior to surgery and therapy in K-2 EDTA vacutainers. Subsequently, plasma was separated via double centrifugation as described previously [22]. Whole blood samples were processed within one hour after the blood draw. Briefly, blood samples were centrifuged at 2000 × g for 10 min at 4 °C, followed by centrifugation of the supernatant at 16,000 × g for 10 min at 4 °C. The prepared plasma samples were stored at −80 °C until shipment. All samples were shipped frozen (−20 °C) on dry ice and stored temporarily at −20 °C upon arrival.

Cell-Free DNA Extraction, Processing, and Analysis
Cell-free DNA (cfDNA) was isolated with MagMAX™ Cell-Free DNA Isolation Kit (ThermoFisher Scientific, Waltham, MA, USA) using KingFisher™ Duo Prime Magnetic Particle Processor (ThermoFisher Scientific, Waltham, MA, USA) according to the user guidelines. The cfDNA was isolated from 4 mL plasma and eluted with elution solution in a final volume of 80 µL. The purified cfDNA samples were stored at −20 °C until further analysis. DNA was quantified using the dsDNA HS Assay Kit on Qubit 4 Fluorometer (Invitrogen, ThermoFisher Scientific, Waltham, MA, USA) according to the standard kit protocol.
The BL-WGA cfDNA (10-20 ng per reaction) was then used for the mutational analysis with TaqMan™ Mutation Detection Assays (ThermoFisher). The array is designed to analyze 75 cancer-specific mutations in 21 genes and consists of a Mutation Assay for each alteration and a Reference Assay that amplifies a mutation-free and polymorphism-free region of the target gene. The genes are AKT1, APC, AR, BRAF, CTNNB1, EGFR, ERBB2, ESR1, FBXW7, FGFR3, GNAS, HRAS, IDH1, KRAS, MED12, NRAS, PIK3CA, SMAD4, TERT, TP53, and VHL (Supplementary Table S1). ∆Ct values for the detection of mutations were established for each gene and defined as:
$$\Delta Ct = Ct_{\text{Mutation Assay}} - Ct_{\text{Reference Assay}}$$
The presence of a mutation in a sample was determined using an assay-specific cutoff point (Supplementary Table S1). The DNA mutation screening was performed on a QuantStudio 3 Real-Time PCR System (Applied Biosystems, ThermoFisher).
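The mutation call described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ∆Ct is taken as the Ct of the Mutation Assay minus the Ct of the Reference Assay, and a mutation is called when ∆Ct does not exceed the assay-specific cutoff. The function names and all numeric values are hypothetical and are not taken from Supplementary Table S1.

```python
from typing import Optional


def delta_ct(ct_mutation: float, ct_reference: float) -> float:
    """Assumed definition: Ct(Mutation Assay) - Ct(Reference Assay)."""
    return ct_mutation - ct_reference


def mutation_detected(ct_mutation: Optional[float],
                      ct_reference: float,
                      cutoff: float) -> bool:
    """Call a mutation when delta Ct is at or below the assay-specific cutoff.

    A missing Ct (None) for the Mutation Assay is treated as
    "no amplification", so no mutation is called.
    """
    if ct_mutation is None:
        return False
    return delta_ct(ct_mutation, ct_reference) <= cutoff


# Hypothetical example values (not from the study).
example_cutoff = 9.0
print(mutation_detected(31.5, 25.0, example_cutoff))  # delta Ct = 6.5
print(mutation_detected(36.0, 25.0, example_cutoff))  # delta Ct = 11.0
print(mutation_detected(None, 25.0, example_cutoff))  # no amplification
```

Keeping the cutoff as a per-assay argument mirrors the text, where each assay has its own cutoff point.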

Methylation Analysis
For methylation analysis, 70 µL of the purified cfDNA was divided into two fractions, one containing methylated cfDNA and one containing unmethylated cfDNA, using MethylMiner™ Methylated DNA Enrichment Kit (Invitrogen, ThermoFisher). This method is based on the binding of methylated DNA to MBD2 protein, which is coupled to magnetic beads. The methylated fragments can then be eluted as a single enriched fraction with a high salt concentration solution (NaCl), thereby separating methylated (Me cfDNA) from unmethylated cfDNA (UnMe cfDNA). Both fractions were subsequently quantified via real-time qPCR for 12 different cancer-relevant genetic regions (SEPT9, MLH1, MGMT, GATA5, GSTP1, SFN, MDR1, VIM, SHOX2, ALKBH3, APC, RASSF1A). Then, 2 µL of each fraction of cfDNA was amplified using custom-designed primers (150 nM, Supplementary Table S2) and GoTaq® qPCR Master Mix (Promega, US) in a final volume of 10 µL on QuantStudio 3 (ThermoFisher). The methylation level for each region was calculated using the following formula:
$$\text{cfDNA methylation}\ \% = 100 - \frac{100}{1 + 2^{-\left(Ct_{\text{Me cfDNA}} - Ct_{\text{UnMe cfDNA}}\right)}}$$
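As a worked illustration of the methylation formula, the following Python sketch computes the methylation percentage from the two Ct values. It assumes the formula reads 100 − 100/(1 + 2^−(Ct Me cfDNA − Ct UnMe cfDNA)), where the base 2 implies that the template doubles in every PCR cycle. The function name and the example Ct values are hypothetical.

```python
def methylation_percent(ct_me: float, ct_unme: float) -> float:
    """Methylation level (%) from the Ct of the methylated (Me) and
    unmethylated (UnMe) cfDNA fractions.

    Assumes perfect doubling per cycle (base 2), as in the formula above.
    """
    return 100.0 - 100.0 / (1.0 + 2.0 ** (-(ct_me - ct_unme)))


# Hypothetical Ct values: equal Ct values mean equal template amounts in
# both fractions; a lower Ct in the Me fraction means more methylated DNA.
print(methylation_percent(25.0, 25.0))
print(methylation_percent(25.0, 26.0))
print(methylation_percent(26.0, 25.0))
```

Algebraically, the expression equals the share of the methylated signal in the total signal, 2^−Ct(Me) / (2^−Ct(Me) + 2^−Ct(UnMe)), expressed as a percentage.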

RNA Extraction, Processing, and microRNA Analysis
Total RNA was isolated from 100 µL plasma with MagMAX™ mirVana™ Total RNA Isolation Kit (ThermoFisher Scientific, Waltham, MA, USA) using KingFisher™ Duo Prime Magnetic Particle Processor (ThermoFisher Scientific, Waltham, MA, USA) according to the user guidelines. Spike-in miRNA C. elegans 39 was added during the RNA purification at a concentration of 15 fmol per sample. Total RNA was eluted with elution buffer in final volumes of 50 µL, and samples were stored at −20 °C until further analysis.
After RNA purification, miRNA was transcribed into cDNA using the TaqMan™ Advanced miRNA cDNA Synthesis Kit (ThermoFisher). A 1:10 dilution of the cDNA was taken for the analysis of 48 miRNAs (including the C. elegans spike-in control, Supplementary Table S3) using prespotted TaqMan Advanced miRNA 96-well plates (ThermoFisher) on a QuantStudio 3 Real-Time PCR System (ThermoFisher) in a final reaction volume of 10 µL. For data normalization, the global mean of all analyzed miRNAs was used as previously described [24].
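Global mean normalization can be illustrated with a short Python sketch. The exact procedure is the one in reference [24]; the version below is an assumption made for illustration only: the mean Ct of all detected miRNAs in a sample is subtracted from each miRNA's Ct in that sample. The function name, the sign convention, and the Ct values are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional


def global_mean_normalize(ct_by_mirna: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Subtract the sample-wide mean Ct from each detected miRNA's Ct.

    miRNAs without a Ct value (None, i.e. not detected) are dropped before
    the mean is computed. This handling is an assumption of the sketch.
    """
    detected = {name: ct for name, ct in ct_by_mirna.items() if ct is not None}
    sample_mean = mean(detected.values())
    return {name: ct - sample_mean for name, ct in detected.items()}


# Hypothetical Ct values for one plasma sample.
sample = {"miR-21-5p": 24.0, "miR-155-5p": 28.0, "miR-17-5p": 26.0, "miR-218-5p": None}
print(global_mean_normalize(sample))
```

A negative normalized value then indicates a miRNA that is more abundant than the average miRNA in the same sample.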

Statistical Analysis
One-way ANOVA for continuous variables and χ² test and cross-tabulation for categorical variables were used to analyze the characteristics of the subjects. A t-test of independent samples was performed to compare the mutational burden between cancer-free subjects and cancer patients. The correlation between cell-free DNA concentration and cancer stage was analyzed with a Spearman's ρ rank coefficient test. miRNA expression values were standardized by converting them to Z-scores. A one-way ANOVA was carried out to determine whether miRNAs are differently expressed or cfDNA methylation varies across the test groups. A χ² automatic interaction detection decision tree model (CHAID) was used to split the samples into subsets. The diagnostic potential of cfDNA mutations, cfDNA methylation markers, and miRNAs was analyzed in discriminant function analyses (DA) with a leave-one-out cross-validation. The performance of these DAs was further estimated by a receiver operating characteristic (ROC) analysis and area under the curve (AUC). Statistical analyses were carried out in IBM® SPSS® Statistics 20 Software.
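The leave-one-out cross-validation scheme can be sketched as follows. The study ran discriminant function analyses in SPSS; the Python code below is not that method. It uses a simple nearest-centroid classifier as a stand-in so that the example stays self-contained, and it only shows how each sample is predicted by a model fitted to all other samples. All function names and the toy data are hypothetical.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equally long feature vectors."""
    return [mean(column) for column in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_nearest_centroid(train_x, train_y, sample):
    """Stand-in classifier (NOT a discriminant function analysis):
    assign the label whose class centroid is closest to the sample."""
    best_label, best_distance = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        distance = squared_distance(centroid(rows), sample)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def leave_one_out_predictions(features, labels):
    """Predict every sample with a model fitted to all remaining samples."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(predict_nearest_centroid(train_x, train_y, features[i]))
    return predictions


# Hypothetical, well-separated toy data with two features per sample.
toy_features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
toy_labels = ["healthy", "healthy", "healthy", "tumor", "tumor", "tumor"]
print(leave_one_out_predictions(toy_features, toy_labels))
```

Because the held-out sample never contributes to the fitted centroids, the resulting predictions give a less optimistic performance estimate than predictions made on the training data.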

Identification of Candidate Biomarkers
For the identification of the potential diagnostic markers, the correlation matrix of all variables was calculated (Supplementary Figure S1). Firstly, all variables with missing values were excluded from the analysis. Further, continuous variables (miRNA level and cfDNA methylation percentage) were dichotomized upon an automatically defined threshold value.
Subsequently, for each cancer type, the correlation with each biomarker was calculated, and the correlations were sorted by their absolute values (Supplementary Figure S2). Since a large number of measured variables relative to a small sample size tends to produce spurious correlations, a subset of the biomarkers with the highest correlation (by absolute value) for each cancer type was chosen for further tests.
Then, all combinations of these best biomarkers were tested regarding their importance to predict a particular tumor type versus the healthy control group. Variables with redundant information were eliminated based on a covariance matrix to further alleviate the effects of overfitting. Thus, superfluous biomarkers that yield no improvement concerning the classification performance of each cancer type were excluded. In order to do so, a score was defined, where false negatives are discouraged by a factor of two compared to false positives. All computations were carried out in R version 4.1.2.
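The asymmetric score can be illustrated with a small Python sketch. The study performed this step in R and does not state the exact formula, so the form used here (a penalty of 2 per false negative and 1 per false positive, lower being better) is an assumption, and the function name and toy labels are hypothetical.

```python
def weighted_error(true_labels, predicted_labels,
                   fn_weight=2.0, fp_weight=1.0, positive="tumor"):
    """Penalty score in which a missed tumor (false negative) costs twice
    as much as a false alarm (false positive). Lower is better.

    The additive form is an assumption; the text only states the 2:1 ratio.
    """
    pairs = list(zip(true_labels, predicted_labels))
    false_negatives = sum(1 for t, p in pairs if t == positive and p != positive)
    false_positives = sum(1 for t, p in pairs if t != positive and p == positive)
    return fn_weight * false_negatives + fp_weight * false_positives


# Hypothetical predictions from two candidate biomarker subsets.
truth = ["tumor", "tumor", "tumor", "healthy", "healthy"]
subset_a = ["tumor", "tumor", "healthy", "healthy", "healthy"]  # one false negative
subset_b = ["tumor", "tumor", "tumor", "tumor", "healthy"]      # one false positive
print(weighted_error(truth, subset_a), weighted_error(truth, subset_b))
```

Under such a score, a biomarker subset that raises a false alarm is preferred over one that misses a tumor, which favors sensitivity over specificity.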

Patient Characteristics
A total of 205 cancer and 15 cancer-free plasma samples were collected. One of the cancer samples was excluded since the patient was diagnosed with stage IV ovarian cancer after plasma collection. We received only seven plasma samples from patients with liver cancer; this group was too small to yield any meaningful results and was therefore excluded from the statistical data analysis. Hence, a total of 212 samples were analyzed for mutations, miRNAs, and DNA methylation. Patient characteristics are summarized in Table 1.
One-way ANOVA test showed significant differences between the age of the healthy control group and bladder cancer (p < 0.001), CRC (p < 0.05), prostate cancer (p < 0.05), and stomach cancer (p < 0.005). The BMI of the subjects with ovarian cancer differed significantly from the BMI of the patients with CRC (p < 0.05) and stomach cancer (p < 0.001).

Plasma cfDNA Mutation Detection
Targeted mutation analysis was implemented to investigate 75 alterations such as nucleotide insertions and substitutions (Supplementary Table S1), referred to as mutations. Among the 197 patients with tumors, at least one mutation was detected in 187 patients (94.9%). In 8 out of the 15 healthy control samples, at least one mutation was detected (Supplementary Table S4). The mutations COSM5677 (CTNNB1), COSM6223 (EGFR), COSM22932 (FBXW7), COSM483 (HRAS), COSM499 (HRAS), COSM518 (KRAS), COSM10779 (TP53) were not detected in any sample and were therefore excluded from the analysis. All of the analyzed cfDNA alterations and their frequencies in this study population are listed in Supplementary Table S4. The mutation burden of cfDNA differed significantly between healthy subjects and cancer patients (p < 0.001, mean difference = 4.819, std error = 0.463, 95% CI 3.896-5.741). Cancer patients had 6.15 mutations on average, while the control group had 1.33. The AR p.H875Y mutation (Supplementary Table S4) was detected in the plasma of subjects with CRC (85.71%), bladder (80%), prostate (66.67%), and breast (60%) cancer, and in none of the healthy controls. The percentages of AR mutation-positive patients (AR+) for lung, stomach, ovarian, brain, and pancreas cancer were 48.28, 26.09, 20, 11.11, and 8.33, respectively. Interestingly, cancer samples with an AR mutation (n = 101) had an overall higher total mutational burden than samples without an AR mutation (AR-, n = 96), 7.5 and 4.8 mutations on average, respectively (p < 0.001).

Identification of Differently Expressed Circulating miRNAs
Among the 47 analyzed miRNAs, four were under the detection limit for the reference sample (miRNAs 30a-5p, 218-5p, 1225-3p, 203a-3p); therefore, they were excluded from the analysis. A heatmap was generated for the remaining 43 miRNAs (Supplementary Figure S4). After computing the Z-scores for the miRNA expression data, a differential analysis was conducted, and significantly deregulated miRNAs are depicted in Figure 2. MiRNAs 133a-3p and 23a-3p were significantly up-regulated in subjects with brain cancer (Figure 2B and 2G, respectively). MiR-148a-3p was significantly elevated in subjects with pancreas cancer compared to all groups except brain and ovarian cancer (Figure 2C). Additionally, higher levels of miR-34a-5p were observed in subjects with pancreas cancer (Figure 2J). Furthermore, miR-31-5p in pancreas cancer was down-regulated compared to breast and ovarian cancer and up-regulated compared to bladder, CRC, lung, prostate, and stomach cancer and the control group (Figure 2I). Interestingly, cancer samples with an AR mutation (AR+, n = 101) showed significantly lower levels of miRNAs 148a-3p, 148b-3p, 195-5p, 210-3p, 23a-3p, and 25-3p when compared to samples without an AR mutation (AR-, n = 96) (Figure 2L).

Identification of Cancer Type Specific Biomarkers
A search algorithm for the most predictive cfDNA mutations, miRNAs, and cfDNA methylation markers for each cancer type was derived and implemented, and variables with redundant information were eliminated based on a score that penalizes false negatives more heavily than false positives. The correlations for each cancer type are shown in Supplementary Figure S2A-I. Depending on the cancer type, three to four biomarkers per cancer type showed the highest correlations compared to the healthy control (Table 2).

Classification of Tumor Samples
Firstly, samples were split into two groups (χ² = 14.688, p < 0.001): samples with an AR mutation (AR+, n = 101) and samples without an AR mutation (AR-, n = 111). The AR+ group consisted only of tumor samples since no AR mutation was detected in the control group. However, the AR- group contained the healthy controls (n = 15) and tumor samples (n = 96); therefore, no further classification of this group was possible based only on AR mutation. In order to separate healthy from tumor samples in the AR- group, several discriminant function analyses with a leave-one-out cross-validation were carried out, including different sets of biomarkers, not including AR mutation. The sets of biomarkers were as follows: discriminant analysis 1 (DA1) incorporated all measured targets; discriminant analysis 2 (DA2) only cfDNA mutations; discriminant analysis 3 (DA3) only cfDNA methylation; discriminant analysis 4 (DA4) only miRNAs; discriminant analysis 5 (DA5) included the biomarkers with the highest correlations identified through the correlation matrices (Table 3). The DA5 model yielded the best results (Figure 3) and classified healthy and tumor samples with 95.4% accuracy, 97.9% sensitivity, 80% specificity, and receiver operating characteristic area under the curve (ROC AUC) of 0.884.
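The two-step logic described above, together with the reported performance measures, can be sketched in Python. This is an illustration only: the second-step classifier is passed in as a function and is replaced here by a hypothetical threshold rule, not the DA5 model, and the toy samples are invented. In the study the metrics refer to the AR- group, whereas the toy example evaluates all samples.

```python
def classify_two_step(ar_mutation_detected, second_step_predict, features):
    """Step 1: AR mutation present -> tumor. Step 2: otherwise defer to a
    second classifier (in the study, a discriminant function model)."""
    if ar_mutation_detected:
        return "tumor"
    return second_step_predict(features)


def diagnostic_metrics(true_labels, predicted_labels, positive="tumor"):
    """Sensitivity, specificity, and accuracy from paired labels.

    Assumes at least one positive and one negative sample are present.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def threshold_rule(features):
    """Hypothetical stand-in for the second-step model."""
    return "tumor" if features["score"] >= 0.5 else "healthy"


# Invented toy samples: AR status, a made-up score, and the true label.
toy_samples = [
    {"ar": True, "score": 0.1, "label": "tumor"},
    {"ar": True, "score": 0.9, "label": "tumor"},
    {"ar": False, "score": 0.8, "label": "tumor"},
    {"ar": False, "score": 0.2, "label": "tumor"},
    {"ar": False, "score": 0.3, "label": "healthy"},
    {"ar": False, "score": 0.7, "label": "healthy"},
]
toy_predictions = [classify_two_step(s["ar"], threshold_rule, s) for s in toy_samples]
print(diagnostic_metrics([s["label"] for s in toy_samples], toy_predictions))
```

Passing the second-step model as a function keeps the AR-based split independent of whichever classifier is used for the AR- group.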

Discussion
This study presents a liquid biopsy-based multi-analyte classification model for tumor samples and healthy controls. The AR p.H875Y mutation plays a key role in this model. Androgen receptor alterations have been identified as some of the main drivers of castration-resistant prostate cancer [25]. The AR p.H875Y mutation has been predominantly found in prostate cancer [26], but this mutation has also been reported for breast cancer [27] and CRC [28]. However, to our knowledge, this is the first time that the AR p.H875Y mutation has been reported for bladder, lung, stomach, ovarian, brain, and pancreas cancer. AR mutations have been predominantly studied in connection with prostate and breast cancer, especially treatment response [29,30]. We analyzed all predefined targets in all samples, not only the genes reported to be relevant in the specific cancer type. Considering this, we speculate that there is no literature concerning other tumors until now because other studies that analyzed this specific AR mutation focused primarily on breast and prostate cancer. Besides, we used a qPCR-based method to detect cfDNA mutations, which has been shown to have a better sensitivity for low allele fraction variants than sequencing [31]. Still, the underlying mechanisms of the involvement of the AR p.H875Y mutation in the carcinogenesis of these cancer types should be investigated. Nevertheless, our results suggest that the AR p.H875Y mutation could be a promising biomarker for discriminating healthy subjects from cancer patients, especially those with CRC, bladder, and prostate cancer.
Here, we describe a model consisting of two steps: sorting samples into two groups, with and without an AR mutation (AR+ and AR-, respectively), and subsequently classifying the AR- group into cancer patients and healthy subjects (95.4% accuracy, 97.9% sensitivity, 80% specificity, 0.884 ROC AUC). The classification models based solely on mutations, cfDNA methylation, or miRNAs showed poor specificity (DA2 26.7%, DA3 33.3%, and DA4 33.3%, respectively). Combining all the analyzed biomarkers improved the specificity to some extent (DA1 57.3%); however, the sensitivity declined. The large number of biomarkers included in the DA1 model decreases the classifier's performance since some contain redundant and superfluous information. To alleviate the effects of this so-called "curse of dimensionality," also known as the "Hughes phenomenon" [32,33], the number of biomarkers included in the model should be decreased. Hence, biomarker selection was carried out, and a classification model was built based on the most relevant biomarkers (DA5), displaying the best results (Table 3, Figure 3).
Interestingly, our results demonstrate that the combination of three different analytes could improve the performance of a classification model. Each analyte type provides distinct information and adds value to the classification model, highlighting the importance of a multi-analyte-based liquid biopsy test for cancer detection [3,34].
Although cfDNA concentration has been previously suggested as a biomarker for cancer detection [35], our results did not support these findings. The healthy subjects in this study exhibited higher plasma cfDNA concentrations than patients with stage I tumors, which has already been reported [36]. The presence of plasma cfDNA in healthy subjects is not unusual; its main source is the apoptosis of hematopoietic cells [37]. The cfDNA profile of a cancer patient differs from that of a healthy individual in that it also contains fragments originating from tumor cells, called circulating tumor DNA (ctDNA). Additionally, these DNA fragments have a specific footprint indicating the tissue of origin [38]. Although we did not estimate the percentage of ctDNA in the total plasma cfDNA, we detected a higher mutational burden in cancer patients compared to healthy individuals.
Nonetheless, some genomic aberrations were detected among the control group (Supplementary Table S4). Somatic mutations in healthy tissues have been previously reported [39]. Since no follow-up of the participants was conducted, we cannot know if the healthy subjects harboring these somatic mutations developed cancer. However, these subjects were declared cancer-free at sample collection.
Our results suggest that plasma cfDNA concentration could rather serve to monitor disease progression since it correlated with cancer stage regardless of cancer type. However, mutational burden, miRNA expression, and cfDNA methylation should also be taken into account [40].
As mentioned earlier, one key limitation of this study is the lack of follow-up in addition to the small number of healthy controls. Despite its small size, the control group sufficed to observe several statistically significant results since it was matched to reflect the average sample size of each cancer group. Another potential limitation is the clinical utility of this model, as the three different analytes require separate sample processing and analysis. Nevertheless, a blood-based test is a minimally invasive procedure in contrast to a tissue biopsy and is more frequently accepted by patients than other screening procedures such as colonoscopy [41] or a fecal immunochemical test (FIT) [6] in the case of CRC. Thus, the screening methods themselves directly affect compliance and should therefore be optimized.
Liquid biopsy markers such as cfDNA mutations, cfDNA methylation, and CTCs are already successfully applied as prognostic and predictive tools for treatment response in several tumor types and monitoring tools for disease progression [42-45], as well as for disease screening [5]. Yet, there are still no clinically approved tests for a broader cancer screening of the population. Despite the limitations of this study, our results indicate that pan-cancer detection could be achieved through the combination of genetic and epigenetic biomarkers in plasma.

Conclusions
In this study, we created a liquid biopsy-based classification model allowing the discrimination between healthy controls and patients with various solid tumors. We demonstrated that combining several analytes improves the performance of the test. Nevertheless, a bigger prospective cohort is required to confirm the clinical utility of this classification model and assess whether a subclassification of the different cancer types is possible.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers14020462/s1, Table S1: Assays for the mutation analysis of cell-free DNA, Table S2: Primer sequences used for methylation analysis of cell-free DNA, Table S3: Assays used for miRNA analysis, Table S4: Mutation frequencies in all samples, Figure S1: Correlation matrix of all variables, Figure S2: Correlation plots for each cancer type, Figure S3: Heatmap of the cell-free DNA methylation, Figure S4: Heatmap of the miRNA levels.