Machine Learning-Based Approaches for Prediction of Patients’ Functional Outcome and Mortality after Spontaneous Intracerebral Hemorrhage

Spontaneous intracerebral hemorrhage (SICH) is common in China and carries high morbidity and mortality. This study aimed to develop machine learning (ML)-based predictive models for 90-day outcomes after SICH. We retrospectively reviewed 751 patients diagnosed with SICH and analyzed clinical, radiographic, and laboratory data. A modified Rankin Scale (mRS) score of 0–2 was defined as a favorable functional outcome, while an mRS score of 3–6 was defined as an unfavorable functional outcome. We used 90-day functional outcome and mortality to develop six ML-based predictive models and compared their performance with that of a traditional risk stratification scale, the intracerebral hemorrhage (ICH) score. Predictive performance was evaluated by the area under the receiver operating characteristic curve (AUC). A total of 553 patients (73.6%) reached a favorable functional outcome at the 3rd month, and the 90-day mortality rate was 10.2%. Logistic regression (LR) and logistic regression CV (LRCV) showed the best predictive performance for functional outcome (AUC = 0.890 and 0.887, respectively), and category boosting (CatBoost) showed the best predictive performance for mortality (AUC = 0.841). Therefore, ML might be of assistance in predicting the prognosis of SICH.


Introduction
Spontaneous intracerebral hemorrhage (SICH), which accounts for 10–30% of all strokes, is the most fatal and disabling type of stroke [1][2][3]. China has one of the highest disease burdens of SICH in the world [1,4]. Because of the high disability and mortality rates of SICH, outcome-prediction models combining clinical presentation, laboratory data, and imaging findings are of great significance and can help ensure optimal care [5]. Several prognostic tools have been proposed for outcome prediction in intracerebral hemorrhage (ICH), such as the ICH score [6]. These tools are potentially useful for predicting prognosis, facilitating communication between clinicians, and selecting patients for interventions [7][8][9]. However, the performance of these tools in predicting 90-day functional outcome and mortality remains unknown. In addition, the ICH score consists only of the Glasgow Coma Scale (GCS), ICH volume, age, location, and intraventricular extension of the hematoma [6]. Recent studies showed that some laboratory results, such as levels of monocytes and lymphocytes [10][11][12][13][14], offered potential predictive benefit for the outcome of SICH, suggesting that a more accurate model could be built by including more variables. Moreover, there is still no widely recognized tool for predicting the prognosis of Chinese SICH patients [15].
As a type of artificial intelligence, machine learning (ML) has several advantages in detecting possible interactions among attributes and may be useful in identifying prognostic markers. The key feature of ML is that it allows computers to detect underlying patterns by iteratively learning from data and to build a model from those patterns, which limits the influence of the researchers' own assumptions. In recent years, ML has been widely applied to outcome-prediction models for cerebrovascular diseases such as ischemic stroke [16,17], aneurysmal subarachnoid hemorrhage [18], and arteriovenous malformations [19]. However, ML-based outcome-prediction models for SICH in Chinese patients are still rare. The aim of this study was to develop prognostic models with ML methods to predict functional outcome and mortality in Chinese patients with SICH from the initial information available on admission to hospital, and to compare them with the ICH score, the traditional risk stratification scale.

Study Population
We retrospectively reviewed SICH patients admitted to West China Hospital during a 2-year period, from 1 January 2018, to 31 December 2019. The diagnosis of SICH was confirmed by head computed tomography (CT) within the first 24 h after admission.
All consecutive patients who were diagnosed with SICH during this period and were followed up for more than 3 months were included for further analysis. Extremely severe cases whose families refused any therapy after diagnosis were excluded from this study.

Data Collection
The study was conducted according to the guidelines of the Declaration of Helsinki and was approved by the Ethics Committee of West China Hospital (protocol code 1.1; 1 July 2017). The data used to develop the ML models were collected from the electronic medical records and included clinical, radiographic, and laboratory variables at the first evaluation. Demographic information, vital signs, radiographic findings, laboratory results, previous medical history, and treatments were collected. The first vital signs (body temperature [BT], heart rate [HR], and blood pressure) recorded after hospital arrival were used. Length of time in the emergency room (ER) was defined as the period from the patient's arrival at the ER to transfer to the neurosurgery department or the operating room. The level of consciousness was assessed with the GCS. Location of the hematoma (supratentorial, infratentorial, or both supra- and infratentorial), intraventricular hemorrhage (IVH), and the initial hematoma volume were evaluated on CT independently by two experienced doctors. The hematoma volume was measured using the ABC/2 method [20], in which A is the greatest diameter on the largest hemorrhage slice, B is the diameter perpendicular to A, and C is the approximate number of axial slices with hemorrhage multiplied by the slice thickness. Complete blood count, blood glucose (BG), triglyceride, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, creatinine, uric acid, sodium, chloride, fibrinogen, and D-dimer were measured in the laboratory of our hospital. Estimated glomerular filtration rate (eGFR) was calculated with the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. Previous medical history, including hypertension, diabetes mellitus (DM), coronary heart disease, kidney diseases, and pulmonary diseases, was obtained from the patients' self-reports or from the medical treatment they had received.
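The ABC/2 estimate described above reduces to a one-line formula. The following is a minimal sketch, assuming all lengths are given in centimeters so that the result is in milliliters; the function name, parameter names, and example measurements are illustrative and are not taken from the study.

```python
def abc2_volume_ml(a_cm: float, b_cm: float, n_slices: float,
                   slice_thickness_cm: float) -> float:
    """Estimate hematoma volume in mL with the ABC/2 approximation.

    a_cm: greatest diameter on the largest hemorrhage slice
    b_cm: diameter perpendicular to A on the same slice
    n_slices: approximate number of axial slices showing hemorrhage
    slice_thickness_cm: CT slice thickness
    """
    c_cm = n_slices * slice_thickness_cm  # C = slice count x slice thickness
    return a_cm * b_cm * c_cm / 2.0       # 1 cm^3 equals 1 mL


# Hypothetical measurements, for illustration only.
volume = abc2_volume_ml(a_cm=4.0, b_cm=3.0, n_slices=6, slice_thickness_cm=0.5)
print(f"{volume:.1f} mL")  # 4 * 3 * (6 * 0.5) / 2 = 18.0 mL
```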

In-Hospital Treatments and Outcomes
In-hospital treatments included conservative treatment or surgery (surgical hematoma evacuation). Generally, patients who had a supratentorial hematoma of ≥30 mL or infratentorial hematoma of ≥10 mL were recommended for surgery.
All patients were followed up for at least 3 months. The primary outcome was the functional disability at the 3rd month evaluated by the modified Rankin Scale ([mRS] from 0, no functional deficit, to 6, death). An mRS of 0-2 was defined as a favorable functional outcome, while an mRS of 3-6 was defined as an unfavorable functional outcome in this study. Survival at the 3rd month was evaluated as the secondary outcome.
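The two binary targets defined above can be derived from a single 90-day mRS value. The sketch below is only an illustration of that mapping; the function name and the dictionary keys are hypothetical, and death is read from an mRS of 6 as stated in the scale definition.

```python
def outcomes_from_mrs(mrs_90d: int) -> dict:
    """Map a 90-day mRS score (0-6) to the study's two binary outcomes."""
    if not 0 <= mrs_90d <= 6:
        raise ValueError("mRS must be an integer from 0 to 6")
    return {
        "favorable_outcome": mrs_90d <= 2,  # mRS 0-2 favorable, 3-6 unfavorable
        "death": mrs_90d == 6,              # mRS 6 denotes death
    }


print(outcomes_from_mrs(2))  # {'favorable_outcome': True, 'death': False}
print(outcomes_from_mrs(6))  # {'favorable_outcome': False, 'death': True}
```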

Machine Learning (ML) Algorithms
First, all candidate variables were tested with univariate analysis. Subsequently, recursive feature elimination with cross-validation (RFECV) was used to obtain the best feature combination for each model. RFECV includes two parts: recursive feature elimination (RFE) and cross-validation. Given an external estimator, RFE selects features by recursively considering smaller and smaller sets of features. For each ML algorithm, the estimator was first trained on the initial set of features, which contained all 41 variables, and the importance of each feature was obtained. Then, the least important feature was pruned from the current set of features. This procedure was repeated recursively on the pruned set until the optimal combination of features was obtained.
Six ML algorithms, all efficient and widely used methods for binary classification, were used in this study. Logistic regression (LR) and LRCV are among the most widely used statistical models; in their basic form, they use a logistic function to model a binary dependent variable [21]. LR and LRCV are highly efficient, especially for datasets that are approximately linear, and they train much faster than other ML-based algorithms such as support vector machine (SVM) and random forest (RF). SVM is one of the most robust prediction methods and is based on statistical learning frameworks, or Vapnik-Chervonenkis theory. It can efficiently perform not only linear classification but also non-linear classification using the kernel trick [22]. RF operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the RF is the class selected by most trees [23]. RF is usually flexible and easy to use in various conditions. Extreme gradient boosting (XGBoost) and category boosting (CatBoost) are typical and widely used ensemble learning algorithms. Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone [24].
In the current study, five-fold cross-validation was used to build and assess the LR, LRCV, SVM, RF, XGBoost, and CatBoost models. All samples were divided into five approximately equally sized subsamples. Four subsamples were used as training data, and the remaining subsample was retained as the validation set for testing the models. The process was then repeated five times, with each of the five subsamples used exactly once for validation. The five results were then averaged to produce a final estimate. The area under the receiver operating characteristic curve (AUC) was used to evaluate the predictive performance of each model.
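The elimination loop described above can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: `importance_fn` and `cv_score_fn` are hypothetical callables standing in for "train the estimator and read its feature importances" and "cross-validated score of a feature subset", and the toy importances and scores are invented for the example. scikit-learn offers an `RFECV` class for this procedure; whether the study used it is not stated here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def recursive_feature_elimination(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    cv_score_fn: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Drop the least important feature one at a time; keep the best-scoring subset."""
    current = list(features)
    best_subset, best_score = list(current), cv_score_fn(current)
    while len(current) > 1:
        importances = importance_fn(current)            # refit on the current subset
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)                         # prune the least important feature
        score = cv_score_fn(current)
        if score >= best_score:                         # prefer smaller sets on ties
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins (assumptions): fixed importances and a score that rewards
# informative features while penalising every extra feature.
TOY_IMPORTANCE = {"gcs": 0.9, "age": 0.6, "ivh": 0.5, "noise_a": 0.05, "noise_b": 0.01}


def toy_importance(subset: List[str]) -> Dict[str, float]:
    return {f: TOY_IMPORTANCE[f] for f in subset}


def toy_cv_score(subset: List[str]) -> float:
    return 0.6 + 0.1 * sum(TOY_IMPORTANCE[f] for f in subset) - 0.01 * len(subset)


selected, score = recursive_feature_elimination(
    list(TOY_IMPORTANCE), toy_importance, toy_cv_score
)
print(selected)  # ['gcs', 'age', 'ivh']
```

Note that this greedy procedure fits one model per elimination step, so it visits at most as many subsets as there are features rather than every possible combination.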
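The five-fold procedure and the AUC metric can be sketched with the standard library alone. This is an illustration under stated assumptions, not the study's code: the fold assignment is a simple unshuffled interleave (real pipelines usually shuffle or stratify), `fit_fn` is a hypothetical callable that trains a model and returns a scoring function, and the toy data are synthetic.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def auc_score(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int = 5) -> List[List[int]]:
    """Split sample indices into k approximately equal folds (no shuffling)."""
    return [list(range(n_samples))[i::k] for i in range(k)]


def cross_validated_auc(X, y, fit_fn: Callable, k: int = 5) -> Tuple[float, List[float]]:
    """Train on k-1 folds, score the held-out fold, and average the k AUCs."""
    fold_aucs = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit_fn([X[i] for i in train], [y[i] for i in train])
        scores = [predict(X[i]) for i in held_out]
        fold_aucs.append(auc_score([y[i] for i in held_out], scores))
    return mean(fold_aucs), fold_aucs


def fit_midpoint_threshold(train_X, train_y):
    """Toy 'model': score a sample by its distance from the class-mean midpoint."""
    pos = [row[0] for row, label in zip(train_X, train_y) if label == 1]
    neg = [row[0] for row, label in zip(train_X, train_y) if label == 0]
    midpoint = (mean(pos) + mean(neg)) / 2.0
    return lambda row: row[0] - midpoint


# Synthetic, perfectly separable data (hypothetical, not study data).
toy_y = [i % 2 for i in range(20)]
toy_X = [[label + 0.1 * (i % 3)] for i, label in enumerate(toy_y)]
mean_auc, fold_aucs = cross_validated_auc(toy_X, toy_y, fit_midpoint_threshold)
print(mean_auc, fold_aucs)
```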

Comparison to the Intracerebral Hemorrhage (ICH) Score
The ICH score was calculated as described previously [6] based on GCS, ICH volume, IVH, location of the hematoma, and age. Its performance (AUC) was compared with that of the developed ML-based models using a pairwise t-test, which has commonly been used in previous studies to assess performance [25][26][27].
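For reference, the ICH score can be computed as in the sketch below. The point thresholds follow the original ICH score publication cited as [6] (GCS 3–4: 2 points, 5–12: 1 point, 13–15: 0 points; volume ≥30 mL, IVH, infratentorial origin, and age ≥80 years: 1 point each); they are not restated in this paper. How hematomas classified as both supra- and infratentorial were scored in this study is not specified, so the `infratentorial` flag is left to the caller. The function and parameter names are illustrative.

```python
def ich_score(gcs: int, volume_ml: float, ivh: bool,
              infratentorial: bool, age_years: float) -> int:
    """Return the ICH score (0-6) from its five admission components."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")
    if gcs <= 4:
        points = 2          # GCS 3-4
    elif gcs <= 12:
        points = 1          # GCS 5-12
    else:
        points = 0          # GCS 13-15
    points += 1 if volume_ml >= 30 else 0   # ICH volume >= 30 mL
    points += 1 if ivh else 0               # intraventricular hemorrhage
    points += 1 if infratentorial else 0    # infratentorial origin
    points += 1 if age_years >= 80 else 0   # age >= 80 years
    return points


# Hypothetical patient, for illustration only.
print(ich_score(gcs=10, volume_ml=35.0, ivh=True, infratentorial=False, age_years=62))  # 3
```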

Statistical Analysis
All statistical analyses were performed in the Python programming language, version 3.7 (Python Software Foundation). Qualitative data are described as frequency and percentage. Fisher's exact test or the Chi-square test was used to compare categorical variables between subgroups. Quantitative data were first tested for normality with the D'Agostino-Pearson test. Normally distributed data are expressed as the mean ± standard deviation (SD), while non-normally distributed data are expressed as the median and interquartile range (IQR). Student's t-test was used to compare normally distributed variables, while the Wilcoxon test was used to compare non-normally distributed variables. The performance (AUC) of the different models was compared using the pairwise t-test. For all statistical hypothesis tests, p values < 0.05 were considered significant.
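One plausible reading of the pairwise t-test on AUCs is a paired t-test over per-fold AUC values of two models evaluated on the same folds; the paper does not spell out the exact procedure, so the sketch below is an assumption. The function name is hypothetical and the AUC values are invented for illustration; they are not results from this study.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and degrees of freedom for matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)                      # sample SD of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Invented per-fold AUCs for a model and a baseline score (illustration only).
model_aucs = [0.90, 0.88, 0.89, 0.91, 0.87]
baseline_aucs = [0.85, 0.84, 0.86, 0.85, 0.83]
t_stat, dof = paired_t_statistic(model_aucs, baseline_aucs)

# Two-sided critical value of Student's t for alpha = 0.05 with 4 degrees of freedom.
T_CRIT_DF4 = 2.776
print(f"t = {t_stat:.2f}, df = {dof}, significant = {abs(t_stat) > T_CRIT_DF4}")
```

With only five folds the test has four degrees of freedom, and fold-wise estimates share training data, so a result of this kind is best treated as indicative.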

Patient Characteristics
As shown in Figure 1, a total of 829 patients admitted to our hospital with a diagnosis of SICH during the 2-year period (from 1 January 2018 to 31 December 2019) were retrospectively reviewed. Seventy-eight patients were excluded because their families refused any further therapy after the diagnosis. The remaining 751 patients were analyzed further. The overall 90-day mortality was 10.2% (n = 76), while 553 patients (73.6%) had a favorable functional outcome at the 90-day follow-up. The cohort characteristics are presented in Table 1.

Predictive Performance of the ML-Based Models
The complete algorithms for all the models, with the optimal parameters, are shown in the Supplementary Materials.
Among all the ML-based models, LR and LRCV showed the best predictive performance for the functional outcome at the 3rd month (AUC = 0.890 and 0.887, respectively; Table 2 and Figure 2), followed by CatBoost, XGBoost, RF, and SVM (AUC = 0.871, 0.864, 0.862, and 0.849, respectively). In both the LR and LRCV models, location of the hematoma, coagulation disorders, absolute monocyte count (AMC), GCS, and intraventricular hemorrhage contributed materially to the models (Table 3).
The predictive performance for 90-day mortality was assessed by a similar method. As shown in Table 2 and Figure 3, CatBoost and LRCV provided the best predictive performance for mortality (AUC = 0.841 and 0.844, respectively). The AUCs of the other four models were as follows: LR, 0.837; XGBoost, 0.820; RF, 0.818; SVM, 0.777. As shown in Table 3, GCS, age, D-dimer, and HR contributed largely to CatBoost, while AMC, location of the hematoma, and history of diabetes mellitus contributed significantly to LRCV.

Discussion
Prognosis prediction in SICH has long depended on the ICH score. Recent studies revealed a promising role for some laboratory results (such as levels of monocytes and lymphocytes) in SICH outcome prediction. However, the ICH score, a traditional and widely used prognostic method, consists of the GCS, ICH volume, age, location, and intraventricular extension of the hematoma [6], and does not involve any laboratory results. In this study, we built several ML-based models involving multiple variables in order to predict the 90-day functional outcome and mortality more accurately.
In this study, we developed six ML-based models for predicting the outcome of SICH. We analyzed the clinical characteristics, radiographic results, laboratory results, and previous medical history of 751 consecutive SICH patients by reviewing their medical records. The results showed that LR and LRCV were the most accurate models for predicting the functional outcome, with AUCs of 0.890 and 0.887, respectively, both of which were significantly better than that of the ICH score. In addition, CatBoost and LRCV showed the best performance in predicting 90-day mortality (AUC = 0.841 and 0.844, respectively), and they were also significantly more accurate than the ICH score.
Both the univariate analysis and the feature importance analysis of the ML-based models indicated that the absolute monocyte count contributed significantly to the prediction of both 90-day mortality and functional outcome. Higher levels of monocytes indicated a poor outcome of SICH. The recruitment of monocytes is a key feature of inflammation [28]. In 2016, Morotti et al. [10] showed that a higher monocyte level on admission was directly associated with a higher risk of hematoma expansion, which might suggest a more unfavorable outcome. Indeed, many previous studies concluded that an elevated monocyte level was an independent risk factor for 30-day mortality in SICH patients, suggesting that the monocyte level on admission might help predict the outcome of SICH [11][12][13][14], which is consistent with our study. With ML methods, the monocyte level was shown to have significant predictive benefit for the 90-day outcome of SICH, which also suggests that ML algorithms can yield additional knowledge.
In clinical practice, the most widely used risk stratification scale for ICH, the ICH score, consists of GCS, ICH volume, age, location, and intraventricular extension of the hematoma [6]. The ICH score predicts 30-day mortality after ICH. As our results showed, some ML-based models performed significantly better than the ICH score in predicting both 90-day functional outcome and 90-day mortality. Overall, our results demonstrate how a data mining approach can be used as an alternative to the conventional approach, achieving performance at least comparable to that of well-accepted prognostic models.
In this study, RFE was used to select the optimal combination of features by recursively considering smaller and smaller sets of features according to their importance, which allowed a large number of candidate combinations to be examined. Although this method is not very efficient, we considered it an effective way to improve the performance of the model. Besides RFE, minimum redundancy maximum relevance (mRMR) and the Boruta algorithm are also efficient and widely used methods for feature selection. According to Peng et al., mRMR can use mutual information, correlation, or distance/similarity scores to select features. However, this algorithm may underestimate the importance of seemingly insignificant variables that perform poorly on their own but may become significant when combined in ML-based models. In addition, mRMR is mostly used when variables are categorical, whereas our dataset contains many quantitative variables. Similar to mRMR, the Boruta algorithm optimizes the combination of variables by reducing the relevancy between the selected variables and increasing the relevancy between the variables and outcomes. Although these methods are more efficient in feature selection, RFE may provide a better-performing model because it evaluates candidate feature sets directly.
This study has several clinical and methodological implications. First, factors that were previously neglected could be discovered, and with these factors the predictive performance could be improved using machine learning approaches. Second, the best models in the present study contained only a small number of variables. Thus, these models can be used easily in clinical practice to provide an accessible prediction of the outcome in SICH patients, which helps both doctors and patients' families choose the optimal management. Based on our study, online tools were developed [http://114.251.235.51:1226/ich_recover_predict (accessed on 2 January 2022) for 90-day functional outcome; http://114.251.235.51:1226/ich_death_predict (accessed on 2 January 2022) for 90-day mortality]. Furthermore, our results illustrated that the predictive performance of the ML-based models remained high even when many variables were input. Because electronic medical records are now widely used, much larger datasets will need to be handled in the future. ML algorithms are much better suited than traditional statistical methods to dealing with an increasing number of variables.
However, our study had several limitations. First, some patients in critical condition were not included in the study because of early withdrawal of care. Second, the sample size of our retrospective study may limit the improvement of model performance. Third, the primary aim of this study was to predict the 90-day outcome of ICH patients based on the initial information on admission to hospital; thus, serial changes of variables after admission were not considered. Moreover, external validation is lacking in the present study, which may restrict the generalizability of our results. Future studies with larger samples may help provide higher predictive power.

Conclusions
In conclusion, the prediction of functional outcome and mortality after SICH remains a challenge. Our findings suggest that ML-based models have high potential. The CatBoost and LRCV models showed good predictive performance for 90-day mortality, while the LRCV and LR models showed reliable predictive performance for 90-day functional outcome; all of these performed better than the ICH score, the traditional and widely used risk stratification scale. These models might provide additional assistance in predicting functional outcome or mortality for SICH patients.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to data-safety restrictions.