Predicting the Mechanical Properties of RCA-Based Concrete Using Supervised Machine Learning Algorithms

Environment-friendly concrete is gaining popularity because it consumes less energy and causes less damage to the environment. Rapid growth in population and in the demand for construction worldwide is significantly depleting natural resources. Meanwhile, construction waste continues to grow at a high rate as older buildings are demolished. The use of recycled materials may therefore help to improve quality of life and prevent environmental damage, and the application of recycled coarse aggregate (RCA) in concrete is essential for minimizing environmental issues. In this article, the compressive strength (CS) and splitting tensile strength (STS) of concrete containing RCA are predicted using decision tree (DT) and AdaBoost machine learning (ML) techniques. A total of 344 data points with nine input variables (water, cement, fine aggregate, natural coarse aggregate, RCA, superplasticizers, water absorption of RCA, maximum size of RCA, and density of RCA) were used to run the models. The models were validated using k-fold cross-validation, and their performance was assessed with statistical checks: the coefficient of determination (R²), mean square error (MSE), mean absolute error (MAE), and root mean square error (RMSE). Additionally, sensitivity analysis was used to determine the impact of each variable on the forecasting of mechanical properties.


Introduction
Recently, the use of recycled aggregate (RA) in concrete has been gaining favour in research, as it yields concrete that is not only environmentally friendly but also shows satisfactory mechanical performance [1,2]. In previous decades, the production and utilization of sustainable concrete have increased significantly due to the high demand from the construction industry [3,4]. Concrete production is now approximately 1 t per person per year [5]. However, while this considerable amount of concrete production fulfills the requirements of the construction industry, it negatively impacts the environment [6][7][8][9]. The production of concrete and aggregates leads to the emission of carbon dioxide (CO2), dust, and other harmful gases, which ultimately results in environmental pollution [10][11][12]. The amount of waste concrete is also increasing because of natural disasters such as earthquakes around the world, leading to serious environmental problems [13][14][15][16]. RCA concrete is considered one of the potential solutions, as it reduces the consumption of natural resources and makes use of waste concrete arising from natural disasters and from demolition [17,18]. Although the utilization of RCA in concrete is limited by its low strength, low modulus of elasticity, and high deformation, the desired strength can be achieved by adopting a suitable mix design [19].
The properties of RCA concrete can be significantly enhanced by adding other suitable materials to the mix. Recently, modern ML approaches for anticipating results in the field of civil engineering have been gaining popularity worldwide. Concrete normally requires 28 days to achieve its desired strength, so measuring that strength experimentally is slow. ML approaches may be applied to forecast the different properties of concrete while saving time and money. Several types of ML approaches are commonly applied to forecast the required output, such as DT, artificial neural networks (ANN), and gene expression programming (GEP). De-Cheng et al. [20] applied an adaptive boosting approach to anticipate the CS of concrete, in which 1030 data points were utilized to run the model, and reported 98% accuracy compared with the actual result. Dong et al. [21] used an ANN model for high-performance concrete, and they also used Monte Carlo simulation to forecast the behavior of high-strength concrete. Muhammad et al. [22] employed GEP to predict the strength of concrete containing bagasse ash; the prediction accuracy was reported to be more than 80%. Aliakbar et al. [23] proposed a new formulation for the mechanical properties of RA-based concrete with the help of GEP and found that the predictions were close to the actual results. They investigated the CS, flexural strength, and STS from the retrieved data. Taihao et al. [24] applied ensemble ML techniques to forecast and optimize the Young's modulus of RA concrete; random forest (RF) and support vector machine (SVM) models were employed on the data and predicted the outcome accurately.
This research focuses on the prediction of two properties (STS and CS) of concrete containing RCA via supervised ML algorithms [25]. The performance of both models was analyzed and compared to identify the better performer. The agreement between the actual and predicted output was measured by the coefficient of determination (R²), where a higher value indicates better performance of the employed model. The AdaBoost technique was employed for optimization by producing 20 sub-models to obtain a higher R² value [26]. These ML algorithms are applied in order to compare the predictive performance of each approach. The significance of this study lies in determining the effect of the input factors used to anticipate the mechanical characteristics of concrete and the predictive accuracy of both methodologies. The novelty of the research is that it uses both an individual (DT) and an ensemble (AdaBoost) ML algorithm to forecast two outcomes (CS and STS) of recycled coarse aggregate (RCA) concrete. Statistical checks were applied to assess both techniques. In addition, a sensitivity analysis was incorporated, which indicates the contribution of each input parameter to the prediction of both STS and CS.

Methodology and Description of Data
A model's performance depends on the input variables and the number of data points used to run the model. The parameters used in this study to predict the CS and STS of RCA-based concrete were taken from the published literature and are available in Appendix A [27]. The Anaconda Navigator software was used in this research, with Python code written to run the models and forecast the results. An Excel file with the relevant input and output data was loaded into the software, which ran the models on the data available in the file. The model outputs were then exported for graphical representation. The models comprised nine input parameters (cement, water, fine aggregate, natural CA, RCA, superplasticizers, maximum size of RCA, density of RCA, water absorption of RCA) and two output parameters (CS and STS). The relative frequency distribution of the nine variables can be seen in Figure 1. The relevant references regarding the application of various ML approaches are listed in Table 1. The descriptive statistics of the input parameters are given in Table 2, which lists their statistical measures and ranges. In addition, the methodology of the research is presented as a flowchart in Figure 2, which shows the stepwise procedure adopted in the study. The first phase covers the collection of the data, the second phase covers the analysis using machine learning algorithms, and the final phase covers the explanation, comparison, and evaluation of the results.

Decision Tree Algorithm
The DT algorithm is a supervised machine learning (ML) technique that belongs to the class of individual supervised machine learning (ISML) algorithms. It is applicable to both classification and regression problems. The approach aims to generate a model that can forecast the target variable, using a tree representation to solve the problem. The process has two steps: learning and prediction. In the learning step, the model is developed from the given data set; in the prediction step, the model is used to predict the response for new data. A decision tree is a well-known and effective technique that is simple to comprehend and apply. Creating sub-nodes increases the homogeneity of the resulting sub-nodes. Several important terms are associated with the decision tree: the root node, which represents the overall population of the data set; splitting, which is the process of dividing a node; decision nodes, which are sub-nodes that split into further sub-nodes; leaf nodes, which are nodes that do not split; and pruning, which is the process of removing sub-nodes.
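The terms above (root node, splitting, decision nodes, leaf nodes) can be illustrated with a minimal regression tree written in plain Python. This is only a sketch, not the implementation used in the study: the split criterion (reduction of the sum of squared errors), the function names, and the toy data are all assumptions made for illustration.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y):
    """Return the (feature index, threshold) that most reduces SSE, or None."""
    best, best_score = None, sse(y)
    for j in range(len(X[0])):
        # Every observed value except the largest is a candidate threshold,
        # so both children are guaranteed to be non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively split until max_depth is reached or no split helps."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return {"leaf": mean(y)}  # leaf node: predict the mean target
    j, t = split
    left = [(row, v) for row, v in zip(X, y) if row[j] <= t]
    right = [(row, v) for row, v in zip(X, y) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [v for _, v in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [v for _, v in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

# Hypothetical toy data: [cement content, water-cement ratio] -> CS (MPa)
X = [[300, 0.5], [320, 0.5], [400, 0.4], [420, 0.4]]
y = [30, 32, 45, 47]
tree = build_tree(X, y)
print(predict(tree, [310, 0.5]))
```

Limiting `max_depth` plays a role similar to pruning: a shallower tree has fewer sub-nodes and each leaf averages over more samples.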

AdaBoost Algorithm
The AdaBoost regressor is a supervised ML technique that uses an ensemble approach. It is also known as adaptive boosting because the weights are re-assigned to each instance after every round, with greater weights going to instances that were poorly predicted. Boosting techniques are commonly used in supervised learning to reduce bias and variance, and these ensemble algorithms are used to improve the performance of a weak learner. During the training phase, AdaBoost fits a sequence of decision trees to the input data. The records that are poorly predicted by the current model are given a higher priority when the next decision tree/model is developed, so each subsequent model concentrates on these entries. This procedure is repeated until the desired number of base learners has been reached. On binary classification problems, AdaBoost is particularly effective at improving decision tree performance, and it can also be used to boost the efficiency of other machine learning methods. It is most beneficial when used with a weak learner. Ensemble methods of this kind are commonly used in civil engineering, especially for predicting the mechanical properties of different types of concrete.
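The re-weighting step described above can be sketched for a single boosting round. The text does not state which AdaBoost variant was used, so this sketch assumes the AdaBoost.R2 scheme with a linear loss, which is a common choice for regression; the function name and the toy values are hypothetical.

```python
def adaboost_r2_reweight(y_true, y_pred, weights):
    """One AdaBoost.R2 round: return (new normalized weights, beta).

    Assumes `weights` already sum to 1. A smaller beta means the
    current learner is more reliable.
    """
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    max_err = max(abs_err)
    if max_err == 0:
        return list(weights), 0.0  # perfect fit: nothing to re-weight
    # Linear loss scaled to [0, 1] for each sample
    loss = [e / max_err for e in abs_err]
    avg_loss = sum(w * l for w, l in zip(weights, loss))
    if avg_loss >= 0.5:
        raise ValueError("learner too weak: boosting would stop here")
    beta = avg_loss / (1.0 - avg_loss)
    # Well-predicted samples (small loss) are scaled down the most,
    # so poorly predicted samples gain relative weight.
    new = [w * beta ** (1.0 - l) for w, l in zip(weights, loss)]
    total = sum(new)
    return [w / total for w in new], beta

# Hypothetical strengths in MPa; the last sample is predicted worst
y_true = [30.0, 40.0, 50.0, 60.0]
y_pred = [30.0, 41.0, 50.0, 64.0]
new_w, beta = adaboost_r2_reweight(y_true, y_pred, [0.25] * 4)
print(new_w, beta)
```

In this toy round the sample with the largest error ends up with the largest weight, so the next tree is fitted with more emphasis on it.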

Statistical Analysis
The relationship between the actual and predicted outcomes (CS and STS) obtained from the individual and ensemble ML algorithms, along with the distribution of errors, is described in the following subsections.

Compressive Strength Result Using Decision Tree
The relationship between the actual and predicted compressive strength for the decision tree algorithm can be seen in Figure 3a, and the distribution of the errors is shown in Figure 3b. The error distribution for DT gives maximum, minimum, and average values of 8.82 MPa, 0.58 MPa, and 3.58 MPa, respectively. Of the error data, 11.59% lie between 0 and 1 MPa, 50.72% lie between 2 MPa and 6 MPa, and only 8.69% lie above 7 MPa.
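A summary of this kind (maximum, minimum, and average absolute error, plus the share of errors falling in each range) can be computed as sketched below. The values are hypothetical and are not the study's data, and treating each range as half-open (lower bound included, upper bound excluded) is an assumption, since the text does not state the boundary convention.

```python
def error_summary(actual, predicted, bins):
    """Summarize absolute errors and the percentage falling in each bin."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    n = len(errors)
    # Assumed convention: a bin (lo, hi) contains errors with lo <= e < hi
    shares = {
        (lo, hi): 100.0 * sum(1 for e in errors if lo <= e < hi) / n
        for lo, hi in bins
    }
    return {
        "max": max(errors),
        "min": min(errors),
        "mean": sum(errors) / n,
        "share_pct": shares,
    }

# Hypothetical actual and predicted strengths in MPa
actual = [30.0, 35.0, 40.0, 45.0]
predicted = [30.5, 32.0, 44.0, 37.0]
bins = [(0, 1), (2, 6), (7, float("inf"))]
summary = error_summary(actual, predicted, bins)
print(summary)
```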

Splitting Tensile Strength Result Using Decision Tree
The relation between the actual and predicted splitting tensile strength using the DT approach is depicted in Figure 4a, along with its error distribution in Figure 4b. The error distribution indicates maximum, minimum, and average values of 2.47 MPa, 0 MPa, and 0.31 MPa, respectively. Of the error data, 42.02% lie between 0 and 0.1 MPa, 34.78% lie between 0.1 MPa and 0.5 MPa, and only 8.69% were reported as above 1 MPa.

Compressive Strength Result with AdaBoost Regressor
The AdaBoost regressor gives a strong relation between the actual and predicted output, as shown in Figure 5a, while the distribution of the errors can be seen in Figure 5b. The maximum, minimum, and average values of the error data are 13 MPa, 0.06 MPa, and 2.33 MPa, respectively. Additionally, 26.08% of the error data were reported between 0 and 1 MPa, 34.78% lie between 2 MPa and 6 MPa, and 4.34% were reported to be above 7 MPa.
Figure 4. Numerical analyses representing the relationship between the predicted variables and targeted variables (a) along with their error distribution (b) for splitting tensile strength using DT.

Splitting Tensile Strength with AdaBoost Regressor
The statistical result for splitting tensile strength using the AdaBoost regressor also shows a strong relation, with less variance between the experimental results and those obtained from the model, as depicted in Figure 6a. The distribution of the errors obtained from the application of the AdaBoost regressor can be seen in Figure 6b.

K-Fold Cross-Validation and Statistical Checks
K-fold cross-validation is commonly adopted to verify the actual performance of models, and it was used here for both employed models. In this method, the available data set is shuffled randomly and split into ten groups. Of the total data points, 60% were used to train the models, 30% to test them, and 10% for validation. In each cross-validation round, nine of the ten groups are assigned to training the models, while the remaining group is used for validation. This process was repeated ten times to obtain a suitable average value. The k-fold cross-validation process also confirms the accuracy of the models. The statistical checks used to confirm the accuracy of the models' predictions were computed using Equations (1)-(5), where $ex_i$ = experimental value, $mo_i$ = predicted value, $\overline{ex}$ = mean experimental value, $\overline{mo}$ = mean predicted value obtained by the model, and $n$ = number of samples. As seen in Figures 7-10, the coefficient of determination (R²), mean square error (MSE), mean absolute error (MAE), and root mean square error (RMSE) were used to evaluate the k-fold cross-validation of each employed model against its output. Variation was noticed between the outcomes of the two ML algorithms used (DT and AdaBoost). The AdaBoost model shows lower errors and a higher R² value, indicating a higher accuracy than the decision tree. The results of the k-fold cross-validation for CS and STS are listed in Tables 3 and 4, respectively.
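The ten-fold procedure and the four named checks can be sketched in plain Python as follows. Equations (1)-(5) are not reproduced in this text, so the formulas below are the commonly used definitions and are an assumption; in particular, R² is computed here as the squared correlation between experimental and predicted values, chosen because the symbol list includes both means. The function names, the seed, and the toy values are hypothetical.

```python
import math
import random

def mae(ex, mo):
    """Mean absolute error."""
    return sum(abs(e - m) for e, m in zip(ex, mo)) / len(ex)

def mse(ex, mo):
    """Mean square error."""
    return sum((e - m) ** 2 for e, m in zip(ex, mo)) / len(ex)

def rmse(ex, mo):
    """Root mean square error."""
    return math.sqrt(mse(ex, mo))

def r2(ex, mo):
    """Squared correlation between experimental and predicted values (assumed form)."""
    ex_bar = sum(ex) / len(ex)
    mo_bar = sum(mo) / len(mo)
    cov = sum((e - ex_bar) * (m - mo_bar) for e, m in zip(ex, mo))
    var_ex = sum((e - ex_bar) ** 2 for e in ex)
    var_mo = sum((m - mo_bar) ** 2 for m in mo)
    return cov ** 2 / (var_ex * var_mo)

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, validation) index lists; each fold validates exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # random arrangement of the data
    folds = [idx[i::k] for i in range(k)]  # k groups of near-equal size
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

# Hypothetical experimental and predicted strengths in MPa
ex = [30.0, 40.0, 50.0, 60.0]
mo = [32.0, 38.0, 51.0, 61.0]
print(mae(ex, mo), mse(ex, mo), rmse(ex, mo), r2(ex, mo))

# 344 data points split into ten folds, as described in the text
splits = list(k_fold_indices(344, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

In a full run, the model would be fitted on each `train` list and scored on the matching validation list, and the ten scores would then be averaged.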

Additionally, the statistical checks in the form of MAE, MSE, and RMSE were assessed for both CS and STS and can be seen in Tables 5 and 6, respectively. Lower errors correspond to a higher R² value.

Sensitivity Analyses
The input variables have a remarkable effect on the model's outcome. Sensitivity analyses were performed to investigate the effect of each variable on the prediction of both CS and STS. Cement contributed the most (36.8%) towards the prediction of CS, while the other parameters contributed less towards the forecasting of the CS of concrete containing RCA, as shown in Figure 11. The contribution of the parameters for predicting the STS can be seen in Figure 12. The most significant contributors to the prediction of the STS of concrete were cement (41.2%) and natural coarse aggregate (NCA) (19%), while superplasticizers and RCA were the next highest contributors. The equations used to calculate the contribution of each parameter towards the model's outcome are expressed in terms of $f_{\min}(x_i)$ and $f_{\max}(x_i)$, the minimum and maximum of the estimated output with respect to the ith input variable.
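The sensitivity equations themselves are not reproduced in this text. A commonly used form that matches the symbols defined here takes the output range for each input, $N_i = f_{\max}(x_i) - f_{\min}(x_i)$, and normalizes it as $S_i = N_i / \sum_j N_j$; the sketch below assumes that form. Holding the other inputs at their mean while one input is varied over its observed range is a further assumption, and the model and data are hypothetical.

```python
def sensitivity(model, X, steps=20):
    """Percentage contribution of each input, assuming S_i = N_i / sum(N_j)."""
    n_features = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
    ranges = []
    for j in range(n_features):
        lo = min(row[j] for row in X)
        hi = max(row[j] for row in X)
        outputs = []
        for s in range(steps + 1):
            point = list(means)                    # other inputs held at their mean
            point[j] = lo + (hi - lo) * s / steps  # sweep input j over its range
            outputs.append(model(point))
        ranges.append(max(outputs) - min(outputs))  # N_i = f_max - f_min
    total = sum(ranges)
    if total == 0:
        return [0.0] * n_features  # constant model: no input contributes
    return [100.0 * r / total for r in ranges]

def toy_model(point):
    """Hypothetical stand-in for a trained predictor."""
    return 3.0 * point[0] + 1.0 * point[1]

X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
shares = sensitivity(toy_model, X)
print(shares)
```

For this toy model the first input drives three times as much output variation as the second, so the two contributions come out at roughly 75% and 25%.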

Discussion
As demonstrated by the data, the ML-based strategy for forecasting the mechanical characteristics of concrete offers clear advantages over traditional mechanics-based methods: (1) ML does not require complex mechanics or theoretical equations but instead finds the mapping between the input and output using numerical and/or computer science knowledge, making it very accessible to readers; (2) unlike most empirical models, which typically consider a limited number of variables when deriving the formula, ML can consider a large number of variables; (3) inherent uncertainties [55] can be incorporated into the training process; and (4) the precision, reliability, and robustness of machine learning-based models are significantly higher than those of traditional models, and they can provide objective and accurate results in a matter of seconds.
The aim of this study was to predict the mechanical properties (CS and STS) of concrete containing recycled coarse aggregate (RCA) via supervised machine learning algorithms. The Anaconda Navigator software was used to run the Python code for each employed machine learning algorithm. An Excel file containing the relevant database was supplied to the software, which reported the results in the form of R², MAE, MSE, and RMSE. The AdaBoost technique performs well, as shown by the R² value of 0.95 for CS prediction and 0.92 for STS prediction. Similarly, Feng et al. [56] used AdaBoost to classify failure modes, yielding an accuracy of 0.96, and to determine the bearing capacity of reinforced concrete, yielding an R² value of 0.98. The R² value for DT was 0.93 in predicting the CS and 0.90 in forecasting the STS. In comparison, Ahmad et al. [57] also employed DT to predict the CS of geopolymer concrete and obtained a similar R² value of 0.90. The higher R² values for AdaBoost (0.95 for CS and 0.92 for STS) compared with DT (0.93 for CS and 0.90 for STS) indicate its better predictive performance. The lower error values (MAE, MSE, RMSE) for AdaBoost also confirm its better accuracy relative to DT. In addition, the sensitivity analysis describes the contribution of each parameter used to run the model for predicting the mechanical properties of concrete containing recycled coarse aggregate. Cement and natural coarse aggregate (NCA) contributed significantly, up to 41.2% and 19%, respectively, while superplasticizers and RCA were the next highest contributors to the prediction of outcomes.
Overall, the accuracy of the ensemble machine learning approach (AdaBoost) was higher than that of the individual machine learning technique (DT).

Conclusions and Future Recommendations
This research describes the application of both individual and ensemble ML algorithms to forecast mechanical properties, namely the compressive strength (CS) and splitting tensile strength (STS), of concrete containing recycled coarse aggregate (RCA). The decision tree (DT) and AdaBoost approaches were used for prediction. The input variables were analyzed through their relative frequency distributions. Python code was run in Spyder (Anaconda software) to execute the required models. Statistical checks in the form of various errors (MAE, RMSE, MSE) were evaluated to confirm the accuracy of each employed model, and the k-fold cross-validation method was also included to confirm the models' accuracy. In addition, the contribution of each input variable was investigated via sensitivity analysis. The following conclusions and future recommendations can be drawn from the study.

• The ensemble machine learning algorithm (AdaBoost) shows a better response with less variance in predicting both the CS and the STS of RCA-based concrete.
• The AdaBoost regressor gives R² values of 0.95 and 0.92 for the CS and STS of concrete, respectively, as opposed to R² values of 0.93 (CS) and 0.90 (STS) for DT.
• The higher R² values of the AdaBoost regressor for both CS and STS indicate the high accuracy of the model.
• From the statistical checks, the lower error values (MAE, MSE, RMSE) also indicate better performance for the AdaBoost approach compared to the DT algorithm.
• The k-fold cross-validation method also confirms the high accuracy of the AdaBoost algorithm.
• Sensitivity analysis reveals that cement contributed effectively (32%) compared to the other parameters towards the forecasting of the CS of RCA-based concrete, while the superplasticizers were the higher contributors towards the prediction of the STS of concrete containing RCA.
In conclusion, this study applied supervised machine learning (ML) algorithms to predict two parameters (CS and STS) of concrete containing recycled coarse aggregate (RCA). It also highlights the importance of several aspects for obtaining accurate outcomes, such as the input variables, the number of data points used to run the models, and the type of ML approach. The algorithms employed in this study show a strong relationship between the actual and predicted output. The importance of these approaches in civil engineering is indicated by the close agreement between the actual and forecasted results. Supervised ML approaches are gaining popularity because they give accurate results while reducing the amount of physical laboratory work and the total cost of a project. Nevertheless, it is essential to incorporate laboratory work and compare it with the findings of machine learning approaches to better understand their effectiveness. In future work, the number of data points, type of material used, size of specimens, environmental conditions, curing conditions, loading rate, and number of input parameters can be modified or extended to study and compare the results of various machine learning algorithms. Moreover, other ML techniques such as artificial neural networks (ANN), support vector machines (SVM), and boosting can be included to evaluate their performance.