Application of Principal Component Analysis Approach to Predict Shear Strength of Reinforced Concrete Beams with Stirrups

The reinforced concrete (RC) member’s shear strength estimation has been experimentally studied in most cases due to its nonlinear behavior. Many empirical equations have been derived from the experimental data; however, even those adopted in the construction codes do not thoroughly and accurately describe their shear behavior. Theoretically explained equations, on the other hand, are aligned with the experiment; however, they are complicated to use in practice. As shear behavior research is data-driven, the machine learning technique is applicable. Herein, an artificial neural network (ANN) algorithm is trained with 776 experiment results collected from available publications. The raw data is preprocessed by principal component analysis (PCA) before the application of the ANN technique. The predictions of the trained algorithm using ANN with PCA are compared to those of formulae adopted in a few existing building codes. Finally, a parametric study is conducted, and the significance of each variable to the strength of RC members is analyzed.


Introduction
Unlike the bending and compressive behaviors of reinforced concrete (RC), their shear behavior has yet to be theoretically well described. Typical equations that explain an RC member's bending and compression behaviors begin from decent assumptions and arrive at their final expressions through mathematical derivation. Meanwhile, the equations that describe the shear behavior still predominantly depend on statistical analysis and regression based on laboratory experiments.
For such empirical equations to better predict the behavior of the RC elements, collecting more data and considering a more comprehensive range of input parameters is preferable and can provide more generality. Such prerequisites make research on the shear behaviors of RC members an excellent subject in which to apply machine learning (ML) techniques, as a more extensive data set is typically more beneficial for training a learning algorithm.
The empirical approach has a potential drawback. In contrast to the analytically derived equations, where the dependency on specific variables is apparent, researchers do not know whether a particular variable is a control parameter or not. For this reason, it is considered safer to keep more variables as potential control parameters when experimenting; however, such practice inevitably increases the size of the dataset. The bigger size of the dataset, as a consequence, slows the training speed of a learning algorithm and sometimes even harms the prediction accuracy. This compensation is known as the curse of dimensionality. To overcome such a curse, raw data is preprocessed before being fed into the learning algorithm. Among many candidates of preprocessing techniques, this article specifically focuses on the method called principal component analysis (PCA). It seeks the applicability of PCA to the raw dataset as a means of preprocessing.
PCA can reduce the size of the dataset. PCA can reduce the number of input variables (or, in ML terminology, the features) through statistical analysis of the relationships between the features in the dataset. Although there are many available features in the dataset, not all of them equally affect the behavior of the RC member (or, in ML terminology, the label). When PCA preprocesses a dataset, it returns a set of mutually independent linear combinations of the existing features in the dataset. These processed features are called the principal components.
There can be a couple of strategies once we obtain the principal components. One can select a few components that describe the correlation in the dataset better than other components. On the other hand, one can choose all principal components without benefiting from dimension reduction but only taking advantage of orthogonalization. If a few principal components are selected, the training speed is expected to be faster since the dimension is reduced. Suppose all of the principal components are selected. Although we may not notice any speedup in training, the quality of the result is expected to be better due to the orthogonalization of the input dataset.
Since the early 2000s, researchers in the structural engineering field have begun to adopt ML techniques into their research, particularly to predict the shear behaviors of RC members better. The main learning algorithm considered has been a feed-forward backpropagation artificial neural network (FFBP ANN), where the input variables are selected according to the characteristics of the experiments. Sanad and Saka [1] trained a conventional multilayer FFBP ANN to predict the ultimate shear strength of reinforced concrete deep beams using 111 points of experimental data with ten variables. Their findings showed that ANN returned a reliable prediction even when no single equation existed that accurately describes the ultimate shear strength of RC deep beam. Adhikary and Mutsuyoshi [2] conducted a study to predict the shear strength of a concrete beam that was reinforced by externally bonded steel plates using traditional ANN. In their research, 469 and 85 finite element analysis results were used in total for the training set and test set, respectively. Sucharda [3] conducted a detailed analysis of a reinforced-concrete beam without shear reinforcement, which was based on a complex set of laboratory tests and non-linear analyses with a sensitivity study. Cladera and Marí [4,5] published two separate studies on the shear design procedure of an RC beam: RC beams reinforced with and without the vertical reinforcements, as well as with the traditional FFBP ANN. In total, 123 experimental data points were used in their research on the RC beam with stirrup, with the data set divided into 104 training data and 19 validation data. Mansour et al. [6] suggested an algorithm which predicts the shear strengths of RC beams with 176 experimental data where there are nine input variables. They verified that the prediction by ANN resulted in less error than the empirical equations suggested in the contemporary design codes. Oreta [7] examined the size effect of RC beams without stirrups to their shear strength using a multilayer FFBP ANN. In total, 118 experimental data with five input variables were used, and only five nodes were presented in each of the hidden layers. The trained model could successfully simulate the effects of input variables on the shear capacity of RC beams without stirrups. Lehmann [8] and Wang [9] conducted studies on the shear capacity of reinforced concrete members using fibers. Adhikary and Mutsuyoshi [10], again, trained two ANN models to predict the shear strength of steel fiber RC beams. The numbers of experimental specimens used for the training set and validation set were 70 and 15, respectively. There were four to five nodes in each of the hidden layers, and the depths of the layers were three in both cases. Abdalla, Elsanosi, and Abdelwahab [11] also used the multilayer FFBP ANN technique to predict the shear strength of RC beams. The data set comprised 164 experimental observations, where each observation consisted of six input variables. Yang, Ashour, and Song [12] used 631 experimental data to train a multilayer FFBP ANN with the early stopping technique to predict the shear capacity of deep beams, and 549 observations to do the same for slender beams. The prediction by the trained model was compared to the ACI 318-05 and EC2 specifications, and a case study was also conducted regarding the input variables. Perera et al. [13] used 98 experimental data to train an ANN model for the prediction of the FRP-strengthened RC beams' ultimate shear strength. 11 input variables were considered, including the variables that represent the effect of FRP, and the ANN model consisted of a single hidden layer with 11 nodes. They also verified the input variable with the greatest effect on the shear strength of FRP-reinforced RC beams. A study by Tanarslan, Secer, and Kumanlioglu [14] used the experimental data of 84 FRP-strengthened RC beams to estimate their shear strength. Nine input variables were chosen to incorporate the effects of concrete, steel reinforcement, and FRP. There was only one hidden layer consisting of four nodes. The trained model was compared to the current design code.
Although there have been many previous studies, the number of experimental data used for training and validation has rarely exceeded 500, and none of them has preprocessed the raw data using PCA to boost the training efficiency and prediction accuracy. The research introduced in this article uses a data set consisting of 776 shear experimental results and secured data from the precedent publications [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30]. Although it is not determined how many data points should be utilized to develop an ANN model, the regression results showed a significantly low error percentage, which implies that a sufficient number of data points was used in this study. Eight additional validation sets then validated the trained algorithm, and a parametric study was conducted afterward.
The study on the shear behavior of RC members is data-dependent, and the effectiveness and correlations between the control parameters of the experiment are not foreseeable. Based on such character, this article proposes an ANN approach with 776 PCA preprocessed experiment data [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30] to predict the shear behavior of the RC beams with transverse reinforcements. The trained algorithm was then validated using eight additional points of experiment data. Upon completion of the validation, the prediction given by the trained model was compared to the existing building codes: ACI 318-19 [31] and EC2-04 [32], as well as the model proposed by Lee and Kim [33], to determine whether the suggested approach in this article is worthwhile.

ACI 318-19
The design equation for the shear strength of reinforced concrete beams specified in ACI 318-19 [31] is as follows: In this equation, V u is the factored shear force at a section, V n is the nominal shear strength, and φ is the strength reduction factor. V n in Equation (1) can be decomposed into the contributions of concrete and shear reinforcements as follows: For non-prestressed members the concrete shear contribution, V c , shall be taken either of the following. If otherwise where b w is width of the web, d is effective depth of the section, ρ w is ratio of longitudinal tensile reinforcement, f c is compressive strength of concrete, N u is axial load, A g is gross cross-sectional area of the member, λ is modification factor for lightweight concrete, λ s is size effect factor. The size effect factor, λ s , is introduced by assuming that the shear strength of members with shear reinforcement less than the minimum required by code, does not increase in proportion to the member depth. This factor can be calculated using the following equation, The shear reinforcement contribution is based on the 45 • truss model, using the following equation, where A v represents the area of shear reinforcement, f yt specifies the yield strength of shear reinforcement, and α is the smaller angle between shear reinforcements and the longitudinal reinforcement. V s in Equation (6) is derived from equilibrium in the 45 degrees truss model. The 45 degrees truss model assumes: The 45 degrees truss model provides a simplified model that helps to easily calculate the shear strength of the beam; however, the model tends to underestimate the shear strength since it fixes the crack angle at 45 degrees.

EC2-04
EC2-04 [32] provides a shear strength equation based on the variable angle truss model, and it requires the design shear strength (V d ) at the ultimate limit state to be greater than the required shear strength (V u ). Here, V d can be one of V Rd,c , V Rd,s and V Rd,max , where each are defined as follows.
V Rd,c is the design shear strength of the RC beam without vertical reinforcement, as computed using the following formula: γ c in this equation is the strength reduction factor of concrete materials, k is the size effect coefficient, b w and d are the width and effective depth of the beam, where the unit is in millimeters. ρ l is the longitudinal reinforcement ratio and ρ cp is the compressive stress of the cross-section.
When V u is greater than V Rd,c , a sufficient amount of transverse reinforcement must be installed, and the corresponding shear capacity of the beam is denoted as V Rd,s and should be computed as follows.
Equation (8) must not exceed V Rd,max , which is calculated as ν is the effective coefficient of concrete compressive strength and z is 0.9d. EC2-04 [32] neglects the contribution of concrete to the shear strength when shear reinforcement exists.

Lee and Kim
The classical shear design equations have been improved by compatibility-aided truss models, such as the modified compression field theory (MCFT), rotating angle softened truss model (RA-STM), and fixed angle softened truss model (FA-STM). These sophisticated theories can predict the shear strengths of the beams more accurately; however, the calculations become more complex as they come to involve additional variables and conditional equations. Nevertheless, calculating the yield shear strength, which is slightly smaller than the ultimate shear strength, is relatively simple. Based on this fact, Lee and Kim [33] suggested a simplified but more accurate equation combining the yield shear strength and strain compatibility condition.
The equation considers the ratio of longitudinal reinforcement, shear span, and the moment at the critical section, and is expressed as follows: where ρ t is the ratio of transverse reinforcement, f yt is the yield stress of transverse reinforcement, λ is the modification factor reflecting the reduced mechanical properties of lightweight concrete (0.75 for the light weight, 1.0 for the normal weight), λ s is the size factor ( 200/d 3/2 ), f 1 is the principal tensile stress of concrete, ρ is the longitudinal tensile reinforcement ratio, k is the modification factor reflecting the effective principal compressive strain of concrete β s 2 1 + ρ t f yt /β s f c 0.9 , a v is the shear span, d is the effective depth of the cross-section, b w is the web width, and jd is the flexural lever arm.

Principal Component Analysis
Instead of directly using the experimental data to train the machine learning algorithm, it is preprocessed beforehand so as to yield faster training and better accuracy. The method used is called principal component analysis (PCA).
Let a set of experimental data X b be a two-dimensional array consisting of m columns and n rows. Typically, not all of the columns are mutually independent, as they are instead correlated to one another to some degree. Such correlation within the data set is known to have a drawback in terms of generalization when it is used to train the machine learning algorithms; in other words, the trained algorithm only predicts well within the data set used for training, and makes poor predictions with new data [34,35]. In addition, highly correlated data lead to the data redundancy problem [12], since they are similar to each other, not all of them are essential, therefore they are not all necessary to be presented to the learning algorithm in the training step. In order to see the dependencies between the columns and ultimately de-correlate the data, PCA can be used.
PCA is an orthogonal projection of m-dimensional observation on to l-dimensional subspace in such a way that the variance of the projection is maximized [36]. In order to achieve the dimension reduction, one should choose l < m; however, if m is already small, one may only seek to de-correlate the features in X b with l = m.
The typical PCA procedure on a data set X b ∈ R n×m is as follows. First, each column in X b , i.e., x b j ∈ R n , j = 1, · · · , m must be normalized so that the data have a mean of 0 and a standard deviation of 1.
Here, x a j ∈ R n is the normalized column vectors; mean x b j is the mean of x b j , which is a scalar; 1 ∈ R n is the vector where all entries are 1; and σ x b j is the standard deviation of x b j , which is also a scalar. The new matrix X a = x a 1 · · · x a m possesses identical information to X b but it is normalized column-wise.
Next, the correlation R a is obtained by As R a is a square matrix, eigenvalue decomposition is possible, which brings about m eigenvalues and corresponding eigenvectors: Here, V a is a matrix that has eigenvectors of R a as its column vectors, and Λ a is a diagonal matrix having eigenvalues for its diagonal entries. The algebraic and geometric multiplicities are not strictly considered. The eigenvalues are the variance of the corresponding eigenvector, and multiplying V a by X a gives the principal components of X b .
The size of X and X b are the same; however, all the columns are principal components, not the input variables, and the significance of each principal component is represented by the corresponding eigenvalue.
One may use all the principal components for the training. Such an approach would not reduce the dimension of the data set; however, each column in the preprocessed data set X is now de-correlated so that the learning algorithm can return a better prediction with a shorter training time. On the other hand, one could choose several significant principal components to approximately represent the raw data X b and reduce the dimension of the input data set without losing too much information. Since the data set X b only has eight input variables, all eight principal components were used to train the learning algorithm.
The first three variables, b w , d, and a/d, represent the section property of the RC beam, while the concrete material property is represented by the concrete compressive strength ( f c ), and the reinforcement properties are stored in the last four variables, namely f y , ρ l , f yt , and ρ t . The spacing and cross-sectional area of reinforcement are implicitly reflected in the two ρ values.
Performing PCA on this data set yielded the following Scree plot for each principal component, shown in Figure 1. Since there are only eight principal components, all eight are selected in this research to benefit only the de-correlation by PCA and not reduce the dimension. The coefficients of the principal components, i.e., V a in Equation (16), are presented in Table 1.

ANN Model
The architecture of FFBP ANN used in this research in shown in Figure 2. The preprocessing by PCA is also drawn as a layer of the architecture. The numbers of nodes in the raw data layer and the principal components layer are eight in both cases, since we are using all principal components after preprocessing. There is a single hidden layer, and the number of nodes in the hidden layer is set to 43, which turned out to predict the shear strength with minimal root mean square error (RMSE).
In order to assess the prediction by ANN, three additional measures were calculated: relative absolute error (RAE), root relative square error (RRSE), and the correlation coefficient of the prediction to the experimental measurements.
where n is the amount of data in the data set,ŷ i is the prediction, y i is the value in the data set, and y i is the mean ofŷ i . To construct and compare the errors, the model was validated using a 10-fold crossvalidation technique. The training data was arbitrarily divided into ten groups (one validation set and nine remaining training sets). The final ANN model utilized all the training data. Eight RC test points were excluded from the training set to spare for performance evaluation of the finalized ANN model. The accuracy of the predictions of the trained algorithm along with those of the shear strength equations suggested in the three different design codes introduced in Section 2 are measured using the above-mentioned errors; Equations (17)- (19); and the correlation coefficient, and these are summarized in Table 2. As can be seen in Table 2, the RMSE of ANN with PCA is 57.0 percent lower than that of ANN without PCA. Additionally, the average of RAE and RRSE of ANN with and without PCA was 7.1 percent and 16.7 percent, respectively. The prediction by ANN turned out to be the most accurate, followed in order by the results of Lee and Kim [33], then ACI 318-19 [31], and EC2-04 [32] was the least accurate.  Figure 3 shows comparison plots between the actual test results and the predictions by the four different methods mentioned above. The x-axis represents the normalized actual test results while the y-axis represents the normalized prediction by the corresponding method. The red solid line stands for the line y = x. When the prediction is more accurate, data points are gathered nearer the red line. If the dots tend to reside below the red line, the actual shear strength is stronger than the prediction, and we can conclude that the prediction is conservative. ACI 318-19 [31] (Figure 3a) falls into this case, where the dots are mostly distributed below the red line, therefore ACI 318-19 [31] guides the designers to conservatively design the shear capacity of RC beams. EC2-04 [32] (Figure 3b) also underestimated the shear strength of the given RC beam; however, the trend slightly overshoots when the shear strength exceeds 5 MPa. This is because EC2-04 [32] uses a different formula depending on the shear force that is expected to develop at the critical sections (Equations (7) and (8)). These two design codes are intended to have higher experimental results than the design equations to conservatively design the shear strength and provide simplified equations for the designers. The equation presented by Lee and Kim [33] (Figure 3c) has the dots closer to the red line than the former two. It neither underestimates nor overestimates the shear strength. Finally, the prediction by the ANN model trained with the PCA preprocessed data set has the dots packed very close to the red line, indicating that the prediction is very accurate, as well as the most accurate among the four methods.
It must be noted here that there is a philosophical difference between the equations in the design codes and the equations for the analysis purpose. ACI 318-19 [31] and EC2-04 [32] are design codes, and their purpose is to provide a reliable lower bound that the designers are safe to follow. Since the values are the lower bound, they are born to be conservative and underestimate the real shear capacity of the beams that they are supposed to be assessing. In the meantime, the equations for the analysis are developed to reveal the actual capacity of the beam. Their purpose is to estimate the shear capacity of the member under investigation accurately and as is. The result of the analysis equations is, therefore, not biased by nature. It is evident when we see the experiment data points in Figure 3a,b as the school of blue dots (data points) mostly reside below the solid red line (underestimation of the design code), whereas Figure 3c,d have the blue dots almost equally distributed above and below the solid red line. However, we must also notice that although the analysis equations show no apparent bias in the estimation, the prediction by ANN has less variance than that of another analysis purpose equation by Lee and Kim [33] (Figure 3c). The variance of ANN is also less than two designed codes (Figure 3a,b) from which we can conclude that the prediction by ANN is more accurate than the others regardless of the development purpose.

Validation Set
The test data of the total eight reinforced concrete beams under monotonic loading was additionally collected aside from the training set to use for the validation set. In Table 3, details of the specimens are shown with the experiment results. This experiment kept the dimensions of the cross-section, concrete strength, yield strength, and the reinforcement ratio of longitudinal reinforcement as constant (control variables) throughout the specimens. The test variables were the yield strength and the transverse reinforcement ratio. The authors must note here that the control and test variables are the purpose of the collected experimental data and have nothing to do with the object or intention of this ANN model's validation. The training set and this validation set are all amalgamated from the reachable sources by the authors. The current set was selected as a validation set among the data sets where the size of the set is small. In addition, the yield strength and the reinforcement ratio of the transverse bar are known to be the most effective factors that affect the shear capacity of the beam, the current set is more than eligible as a validation set. Moreover, the training set was divided into ten small subsets, and the ANN model was iteratively trained by using nine subsets and validated by one leftover subset to more effectively utilize the training set. The ultimate shear strengths of the specimens were calculated using two design codes of ACI 318-19 [31] and EC2-04 [32], the proposed equation by Lee and Kim [33], and the ANN model. Figure 4 shows the validation results of these four methods. The x-axis represents the transverse reinforcement ratio, while the y-axis represents the ratio of measured shear to calculated shear. In the figure, the dashed horizontal line indicates y = 1.0, where the measured value and the predicted value are identical to each other. Different markers are used to differentiate the yield strength of transverse reinforcement in the different test settings. The circular, triangular, and cross markers in the figure indicate the yield strengths of 451, 532, and 656 MPa, respectively. For numerical comparisons, the mean and coefficient of variation are calculated and presented on the corresponding plot. If the equations or the ANN model predict the shear strength of the RC beam correctly, the markers will be near the dashed line and form a horizontal line. In the cases of ACI 318-19 [31], EC2-04 [32], and Lee and Kim [33], the shear strength ratios tend to decrease as the transverse reinforcement ratio increases, while this trend is not observed for the ANN model. This implies that the ANN considers the effect of the ratio and yield strength of transverse reinforcement, while the other three do not. ACI318-19 [31] tends to underestimate the shear capacity of the beams compared to those of the other design methods, as mentioned earlier in this section. By contrast, EC2-04 [32] overestimated the shear strength. EC2-04 [32] calculates the crack angle assuming that the concrete and reinforcement reach a plastic state when the reinforced concrete member reaches failure. In the case of plasticity theory, the shear strength can be calculated relatively simply by using only the force equilibrium condition. However, the actual shear strength can be overestimated since it is assumed that the material reaches the plastic state when the member failed. The shear strength was well predicted by Lee and Kim [33] in terms of the mean values. However, the coefficients of variation of ACI 318-19 [31], EC2-04 [32], and Lee and Kim [33] are greater than those of ANN. The coefficient of variation of ANN is calculated to be 3.71%, while they are calculated to be 10.65%, 12.99%, and 7.25% for ACI318-19 [31], EC2-04 [32], and Lee and Kim [33], respectively. According to the results of the validation, we conclude that the ANN predicts the shear strength of the RC beams most accurately.

Parametric Study
Based on the validation data set, a parametric study was conducted on the various input variables that could affect the RC beam's shear strength. The concrete beams used in the parametric study had the same cross-section and material details as those of specimen 1 listed in Table 3. In order to analyze the effect of individual variables on the shear strength, all of the other variables were fixed while the variable being analyzed was varied along the range of interest; the result is plotted in Figure 5.  The x-axes in Figure 5 are the parameters being analyzed while the y-axis is V/(b w d), that is, the shear stress developed in the beam. The blue dots represent the data set used to train the ANN model while the pentagram is specimen 1 in the validation set. The yellow dotted curve is the trained ANN model with the preprocessed test data, and the solid purple, dashed green, and dash-dot light blue lines are the three shear strength equations suggested by ACI 318-19 [31], EC2-04 [32], and Lee and Kim [33], respectively.

Beam width and Effective Depth
This subsection discusses Figure 5a,b. The shear behavior of RC beams is affected by the size effect, and the main factors influencing the size effect are the effective depth of cross-section. In the case of ACI 318-19 and EC2-04, the shear stress according to the effective depth was constant because the size effect was not considered. Proposed equation by Lee and Kim reflects the size effect. Therefore, as the effective depth increases, the shear stress tends to decrease gradually. On the other hand, ANN tends to overestimate the shear stress when the effective depth is small. The size effect was not taken into account in the input variables of the machine learning algorithm. This is because the size effect is a dependent variable according to other independent variables, and too many input variables can complicate the algorithm. In order to optimize the algorithm for generalization and simplicity, only essential variables were considered.

Shear Span Ratio
The discrepancy between ANN and the other three equations is apparent in Figure 5c. ACI 318-19 [31] and EC2-04 [32] do not consider the effect of shear span ratio directly, and they appear to be flat lines again, while the experimental data indicates the shear strength is affected by the shear span ratio. The smaller the shear span to depth ratio, the larger the concrete compression zone formed between the support and the loading point, and the greater the shear resistance. On the other hand, when the shear span to depth ratio is greater than 2.5, the shear resistance gradually decreases as the shear span to depth ratio increases. ANN and Lee and Kim [33] predict such shear strength behavior, while the predictions of Lee and Kim [33] rapidly increase when the shear span ratio becomes smaller.

Concrete Compressive Strength
In Figure 5d, all four of the methods predicted that the shear strength would increase as the concrete shear strength increases. Among them, ACI 318-19 [31] and Lee and Kim [33] expected the increment to be linear, while EC2-04 [32] and ANN expected it to be bilinear. For EC2-04 [32], the shear strength was expected to increase until f c reached 40 MPa, while the bilinearity occurred at f c = 60 MPa for ANN.

Strength of Longitudinal Reinforcement
The strength of longitudinal reinforcement, f y in Figure 5e, turns out to not have a major effect on the shear strength of the RC beam. The experimental data show no significant tendency with respect to f y . The models other than ANN do not reveal the effect of f y either and appear to be a flat line, while ANN shows a slight sinusoidal-like curve. The variance of shear strength between f y equals 400 MPa and 600 MPa appears to be large, and this is presumably the effect of other variables.

Longitudinal Reinforcement Ratio
The longitudinal reinforcement (Figure 5f) is closely related to dowel action. Since the longitudinal reinforcement resists shear force acting in the vertical direction, when the amount of the longitudinal reinforcement increases, the shear resistance by dowel action increases. In addition, when the longitudinal reinforcement ratio is large, the width of the crack decreases, which increases the maximum value of the shear component. The experimental data also indicate that the shear strength will increase as the longitudinal reinforcement ratio increases. Both ANN and Lee and Kim [33] capture the effect of longitudinal reinforcement ratio, and the resulting curves behave almost identically. However, ACI 318-19 [31] and EC2-04 [32] did not show any variation per the change in longitudinal reinforcement ratio.

Strength of Transverse Reinforcement
In Figure 5g, the experimental data show higher shear strength for higher strength of transverse reinforcement, f yt , but the increment diminishes. ACI 318-19 [31] and Lee and Kim [33] predicted linearly increasing shear strength, while EC2-04 [32] and ANN predicted a diminishing increment. The ANN prediction even expected the shear strength to decrease in the region where f yt > 800 MPa. This prediction is realistic, since the shear failure is incurred for other reasons, such as concrete crushing, when high strength transverse reinforcement is used. Figure 5h shows how the transverse reinforcement ratio affects normalized shear strength. Shear strength is not only affected by the yield strength of shear reinforcement, but also by the shear reinforcement ratio. According to the analysis results, all four of the models expect the shear strength to increase as the transverse reinforcement ratio increases. Among the expectations, ACI 318-19 [31] expected the increment to be linear, while the other three expected the increment to be progressively flattened for a ratio greater than 0.6%.

Transverse Reinforcement Ratio
When the maximum rebar ratio is exceeded, the prediction result of ANN is close to the actual result because the failure mode may change for reasons such as concrete crushing.

Discussions and Limitations
As it is commonly applied to experiment-based studies, the prediction by ANN is neither safe nor accurate where the test data exist sparsely. For example, if the effective depth is less than 200mm, both test points and estimation by other equations tell that the shear capacity is less than 7, while the ANN model predicts its exponential growth (Figure 5b). Such a tendency is also observed in the shear span (Figure 5c). When the shear span is greater than 3.5, the ANN model significantly underestimates the shear capacity of the beam as opposed to the other analysis and design equations. Additionally, in Figure 5f, there are no data points to infer how the shear capacity might behave as the longitudinal reinforcement ratio exceeds 6%. Unlike the other three equations that extrapolate the function, the ANN model predicts a mysterious concavity in that region.
The prediction by ANN appears in the middle somewhere between EC2-04 [32] and ACI 318-19 [31] and flexibly leans toward the more accurate side as the training set directs to eventually yields lower error metrics. This characteristic can provide a more efficient and economical design direction. However, it does not mean that the prediction by ANN always supersedes and in preferable to the other equations. Particularly in a region where there is little experimental data, one should question whether the algorithm's estimation comports with common sense. Such a region typically falls into where the load condition is severe and deformation is extreme.

Conclusions
This article trained an artificial neural network (ANN) with a dataset consisting of 776 experimental data. The data were preprocessed by principal component analysis (PCA) for variable orthogonalization (de-correlation). The trained ANN predicts the shear strength of reinforced concrete (RC) beams with transverse reinforcement. When preprocessing with PCA, dimension reduction was not used in this research because only eight variables were in the dataset. The preprocessed data was used to train the neural network. The configuration of the neural network was determined through trial and error while numerous combinations of hidden layers and neurons in the layer were sought; the single hidden layer with 43 neurons returned the lowest root mean square error (RMSE).
The preprocessed data was used to train the neural network. The configuration of the neural network was determined through trial and error, while numerous combinations of hidden layers and neurons in the layer were sought; the single hidden layer with 43 neurons returned the lowest root mean square error (RMSE).
Eight additional experiments were conducted in order to obtain the validation set, and the prediction of the trained ANN model was compared to three different shear design equations: ACI 318-19 [31], EC2-04 [32], and the equation suggested by Lee and Kim [33]. The average ratios of experimentally obtained shear strength to shear strength obtained by equations and the ANN model were 0.89, 1.11, 0.76, and 0.93, for ANN, ACI 318-19 [31], EC2-04 [32], and Lee and Kim [33], respectively. The coefficients of variation were also calculated so as to validate the trained model and the shear design equations: 3.71%, 10.65%, 12.99%, and 7.25% corresponding to ANN, ACI 318-19 [31], EC2-04 [32], and Lee and Kim [33], respectively, were reported. For both measurements, ANN was the most accurate. ANN slightly overestimated the shear strength of the RC beam, while the coefficient of variation turned out to be significantly lower than those obtained by the other three methods.
A parametric study was also conducted in order to determine the effects of the input variables considered in this research. In this way, variables that were generally not considered in the shear design equations could be investigated, and their effectiveness to the shear strength could be studied. In addition, ANN reflected the effect of shear span ratio well while the other models did not. The shear strength tended to decrease when the shear span ratio increased from 2 to 3.5. The effect of concrete compressive strength was also observed. ACI 318-19 [31] and the equation by Lee and Kim [33] could not reflect this variable, while EC2-04 [32] and ANN reasonably incorporated it. The effect of longitudinal reinforcement ratio was clearly captured by ANN and Lee and Kim [33]. ANN and Lee and Kim [33] behaved nearly similarly for the varying longitudinal reinforcement ratio. The maximum yield strength of transverse reinforcement that should be used in the shear strength calculation is determined to not exceed 800 MPa. ANN predicted the shear strength to be decreased for f yt > 800 MPa, meaning that the vertical reinforcement of yield strength greater than 800 MPa negatively affected the RC beam shear strength.
Author Contributions: C.K. accumulated the experimental data, conducted analysis, and wrote the manuscript. S.K. provided assistance with the ANN model development and validation. D.S. reviewed the manuscript and provided useful technical assistance. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
All the research data used in this manuscript will be available whenever requested.

Conflicts of Interest:
The authors declare no conflict of interest.