A Deep Neural Network for Early Detection and Prediction of Chronic Kidney Disease

Diabetes and high blood pressure are the primary causes of Chronic Kidney Disease (CKD). Researchers around the world use the Glomerular Filtration Rate (GFR) and kidney damage markers to identify CKD, a condition that leads to reduced renal function over time. A person with CKD has a higher chance of dying young. Doctors face a difficult task in diagnosing the different diseases linked to CKD at an early stage in order to prevent the disease. This research presents a novel deep learning model for the early detection and prediction of CKD. It aims to create a deep neural network and compare its performance with that of other contemporary machine learning techniques. In the experiments, missing values in the database were replaced with the mean of the corresponding feature. The neural network's optimal parameters were then determined by running multiple trials. The most important features were selected by Recursive Feature Elimination (RFE). RFE identified Hemoglobin, Specific Gravity, Serum Creatinine, Red Blood Cell Count, Albumin, Packed Cell Volume, and Hypertension as key features. The selected features were passed to machine learning models for classification. The proposed deep neural model outperformed the five other classifiers (Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Logistic Regression, Random Forest, and Naive Bayes) by achieving 100% accuracy. The proposed approach could be a useful tool for nephrologists in detecting CKD.


Introduction
Chronic kidney disease (CKD) is a disorder that occurs when a patient's kidney function deteriorates. As a result, their overall quality of life suffers. CKD affects one out of every 10 people worldwide. It is on the rise, and by 2040 it is expected to be the fifth leading cause of death worldwide [1]. It is one of the leading causes of high medical costs. In high-income nations, the cost of transplantation and dialysis accounts for 2% to 3% of the annual medical budget [2]. Most people with renal failure in low- and middle-income countries have insufficient access to life-saving dialysis and kidney transplants [3]. The number of kidney failure cases is expected to rise sharply in developing countries such as China and India [4]. Chronic kidney failure makes it difficult to remove extra fluids from the blood. Advanced chronic kidney disease can cause dangerous levels of fluid, electrolytes, and wastes to build up in the body. It may lead to complications such as high blood pressure, anemia, weak bones, and nerve damage. The strongest indicator of renal function is the Glomerular Filtration Rate (GFR) [5], and doctors also use it to diagnose kidney disease. The criteria for defining CKD are kidney damage for ≥3 months with or without decreased GFR, or a GFR of less than 60 mL/min/1.73 m² for ≥3 months with or without kidney damage. Machine learning and deep learning models may be used to successfully diagnose CKD. Table 1 presents a detailed comparison of machine learning methods for the diagnosis of chronic diseases from the existing literature.
Z. Chen et al. demonstrated the reliability of multivariate models for clinical risk assessment of patients with CKD [30]. The Chronic Renal Failure (CRF) data set from the UC Irvine repository was used in this investigation. In their comparative study, they used KNN, SVM, and Soft Independent Modeling of Class Analogy. Compared with the other two models, the SVM model handled noise within the data set better, and its accuracy was 99%. The authors of [31] developed a decision-making tool for doctors to forecast the occurrence of CRF in patients. They employed KNN, Naive Bayes, LDA, random subspace, and tree-based classification techniques on the CRF data set from the UCI repository. According to the researchers, the random subspace method with the KNN classifier achieved 94% accuracy. The authors of another study [32] created a decision support tool similar to [31]. They classified CRF using Artificial Neural Networks (ANN), Naive Bayes, and decision tree algorithms. The performance of these machine learning algorithms was examined on a data set from Jordan's Prince Hamza Hospital. The decision tree was the most accurate of the three approaches. Song et al. [22] created a gradient boosting-based prediction model to detect CKD using diabetes patients' EHR and billing data. The authors of [33] published a study on the UCI CKD data set that used SVM, decision trees, Naive Bayes, and KNN to detect CKD. The authors developed a ranking algorithm to choose features. With a score of 99.75, the decision tree outperformed the three other machine learning methods. The authors of [34] presented a hierarchical multiclass classification technique for detecting chronic renal disease in an unbalanced data set.
As a baseline, the authors used naive Bayes, logistic regression, decision tree, and random forest classifiers. Within each patient, the proposed classification approach discovered severe cases. A chronic renal disease diagnosis system was proposed in [35] to diagnose CKD at an early stage. The authors used the K-means technique to prepare the data. The KNN, SVM, and Naive Bayes classification algorithms were then applied to the processed data and produced a best accuracy of 97.8%. Almasoud and Ward [36] reported a study on CKD that used logistic regression, SVM, random forest, and gradient boosting techniques. The four classification techniques were applied to selected features, and gradient boosting had the highest accuracy of 99%. E. M. Senan et al. [37] presented a study on early-stage CKD diagnosis. The RFE method was used to select features from the CKD data set, and the outcomes of the SVM, KNN, random forest, and decision tree algorithms were compared. Krishnamurthy S. et al. [38] developed various artificial intelligence models to predict Chronic Kidney Disease. The LightGBM model selected the most important features for CKD prediction: age, gout, diabetes mellitus, use of sulfonamides, and angiotensins. The convolutional neural networks achieved the best performance and the highest AUROC, 0.954, compared to the other models. Mohamed Elhoseny et al. [19] presented an intelligent prediction system for Chronic Kidney Disease. The density-based feature selection method eliminates the irrelevant features and then passes the selected features to the Ant Colony-based Optimization classifier to predict CKD. Singh and Jain [39] presented a novel hybrid approach to diagnose CKD and achieved 92.5% prediction accuracy. An artificial neural network for CKD diagnosis was proposed by Neves et al. [25]. In this work, the diagnostic sensitivity ranged from 93.1% to 94.9%, and the diagnostic specificity ranged from 91.9% to 94.2%.
Vasquez-Morales et al. [27] used large CKD data to generate a neural network classifier, and the model was 95% accurate. Makino et al. [28] used textual data to extract patients' diagnoses and treatment information in order to forecast the course of diabetic kidney disease. Ren et al. [29] developed a predictive model for the identification of CKD from an Electronic Health Records (EHR) data set. This model is based on a neural network framework that encodes and decodes the textual and numerical information. The authors of [41] devised a way of preventing CKD using machine learning. SVM and ANN were among the machine learning classification algorithms used by the researchers. The experiments revealed that ANN, with an accuracy of 99.75%, was more accurate than SVM.
J. Qin et al. [42] presented a machine learning method for the early detection of CKD. They used logistic regression, random forest, SVM, naive Bayes classifier, KNN, and feedforward neural network to develop their models. The most accurate classification model was random forest, which had a 99.75% accuracy rate. Z. Segal et al. [43] presented a machine learning technique based on an ensemble tree (XGBoost) for the early diagnosis of renal illness. The presented model was compared against Random Forest, CatBoost, and Regression with Regularization. The proposed model showed better performance in all metrics, including a c-statistic of 0.93, sensitivity of 0.715, and specificity of 0.958. Khamparia et al. [44] developed a deep learning model for early detection of CKD, in which features were selected from multimedia data using a stacked autoencoder model. The authors used a Softmax classifier to predict the final class. The proposed model achieved the highest performance in comparison to conventional classification techniques on the UCI CKD data set.
Polat, H. et al. [45] presented a study on the role of effective feature selection methods in the accurate prediction of CKD. In this paper, wrapper and filter feature selection approaches were used to reduce the dimension of the Chronic Kidney Disease data set. The selected features were then passed to a Support Vector Machine to classify Chronic Kidney Disease for diagnosis purposes. The experimental results showed that the Support Vector Machine generates better results on features selected by the Best First search method with the filtered subset evaluator. With these features, SVM achieved an accuracy of 98.5%, higher than with features selected by the other wrapper and filter methods. Ebiaredoh-Mienye Sarah A. et al. [46] presented a robust model for the prediction of CKD that integrates an enhanced sparse autoencoder (SAE) and Softmax regression. In this model, the autoencoders achieved sparsity by penalizing the weights. The Softmax regression was optimized for the classification task; therefore, the proposed model achieved excellent performance. It obtained an accuracy of 98% on the CKD data set, which is comparable with other existing methods. Zhiyong Pang et al. [47] proposed a fully automated computer-aided diagnosis system to classify malignant and benign masses using breast magnetic resonance imaging. The texture features were selected by integrating a support vector machine with the ReliefF feature selection method. This system achieved an accuracy of 92.3%. Chen et al. [21] presented a model in which Hepatitis was diagnosed with a hybrid method that integrates a Fisher discriminant analysis algorithm and an SVM classifier. Compared with the existing methods, the hybrid method performed best, achieving the highest classification accuracy of 96.77%. The authors of [48] presented a breast cancer diagnosis model. The features selected by sequential forward selection and backward selection methods are passed to Artificial Neural Networks to classify breast cancer. SBSP + NN achieved the highest accuracy of 98.75%.

Data Set Description
The CKD data were obtained from the University of California Irvine (UCI) Repository. The data set contains 400 patient records, some of which have missing values. It comprises 24 clinical attributes relevant to the prognosis of chronic kidney disease and one class attribute that indicates whether the patient has chronic renal failure. The class attribute takes two values: "ckd" and "notckd." The data set contains 250 records of the "ckd" class (62.5%) and 150 records of the "notckd" class (37.5%). The attributes of the UCI CKD data set are listed in Table 2.

Data Processing
The estimation of missing values and the removal of noise such as outliers, as well as the normalization and validation of unbalanced data, were all part of the preprocessing stages. When assessing a patient, some measurements could be missing or incomplete.

Handling Missing Values
Only 158 records in the data set are complete; the remaining records contain missing values. Ignoring incomplete records is the simplest technique to deal with missing values; however, this is not practical for small data sets. During data preparation, the data set is examined to see whether any attribute values are missing. The missing values of numerical features were estimated using mean imputation. The missing values of nominal features were replaced with the mode.
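The mean and mode imputation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the helper name `impute`, the use of `None` for a missing entry, and the toy column values are all assumptions.

```python
from statistics import mean, mode

def impute(column, numeric):
    """Replace None entries with the column mean (numeric) or mode (nominal)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in column]

# Hypothetical toy columns, not taken from the UCI data set.
hemoglobin = [15.4, None, 11.2, 9.6]
hypertension = ["yes", "no", None, "no"]

hemoglobin_filled = impute(hemoglobin, numeric=True)
hypertension_filled = impute(hypertension, numeric=False)
```

The fill value is computed from the observed entries only, so the imputed column keeps the same mean (or the same most frequent label) as the original observations.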

Categorical Data Encoding
Because most machine learning algorithms accept only numeric values as input, categorical values must be encoded as numbers. Attributes with categories such as "no" and "yes" are represented by the binary values "0" and "1", respectively.
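A minimal sketch of this binary encoding follows. The mapping table `BINARY_CODES` and the helper `encode` are hypothetical names, and the assignment of "ckd" to 1 is an assumption made for illustration.

```python
# Hypothetical mapping; which label receives 1 is an assumption.
BINARY_CODES = {"no": 0, "yes": 1, "notckd": 0, "ckd": 1}

def encode(column):
    """Map each nominal label to its 0/1 code; unknown labels raise KeyError."""
    return [BINARY_CODES[value.strip().lower()] for value in column]

encoded = encode(["yes", "no", "No ", "yes"])
```

Stripping whitespace and lowercasing before the lookup guards against inconsistently typed labels, which are common in hand-entered clinical records.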

Data Transformation
Data transformation is the process of bringing features onto the same scale so that one variable does not dominate the others. Otherwise, learning algorithms treat features with larger numeric values as more important and features with smaller values as less important, regardless of their units. Data transformations alter the values in a data set so that they can be processed further [49]. To improve the accuracy of machine learning models, this research employs standardization. The converted data has a mean of 0 and a standard deviation of 1, so most values fall in a narrow range around zero.
The standardization formula is given below:

$$z = \frac{x - \mu}{\sigma}$$

where $x$ is the original feature value, $\mu$ is the mean of the feature, and $\sigma$ is its standard deviation.
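Standardization to zero mean and unit standard deviation can be sketched as follows. The function name `standardize` and the sample values are assumptions for illustration; the population standard deviation is used here, and a constant column (zero standard deviation) would need separate handling.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumed non-zero; constant columns need special care
    return [(v - mu) / sigma for v in values]

# Hypothetical serum creatinine readings, for illustration only.
serum_creatinine = [1.2, 0.8, 1.8, 3.8, 1.4]
z = standardize(serum_creatinine)
```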

Outlier Detection
Outliers are observation points that are isolated from the rest of the data. An outlier could be caused by measurement variability or signal an error in the experiment. An outlier can distort and mislead the learning process of the machine learning algorithm. It leads to longer training times, less accuracy in the model, and ultimately to poorer results. This paper uses the Interquartile Range (IQR) [49] based approach to remove outliers before transferring data to the learning algorithm.
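An IQR-based filter of the kind referred to above might look like the sketch below. The paper does not state the fence multiplier, so the conventional value k = 1.5 is an assumption, as are the function names and the sample readings.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Keep only the values that lie inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]

# Hypothetical blood urea readings with one extreme value.
blood_urea = [36, 18, 53, 56, 26, 25, 54, 31, 391]
cleaned = remove_outliers(blood_urea)
```

For this sample the quartiles are 26 and 54, so the fences are -16 and 96 and only the reading of 391 is discarded.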

Feature Selection
Recursive Feature Elimination (RFE) removes features recursively, rebuilding a model on the remaining features at each step [50]. It applies a greedy search to find the most efficient subset of features, using model accuracy to determine which features contribute most to predicting the target. It develops models iteratively, determining the best or worst feature at each iteration. The features are then ranked according to the order in which they were removed. If the data set contains N features, the search space contains up to $2^N$ feature combinations in the worst-case scenario, which is why a greedy procedure is used.
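The elimination loop can be illustrated with a small standard-library sketch. It is not the authors' implementation: the base estimator (a logistic regression fitted by gradient descent), the use of absolute weight as the importance score, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import math
import random

def fit_logistic(X, y, steps=200, lr=0.5):
    """Fit logistic regression by batch gradient descent; return the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad_b += err / n
            for j, xj in enumerate(row):
                grad_w[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w

def rfe(X, y, n_keep):
    """Refit on the remaining features and drop the one with the smallest |weight|."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in X]
        w = fit_logistic(sub, y)
        worst = min(range(len(remaining)), key=lambda j: abs(w[j]))
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated

# Synthetic data: feature 0 carries the label, features 1 and 2 are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(100)]
X = [[(2 * label - 1) + rng.gauss(0, 0.1), rng.gauss(0, 1), rng.gauss(0, 1)]
     for label in y]
selected, eliminated = rfe(X, y, n_keep=1)
```

The order of `eliminated` gives the ranking: features removed first are the least important, and the features left in `selected` are passed on to the classifier.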

Diagnostics 2022, 11, x FOR PEER REVIEW 7 of 23

Support Vector Machine
The SVM constructs a separating hyperplane that splits the labeled data into classes and determines on which side of the hyperplane a new data point falls. Several separating hyperplanes may exist, and the one with the largest margin between the data points of the two classes is chosen. Figure 1 shows the maximum-margin hyperplane of the support vector machine. The equation of a hyperplane that separates two classes is given by:

$$x = w_0 + \sum_{j} w_j a_j$$

where $a_j$ are the attribute values and $w_j$ are the weights. However, the equation of the maximum-margin hyperplane can be written as

$$x = b + \sum_{i} \alpha_i \, y_i \, a(i) \cdot a$$

Here, $i$ ranges over the support vectors, $y_i$ is the class value of training instance $a(i)$, and $a$ is the instance being classified. The learning algorithm determines the numeric values $b$ and $\alpha_i$.

K-Nearest Neighbor
The KNN algorithm recognizes similarities between new and previous data points and assigns new test points to the most similar existing group. KNN is a lazy learning algorithm and is nonparametric: instead of learning a model from the training data set, it stores the data set. It uses the K nearest stored points to categorize new data. The distance between a new point and each stored training point is computed using the Euclidean distance. Figure 2 depicts K-Nearest Neighbor classification based on K values.
The KNN algorithm searches the training data set for the points with the minimum distance to the test point.
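The nearest-neighbor search and vote can be sketched as below. The function name `knn_predict`, the choice k = 3, and the two-feature toy records are assumptions for illustration and are not values from the study.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical records: [hemoglobin, serum creatinine].
train_X = [[15.0, 1.0], [14.2, 0.9], [13.8, 1.1],
           [9.5, 4.2], [8.7, 5.6], [10.1, 3.9]]
train_y = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]

label_a = knn_predict(train_X, train_y, [9.0, 4.8])
label_b = knn_predict(train_X, train_y, [14.5, 1.0])
```

No training step appears in the sketch because the stored data set is the model; all work happens at prediction time.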

Decision Tree Classifier
Decision trees are a nonparametric method of supervised learning [51]. A decision tree is a tree-structured classifier built from the features of a data set. It represents decision-making rules through internal nodes and branches. It has two types of nodes: decision nodes and leaf nodes. The decision nodes make decisions, and the outcomes of those decisions are the leaf nodes. A decision tree is presented in Figure 3.

Random Forest Classifier
The random forest algorithm is based on ensemble learning, which improves model performance and solves complex problems by combining several classifiers. As the name suggests, the classifier contains multiple decision trees, each built on a subset of the data set, whose outputs are combined to improve predictions. Rather than relying on a single decision tree, the random forest collects the prediction of every tree and outputs the class that receives the majority of votes. Using several trees decreases the possibility of the model overfitting. To predict the classes in the data set, the algorithm includes many decision trees, some of which predict the proper outcome while others do not. High prediction accuracy therefore rests on two requirements. First, the feature variables must contain actual values so that the classifier predicts an accurate outcome rather than a guess. Second, the correlation between the predictions of the individual trees must be very low. Figure 4 shows a Random Forest Classifier.
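The majority-vote step can be sketched on its own. The three hand-written "trees" below are hypothetical single-rule stand-ins for trained decision trees, and their thresholds are illustrative only, not clinical cut-offs or values from the study.

```python
from collections import Counter

def majority_vote(trees, sample):
    """Return the class predicted by most trees for one sample."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical one-rule "trees"; thresholds are illustrative only.
trees = [
    lambda s: "ckd" if s["hemoglobin"] < 12.0 else "notckd",
    lambda s: "ckd" if s["serum_creatinine"] > 1.5 else "notckd",
    lambda s: "ckd" if s["albumin"] >= 1 else "notckd",
]

patient = {"hemoglobin": 10.2, "serum_creatinine": 1.3, "albumin": 2}
prediction = majority_vote(trees, patient)
```

Here the second tree votes "notckd" while the other two vote "ckd", so the ensemble still returns "ckd": an individual tree can be wrong without changing the final prediction.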
This phase applied different methods such as handling missing values, categorical data encoding, data transformation, removal of outliers and extreme values, and feature selection. After preprocessing, the CKD data set is separated into training and testing data sets. In this study, Recursive Feature Elimination selects only a few of the 24 features. The RFE algorithm evaluates each feature based on its significance, which helps to lower the method's processing complexity. Finally, redundant and unrelated features are filtered out, and the learning model is fed with the most important features. Figure 6 shows the pseudo-code for the proposed methodology.
Initially, a method was introduced to prepare and standardize the data in the data set. The processed data is then passed on for further processing. There are 12 layers in the proposed model architecture: an input layer, five dense layers, five dropout layers, and an output dense classifier layer. Each dense layer is connected directly in a feed-forward manner, so that the outputs of a layer's activations are handed on as input to the following layers. A dropout layer is placed after each of the five dense layers, with drop rates of 0.5, 0.4, 0.3, 0.2, and 0.1. Figure 7 presents the layered architecture of the proposed model and its exact specifications.
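The layer ordering can be written down as a short sketch. The drop rates and the relu and sigmoid activations come from the paper; the layer widths are not given in the text, so none are shown, and the function names and the inverted-dropout formulation (zeroing units and rescaling the survivors) are assumptions about a typical implementation.

```python
import random

DROP_RATES = [0.5, 0.4, 0.3, 0.2, 0.1]  # drop rates stated in the text

def build_layers(drop_rates):
    """List the layer sequence: input, (dense, dropout) pairs, output classifier."""
    layers = ["input"]
    for rate in drop_rates:
        layers.append("dense(relu)")
        layers.append(f"dropout({rate})")
    layers.append("dense(sigmoid)")
    return layers

def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

layers = build_layers(DROP_RATES)
dropped = dropout([1.0] * 8, 0.5, random.Random(0))
```

Counting the entries of `layers` gives 1 + 5 × 2 + 1 = 12, matching the 12 layers described. Dropout is active only during training; at prediction time the activations pass through unchanged.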

Model Development
The neural network model has several hyperparameters that need to be optimized. Selecting the optimal hyperparameters is an experimental process; however, it is time-consuming and difficult. The Adam [52,53] optimizer is used to update the model parameters during the training phase.
Adam computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. Stochastic Gradient Descent (SGD) [54] is less efficient than Adam, which requires little training time and memory. Choosing the correct activation function also enhances classification performance. Standard activation functions for neural networks are sigmoid, tanh, Rectified Linear Unit (ReLU) [55], Exponential Linear Unit (ELU) [56], and Scaled Exponential Linear Unit (SELU) [57]. This paper tested the different activation functions on the CKD data set and used the preferred one in all the models.
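The Adam update can be illustrated on a one-parameter problem. The decay rates and epsilon below are the defaults commonly quoted for Adam; the learning rate, the step count, the function name `adam_minimize`, and the quadratic test objective are assumptions chosen for the example, not the settings used in the study.

```python
import math

def adam_minimize(grad, w, steps=500, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function with Adam, given its gradient function."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 (w - 3); its minimum is at w = 3.
w_star = adam_minimize(lambda w: 2 * (w - 3.0), w=0.0)
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own effective learning rate.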

Experiment Setup
The proposed model was created using data from a variety of situations. The configuration of the system used to develop the model is shown in Table 3.

Table 3. System configuration.
Resource                    Specification
Processor                   Intel Core i5 Gen7
Random access memory        16 GB
Graphics processing unit    4 GB
Language                    Python

Evaluation Parameters
The accuracy of the proposed model was calculated by treating the CKD class as positive and the notCKD class as negative. The confusion matrix was used to evaluate performance in terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) [58]. TP denotes CKD samples that were correctly classified as CKD. FN denotes CKD samples that were misclassified as notCKD. FP denotes notCKD samples that were misclassified as CKD. TN denotes notCKD samples that were correctly classified as notCKD.

Accuracy
Accuracy is the proportion of correct predictions among all predictions:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Recall
Recall is the proportion of correctly predicted positive observations to the total number of observations in the actual positive class, as shown in the following equation.

$$\text{Recall} = \frac{TP}{TP + FN}$$

Specificity
Specificity measures the proportion of negative samples that are correctly classified. The higher the specificity, the better the classifier recognizes negative samples. It can be defined as:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Precision
As stated in the equation below, this metric represents the proportion of correctly predicted positive observations to the total number of predicted positive observations.

$$\text{Precision} = \frac{TP}{TP + FP}$$

F-Measure
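The F-measure is the harmonic mean of Precision and Recall [58], so it takes both false positives and false negatives into account. The F-measure is defined as

$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

The F-measure values lie between 0 and 1.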
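The five evaluation measures can be computed directly from the confusion-matrix counts, as in the sketch below. The function name and the first set of counts are hypothetical; the second call uses the 250 true positives and 150 true negatives that the paper reports for the proposed model.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, recall, specificity, precision and F-measure from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts for an imperfect classifier.
metrics = classification_metrics(tp=60, tn=35, fp=3, fn=2)

# Counts reported for the proposed model: no false positives or false negatives.
perfect = classification_metrics(tp=250, tn=150, fp=0, fn=0)
```

With no false positives and no false negatives every measure equals 1, which is how a 100% score on all metrics arises.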

Comparative Analysis of Results
The findings of the proposed model are presented in this section. The CKD data set is split into 75% training and 25% test data. The hyperparameter settings for the proposed model are shown in Table 4. The confusion matrices are shown in Figure 8, which demonstrates that the proposed model correctly identified all true positive and true negative cases. Recall, precision, sensitivity, F1-score, and accuracy are reported for the CKD class.

Table 4. Hyper-parameter settings.
Hyper-parameter              Setting
Activation function          relu
Activation (output layer)    sigmoid
Optimizer                    Adam
Loss                         binary_crossentropy

The proposed model is compared with other classification algorithms, including logistic regression, KNN, SVM, Decision tree, and Random forest. No parameter adjustments were made for these algorithms, in order to show the improved performance of the proposed model; the default parameter values in scikit-learn were used. All models are evaluated using the F1-score. Tables 5 and 6 show the experimental results obtained when the proposed model was tested on the CKD data set, and Figures 9 and 10 depict accuracy graphs comparing the performance of existing classification algorithms with the proposed approach for chronic kidney disease prediction.
The accuracy of KNN, SVM, Naïve Bayes, Decision tree, logistic regression, and the proposed model is 92%, 92%, 95%, 97%, 99%, and 100%, respectively. The proposed model was found to be the most accurate, with a 100% accuracy rate. It classified all positive and negative samples correctly, identifying all 250 positive samples (TP) and all 150 negative samples (TN). Logistic Regression, KNN, Naïve Bayes, SVM, and Decision Tree classified 99%, 92%, 95%, 92%, and 97% of the samples correctly, with error rates of 1%, 8%, 5%, 8%, and 3%, respectively. The results of all five classifiers are shown in Table 5.
The proposed model outperforms the other classifiers by scoring 100% on all measures. The F1-score, Accuracy, Precision, and Recall of Logistic regression were 99%, 99%, 100%, and 98%, respectively. The Decision Tree obtained an F1-score, Accuracy, Precision, and Recall of 97%, 97%, 95%, and 100%, respectively. The Naïve Bayes F1-score, Accuracy, Precision, and Recall values were 95%, 95%, 92%, and 100%, respectively. The F1-score, Accuracy, Precision, and Recall values of KNN were 92%, 92%, 88%, and 98%, respectively. The Support Vector Machine classifier performed the lowest, with F1-score, Accuracy, Precision, and Recall values of 92%, 92%, 87%, and 96%, respectively.

Figure 9. Accuracy graphical representation for the UCI CKD data set.

Figure 10. Accuracy graphical representation for the UCI CKD data set.

Table 6 compares the proposed model with several recent studies: the Ant Colony-based Optimization classifier by Elhoseny et al. [19], the neural network by Vasquez-Morales et al. [27], KNN by Senan et al. [37], Convolutional Neural Networks by Krishnamurthy et al. [38], SVM by Polat, H. et al. [45], and SAE with Softmax regression by Sarah A. et al. [46]. The proposed model obtained an accuracy of 100%, while the existing works obtained accuracies from 85% to 98.5%. On this data set, the proposed model therefore achieves higher accuracy than the existing classification methods.

Figure 10. Accuracy graphical representation for the UCI CKD data set.

Feature Importance from RFE
This section presents the most important features selected by the RFE algorithm, based on their ranking. Figure 11 shows the features selected by RFE and their importance for the classification of CKD. The most critical risk factors are Hemoglobin, Serum Creatinine, Specific Gravity, Packed Cell Volume, Red Blood Cell Count, Hypertension, and Albumin, as presented in Table 7. Nephrologists should focus on these risk factors when diagnosing CKD patients.

Figure 11. Important features selected by RFE.
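The idea behind Recursive Feature Elimination can be illustrated with a small sketch: fit a model, drop the feature with the smallest absolute weight, and repeat until the desired number of features remains. The code below is an assumption-laden illustration, not the authors' pipeline: the estimator is a tiny logistic regression written from scratch, the feature names `f_strong`, `f_weak`, and `f_noise` are hypothetical, and the toy data are invented for the example rather than taken from the CKD data set.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability - label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, names, n_keep):
    """Refit repeatedly, each time dropping the feature with the smallest |weight|."""
    active = list(range(len(names)))
    eliminated = []
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        eliminated.append(names[active.pop(weakest)])
    return [names[j] for j in active], eliminated


# Hypothetical toy data: f_strong and f_weak follow the label, f_noise does not.
names = ["f_strong", "f_weak", "f_noise"]
X = [[1.0, 0.2, 1.0], [1.0, 0.2, -1.0], [1.0, 0.2, 1.0], [1.0, 0.2, -1.0],
     [-1.0, -0.2, 1.0], [-1.0, -0.2, -1.0], [-1.0, -0.2, 1.0], [-1.0, -0.2, -1.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

selected, eliminated = rfe(X, y, names, n_keep=1)
print(selected, eliminated)
```

Ranking by absolute weight is only meaningful when features are on comparable scales, so in practice the inputs would be standardized before elimination.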

Receiver Operating Characteristic (ROC)/Area under Curve (AUC)
The AUC is the area enclosed between the ROC curve and the bottom of the unit square. AUC scores closer to 1 indicate good performance, whereas AUC scores closer to 0.50 indicate poor performance. Figures 12-17 show the ROC/AUC curves of the proposed model, logistic regression, Decision tree, SVM, KNN, and Naïve Bayes, respectively. The proposed model achieved the highest AUC score, 1.0.
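As a rough illustration of how such a score is obtained, the sketch below builds ROC points from predicted scores and integrates them with the trapezoidal rule. The function names `roc_points` and `auc` and all labels and scores are assumptions made for this example; none of the values come from the paper's experiments.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR), one per distinct score threshold.

    Assumes y_true contains 0/1 labels with at least one sample of each class.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample sharing this score (ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores: every positive is ranked above every negative.
perfect = auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))

# Hypothetical scores: one negative is ranked above one positive.
imperfect = auc(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))

# Hypothetical scores: the classifier gives every sample the same score.
uninformative = auc(roc_points([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))

print(perfect, imperfect, uninformative)
```

A classifier that ranks every positive sample above every negative one reaches an AUC of 1.0, while one that cannot separate the classes at all stays at 0.5, which matches the interpretation given above.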

Conclusions and Future Work
A deep learning model for the early diagnosis of chronic kidney disease is presented in this work. The authors used the Recursive Feature Elimination approach to identify which features are the most important for prediction. The most essential CKD features are red blood cell count, albumin, packed cell volume, serum creatinine, specific gravity, hemoglobin, and hypertension. The selected features are fed to the classification algorithms. Different metrics, including classification accuracy, recall, precision, and F-measure, are used for the comparative analysis. The proposed deep neural model outperformed the other five classifiers (Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Logistic regression, Random Forest, and Naive Bayes classifier) by achieving 100% accuracy. The accuracy of KNN, SVM, Naïve Bayes, Decision tree, and logistic regression is 92%, 92%, 95%, 97%, and 99%, respectively.
The performance of the proposed model was compared with several recent scholarly studies, such as the Ant Colony-based Optimization Classifier by Elhoseny et al. [19], the Neural network by Vasquez-Morales et al. [27], KNN by M Senan et al. [37], Convolutional Neural Networks by Krishnamurthy et al. [38], SVM by Polat, H. et al. [45], and SAE and Softmax Regression proposed by Sarah A. et al. [46]. The existing works obtained accuracies from 85% to 98.5%, while the proposed model obtained an accuracy of 100%. The proposed approach could be a useful tool for nephrologists in detecting CKD.
A limitation of the proposed model is that it was tested on a small data set. To improve the model's performance, larger volumes of more sophisticated and representative CKD data will be collected in the future so that disease severity can be detected. The clinical data will be collected from expert pathologists. The performance of the proposed model will then be evaluated on a large clinical data set based on acid-base parameters, hyperparathyroidism, inorganic phosphorus concentration, and night urination. Additionally, new features will be added to gain a broader perspective on the informative parameters related to CKD and to test the prediction accuracy.