Iran Red Crescent Med J. 2016 Aug 9;18(11):e32858. doi: 10.5812/ircmj.32858. eCollection 2016 Nov.

Prediction and Diagnosis of Non-Alcoholic Fatty Liver Disease (NAFLD) and Identification of Its Associated Factors Using the Classification Tree Method.


Department of Biostatistics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, IR Iran.
Gastroenterohepatology Research Center, Shiraz University of Medical Sciences, Shiraz, IR Iran.



Non-alcoholic fatty liver disease (NAFLD) is the most common form of liver disease in many parts of the world.


The aim of the present study was to use a classification tree (CT) to identify the most important factors associated with NAFLD and to predict the probability of NAFLD.


This cross-sectional study was conducted in Kavar, a town in the south of Fars province, Iran. A total of 1,600 individuals were selected by stratified multistage cluster random sampling. Thirty demographic and clinical variables were measured for each individual. Participants were divided into a training dataset (1,120 individuals), which was used to build the CT, and a testing dataset (480 individuals), which was used to assess it. The CT was also used to estimate class membership and to predict the occurrence of fatty liver.
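The abstract does not state how participants were assigned to the two datasets (1,120 is 70% of 1,600) or which splitting criterion the tree used. The Python sketch below assumes a simple random 70/30 partition and the Gini impurity, a criterion commonly used in CART-style classification trees. The helpers `train_test_split`, `gini`, `best_split`, and `leaf_probability` are hypothetical illustrations, not the authors' code. The sketch shows the core step of growing a CT: scanning one predictor for the cut-off that best separates NAFLD from non-NAFLD cases, then estimating the probability of NAFLD in a leaf as the share of NAFLD cases it contains.

```python
import random
from typing import List, Optional, Sequence, Tuple


def train_test_split(n: int, n_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly partition row indices 0..n-1 into training and testing sets.

    Assumption: a simple random split; the study's actual allocation is not described.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a node with binary labels (1 = NAFLD, 0 = no NAFLD)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(x: Sequence[float], y: Sequence[int]) -> Tuple[Optional[float], float]:
    """Find the cut-off on one predictor that minimizes the weighted Gini impurity
    of the two child nodes. Rows with x <= threshold go to the left child.

    Returns (threshold, weighted_impurity); threshold is None if no split
    improves on the parent node's impurity.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best: Tuple[Optional[float], float] = (None, gini(y))  # baseline: no split
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a cut-off between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            # Use the midpoint between adjacent distinct values as the cut-off.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, impurity)
    return best


def leaf_probability(labels: Sequence[int]) -> float:
    """Estimated P(NAFLD) in a leaf: share of NAFLD cases among its training rows."""
    return sum(labels) / len(labels)
```

For example, on made-up BMI values `[21, 23, 24, 27, 29, 31]` with labels `[0, 0, 0, 1, 1, 1]`, `best_split` returns `(25.5, 0.0)`: a cut-off of 25.5 separates the two groups perfectly. A full tree would evaluate every candidate predictor at each node, split on the best one, and recurse until a stopping rule (such as a minimum node size) is met.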


NAFLD was diagnosed in 22% of the individuals in the sample. In univariate analysis, the following variables had a significant association with NAFLD (P < 0.05): marital status, history of hepatitis B vaccination, history of surgery, body mass index (BMI), waist-hip ratio (WHR), systolic blood pressure (SBP), diastolic blood pressure (DBP), high-density lipoprotein (HDL), triglycerides (TG), alanine aminotransferase (ALT), cholesterol (CHO), aspartate aminotransferase (AST), glucose (GLU), albumin (AL), and age. Based on the CT, the most important variables for predicting NAFLD, in order of importance, were BMI, WHR, triglycerides, glucose, SBP, and alanine aminotransferase. The goodness of fit of the model on the training and testing datasets, respectively, was as follows: prediction accuracy (80%, 75%), sensitivity (74%, 73%), specificity (83%, 77%), and area under the receiver operating characteristic (ROC) curve (78%, 75%).
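The four goodness-of-fit measures reported above can be computed from predicted and observed labels. The following is a minimal Python sketch with hypothetical helper names (`classification_metrics`, `roc_auc`); the abstract does not say which software the authors used. Sensitivity is the share of true NAFLD cases the model flags, specificity is the share of non-NAFLD cases it correctly clears, and the AUC is computed here through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen NAFLD case receives a higher score than a randomly chosen non-NAFLD case, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for binary labels (1 = NAFLD).

    Assumes both classes are present in y_true (otherwise a denominator is zero).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # NAFLD, predicted NAFLD
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, predicted healthy
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, predicted NAFLD
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # NAFLD, missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as P(score of a positive > score of a negative),
    counting ties as 1/2. O(P * N) pairwise comparison, fine for small samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With a tree, the natural scores are the leaf probabilities of NAFLD. If hard 0/1 predictions are passed as scores instead, `roc_auc` reduces to (sensitivity + specificity) / 2.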


The CT is a suitable and easy-to-interpret approach for decision-making and predicting NAFLD.


Keywords: Classification Tree; Decision Tree; Non-Alcoholic Fatty Liver Disease; Prediction
