Int J Med Inform. 2018 Nov;119:22-38. doi: 10.1016/j.ijmedinf.2018.08.008. Epub 2018 Aug 28.

Identifying people at risk of developing type 2 diabetes: A comparison of predictive analytics techniques and predictor variables.

Author information

Department of Information Systems, Ansari College of Business, University of Nevada, Reno, USA; School of Software, Faculty of Engineering and IT, University of Technology Sydney, Australia. Electronic address:
Nevada Medical Intelligence Center, School of Community Health Sciences and Department of Pediatrics, University of Nevada Reno, USA. Electronic address:



This study aims to identify patients at risk of developing type 2 diabetes (T2D). A body of literature uses machine learning classification algorithms to predict the development of T2D among patients. The current study compares the performance of these algorithms in identifying patients who are at risk of developing T2D in the short, medium and long term. In addition, it provides a list of the predictor variables that are important for predicting T2D progression.


This study uses 10,911 records generated in 36 clinics between 15 November 2008 and 15 November 2016. Synthetic minority oversampling and random undersampling were used to create a balanced dataset. The performance of Neural Networks, Support Vector Machines, Decision Trees and Logistic Regression in identifying patients developing T2D in the short, medium and long term was compared. The performance measures were Area Under Curve, Sensitivity, Specificity, Matthews correlation coefficient and Mean Calibration Error. Through importance analysis and information fusion techniques, the predictors of developing T2D were identified for short-, medium- and long-term risk analysis.
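As an illustration of four of the measures named above, the following is a minimal sketch in plain Python, assuming binary labels where 1 marks a patient who develops T2D. The function names are hypothetical and this is not the authors' implementation; Mean Calibration Error is left out because the abstract does not define it.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Share of actual positives that were flagged (true positive rate)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Share of actual negatives that were cleared (true negative rate)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; tied scores count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Sensitivity and specificity answer the two distinct purposes of prediction (finding at-risk patients versus clearing patients who will not develop T2D), while the Matthews correlation coefficient summarises all four confusion-matrix cells in a single value.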


The findings show that the performance of analytics techniques depends on both the prediction period and the purpose of prediction, that is, whether the aim is to identify people who will not develop T2D or to identify at-risk patients. Oversampling, as opposed to undersampling, improved performance. Sixteen predictors, and their importance in identifying patients at risk of T2D in the short, medium and long term, were identified.
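To make the two balancing strategies concrete, the following is a minimal sketch in plain Python, assuming each record is a list of numeric features. The function names are hypothetical, the oversampler only follows the general idea of synthetic minority oversampling (interpolating between a minority record and one of its nearest minority neighbours), and neither function is the authors' implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_undersample(majority, minority, rng):
    """Balance classes by keeping a random subset of majority rows
    equal in size to the minority class."""
    return rng.sample(majority, len(minority)), list(minority)


def synthetic_oversample(minority, n_new, k, rng):
    """Create n_new synthetic minority rows, each placed at a random
    point on the line between a minority row and one of its k nearest
    minority neighbours. Needs at least two minority rows."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(
            others, key=lambda row: squared_distance(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            [b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Undersampling discards majority-class records, whereas synthetic oversampling keeps every original record and adds new minority-class ones, which is one plausible reason the two approaches can yield different model performance.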


This study provides guidelines for an automated system that prompts patients for screening. Several predictors can be reported by patients; others can be examined by physicians or ordered as further laboratory tests, which offers a potential reduction of the burden placed on clinical settings.


Classification Algorithms; Diabetes; Machine Learning

[Indexed for MEDLINE]
