Stat Med. 2019 Jun 15;38(13):2477-2503. doi: 10.1002/sim.8103. Epub 2019 Jan 30.

Evaluating classification accuracy for modern learning approaches.

Li J(1,2,3), Gao M(4,5), D'Agostino R(6).

Author information

1. Department of Statistics and Applied Probability, National University of Singapore, Singapore.
2. Duke University-NUS Graduate Medical School, Singapore.
3. Singapore Eye Research Institute, Singapore.
4. Department of Mathematics, Shanghai Jiao Tong University, Shanghai, China.
5. Department of Statistics, University of Michigan, Ann Arbor, Michigan.
6. Department of Mathematics and Statistics, Boston University, Boston, Massachusetts.


Deep learning neural network models such as the multilayer perceptron (MLP) and the convolutional neural network (CNN) are novel and attractive artificial intelligence computing tools. However, methods for evaluating the performance of these models are not yet readily available to practitioners. We provide a tutorial for evaluating classification accuracy for various state-of-the-art learning approaches, including familiar shallow and deep learning methods. For qualitative response variables with more than two categories, many traditional accuracy measures such as sensitivity, specificity, and area under the receiver operating characteristic curve are not applicable, and their extensions must be considered carefully. In this paper, a few important statistical concepts for multicategory classification accuracy are reviewed and their utilities for various learning algorithms are demonstrated with real medical examples. We offer problem-based R code to illustrate how to perform these statistical computations step by step. We expect that such analysis tools will become more familiar to practitioners and receive broader applications in biostatistics.
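To make concrete why two-class measures need extending for more than two categories, the following is a minimal sketch and not the authors' code (the paper's examples are in R; Python is used here only for illustration). It assumes one simple and common extension, one-vs-rest sensitivity and specificity computed from a multicategory confusion matrix, plus the overall proportion of correct classifications; the paper may review different or additional measures. The function names and the three-class toy labels are hypothetical and carry no clinical meaning.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_rates(cm):
    """One-vs-rest (sensitivity, specificity) for each class."""
    total = sum(sum(row) for row in cm)
    rates = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in cm) - tp   # predicted k, true class differs
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        rates.append((sens, spec))
    return rates


def overall_accuracy(cm):
    """Proportion of observations on the diagonal of the matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


# Made-up toy data for illustration only.
labels = ["A", "B", "C"]
y_true = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "C", "C", "C", "C", "A"]

cm = confusion_matrix(y_true, y_pred, labels)
rates = per_class_rates(cm)
for label, (sens, spec) in zip(labels, rates):
    print(f"class {label}: sensitivity={sens:.3f}, specificity={spec:.3f}")
print(f"overall accuracy={overall_accuracy(cm):.3f}")
```

With three classes there are three sensitivity and specificity pairs rather than one, which is one reason single-number multicategory summaries are of interest.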


KEYWORDS: R package; convolutional neural net; deep learning; multilayer perceptron; mxnet; neural network

