J Clin Monit. 1995 May;11(3):189-206.

Classification-algorithm evaluation: five performance measures based on confusion matrices.

Author information

  • Medical Department, Hewlett-Packard Laboratories, Palo Alto, CA 94303-0867, USA.



The objective of this paper is to introduce, explain, and extend methods for comparing the performance of classification algorithms using error tallies obtained on properly sized, populated, and labeled data sets.


Two distinct contexts of classification are defined, involving "objects-by-inspection" and "objects-by-segmentation." In the former context, the total number of objects to be classified is unambiguously and self-evidently defined. In the latter, that total is troublesomely ambiguous. All five of the performance measures considered here are based on confusion matrices: tables of counts that reveal the extent of an algorithm's "confusion" regarding the true classifications. A proper measure of classification-algorithm performance must meet four requirements and should obey six additional constraints.
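As an illustration of what a confusion matrix tallies, the following is a minimal sketch in Python. It assumes the objects-by-inspection context, where every object has exactly one true label and one assigned label. The function name `confusion_matrix`, the class names, and the label sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Return counts[i][j]: number of objects whose true class is
    classes[i] and whose assigned class is classes[j]."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have equal length")
    # Count each (true, predicted) pair once per object.
    pair_counts = Counter(zip(true_labels, predicted_labels))
    return [[pair_counts[(t, p)] for p in classes] for t in classes]


# Hypothetical labels for illustration only (not data from the paper).
truth = ["beat", "beat", "beat", "noise", "noise", "beat"]
predicted = ["beat", "noise", "beat", "noise", "beat", "beat"]
matrix = confusion_matrix(truth, predicted, classes=["beat", "noise"])
```

Rows index the true class and columns the assigned class, so the diagonal holds the correct classifications and every off-diagonal cell counts one kind of error.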


Four traditional measures of performance are critiqued in terms of the requirements and constraints. Each measure meets the requirements, but fails to obey at least one of the constraints. A nontraditional measure of algorithm performance, the normalized mutual information (NMI), is therefore introduced. Based on the NMI, methods for comparing algorithm performance using confusion matrices are devised.
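The abstract does not give the formula for the NMI, so the sketch below is an assumption rather than the paper's definition. It uses one common normalization: the mutual information between true and assigned classes, divided by the entropy of the true classes. The function name and the row/column convention (rows are true classes, columns are assigned classes) are likewise assumed.

```python
import math


def normalized_mutual_information(matrix):
    """Assumed form: I(true; assigned) / H(true), computed from a
    confusion matrix of counts (rows: true class, columns: assigned)."""
    n = sum(sum(row) for row in matrix)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]

    # Entropy of the true-class distribution (natural log).
    h_true = -sum((r / n) * math.log(r / n) for r in row_totals if r > 0)
    if h_true == 0:
        raise ValueError("only one true class present; ratio undefined")

    # Mutual information; empty cells contribute nothing.
    mi = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if count > 0:
                mi += (count / n) * math.log(
                    count * n / (row_totals[i] * col_totals[j])
                )
    return mi / h_true
```

Under this assumed definition, a matrix with all counts on the diagonal scores 1, and a matrix whose assigned classes are statistically independent of the true classes scores 0. Because the base of the logarithm cancels in the ratio, the choice of natural log does not affect the result.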


The five performance measures lead to similar inferences when comparing a trio of QRS-detection algorithms using a large data set. The modified NMI is preferred, however, because it obeys each of the constraints and is the most conservative measure of performance.

[PubMed - indexed for MEDLINE]