Health Care Manag Sci. 2013 Jun;16(2):119-28. doi: 10.1007/s10729-012-9216-9. Epub 2012 Nov 7.

Classifying highly imbalanced ICU data.

Author information

  • University of Pittsburgh, Pittsburgh, PA 15260, USA. yfr1@pitt.edu

Abstract

Highly imbalanced data sets are those in which the class of interest is rare. In this paper, we compare how well several common data mining methods predict the discharge status (alive or deceased, with "deceased" as the class of interest) of patients from an Intensive Care Unit (ICU). The methods are logistic regression, discriminant analysis, Classification and Regression Tree (CART) models, C5, and Support Vector Machines (SVM). We evaluated each method over a range of misclassification cost ratio (MCR) values, using specificity, recall, precision, the F-measure, and confusion entropy (CEN) as criteria. By these criteria, C5 and SVM performed better than the other methods. At an MCR of 100, C5 had the highest recall, and SVM had the highest specificity and the lowest CEN. We also compared the five methods using Hand's measure, according to which logistic regression performed best. This article makes several contributions. We show how using MCR to analyze imbalanced medical data significantly improves the methods' classification performance. We also found that the F-measure and precision did not improve as the MCR was increased.
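The evaluation criteria named in the abstract can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code. It assumes "deceased" is the positive class and uses the standard definitions of recall, specificity, precision, and the F-measure (taken here as F1). Confusion entropy follows the commonly cited definition, which should be checked against its original source before use. The abstract does not say how MCR enters each method, so MCR is modeled here as a simple, swapped-in decision threshold on a predicted probability of death, where the cost of a missed death is assumed to be MCR times the cost of a false alarm. The function names `binary_metrics`, `confusion_entropy`, and `cost_sensitive_label` are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard rates, treating 'deceased' as the positive class."""
    recall = tp / (tp + fn) if tp + fn else 0.0        # deaths correctly flagged
    specificity = tn / (tn + fp) if tn + fp else 0.0   # survivors correctly not flagged
    precision = tp / (tp + fp) if tp + fp else 0.0     # flagged patients who actually died
    # F-measure (F1): harmonic mean of precision and recall
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f_measure": f_measure}


def confusion_entropy(c):
    """Confusion entropy (CEN) of an N x N confusion matrix.

    c[i][j] counts samples of true class i predicted as class j.
    Assumed definition: for each class j, the off-diagonal counts in row j
    and column j are normalised by the sum of row j plus column j, their
    entropy is taken in base 2(N - 1), and the per-class entropies are
    weighted by each class's share of that row-plus-column mass.
    """
    n = len(c)
    if n < 2:
        raise ValueError("CEN needs at least two classes")
    total = sum(map(sum, c))
    base = 2 * (n - 1)
    cen = 0.0
    for j in range(n):
        # row j plus column j (the diagonal cell is counted twice)
        d = sum(c[j][k] + c[k][j] for k in range(n))
        if d == 0:
            continue
        cen_j = 0.0
        for k in range(n):
            if k == j:
                continue
            for count in (c[j][k], c[k][j]):
                if count:  # 0 * log(0) is taken as 0
                    p = count / d
                    cen_j -= p * math.log(p, base)
        cen += d / (2 * total) * cen_j
    return cen


def cost_sensitive_label(p_deceased, mcr):
    """Hypothetical MCR rule: flag a death when its expected cost is larger.

    With cost(missed death) = mcr * cost(false alarm), predict 'deceased'
    when p * mcr > (1 - p), i.e. when p > 1 / (1 + mcr).
    """
    return "deceased" if p_deceased * mcr > (1 - p_deceased) else "alive"
```

For example, `cost_sensitive_label(0.05, 100)` returns `"deceased"` while `cost_sensitive_label(0.05, 1)` returns `"alive"`. Raising the MCR lowers the threshold and flags more patients. That tends to raise recall but can also add false positives, which is one plausible reading of why precision and the F-measure need not improve as MCR grows.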

PMID: 23132123 [PubMed - indexed for MEDLINE]