J Am Med Inform Assoc. 2012 Sep-Oct;19(5):817-23. doi: 10.1136/amiajnl-2011-000752. Epub 2012 Apr 26.

Pneumonia identification using statistical feature selection.

Author information

Department of Biomedical and Health Informatics, School of Medicine, University of Washington, Seattle, Washington 98195-7240, USA. bejan@u.washington.edu

Abstract

OBJECTIVE:

This paper describes a natural language processing system for the task of pneumonia identification. Based on the information extracted from the narrative reports associated with a patient, the task is to identify whether or not the patient is positive for pneumonia.

DESIGN:

A binary classifier was employed to identify pneumonia from a dataset of multiple types of clinical notes created for 426 patients during their stay in the intensive care unit. For this purpose, three types of features were considered: (1) word n-grams, (2) Unified Medical Language System (UMLS) concepts, and (3) assertion values associated with pneumonia expressions. System performance was greatly increased by a feature selection approach which uses statistical significance testing to rank features based on their association with the two categories of pneumonia identification.
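The ranking step described above can be illustrated with a minimal sketch. The abstract does not name the significance test or the selection threshold, so the chi-square statistic used here is an assumption, and the function names `chi_square` and `rank_features` and the toy feature sets are hypothetical. Each patient is represented as a set of binary features (for example, word n-grams or UMLS concepts), and features are ranked by how strongly their presence is associated with the positive or negative pneumonia category.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 table (assumed test, not from the paper).

    n11: feature present, pneumonia positive
    n10: feature present, pneumonia negative
    n01: feature absent,  pneumonia positive
    n00: feature absent,  pneumonia negative
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A row or column total is zero, so the feature carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_features(docs, labels):
    """Rank features by association with the two categories.

    docs:   list of feature sets, one per patient (hypothetical input format)
    labels: list of booleans, True for pneumonia positive
    Returns feature names sorted by descending score, ties broken by name.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for feats, is_pos in zip(docs, labels):
        (pos_counts if is_pos else neg_counts).update(set(feats))

    scores = {}
    for feat in set(pos_counts) | set(neg_counts):
        n11 = pos_counts[feat]
        n10 = neg_counts[feat]
        scores[feat] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    return sorted(scores, key=lambda f: (-scores[f], f))


# Toy data, invented purely for illustration.
docs = [
    {"pneumonia", "fever"},
    {"pneumonia", "cough"},
    {"fever", "clear"},
    {"clear", "cough"},
]
labels = [True, True, False, False]
top_features = rank_features(docs, labels)[:2]
```

In this toy example, "pneumonia" and "clear" separate the two categories perfectly and rank first, while "fever" and "cough" appear equally often in both categories and score zero. A classifier would then be trained only on the top-ranked features rather than on the full feature space.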

RESULTS:

Besides testing our system on the entire cohort of 426 patients (unrestricted dataset), we also used a smaller subset of 236 patients (restricted dataset). The performance of the system was compared with the results of a baseline previously proposed for these two datasets. The best results achieved by the system (85.71 and 81.67 F1-measure) are significantly better than the baseline results (50.70 and 49.10 F1-measure) on the restricted and unrestricted datasets, respectively.

CONCLUSION:

Using a statistical feature selection approach that allows the feature extractor to consider only the most informative features from the feature space significantly improves the performance over a baseline that uses all the features from the same feature space. Extracting the assertion value for pneumonia expressions further improves the system performance.

PMID: 22539080
PMCID: PMC3422830
DOI: 10.1136/amiajnl-2011-000752
[Indexed for MEDLINE]
Free PMC Article
