J Biomed Inform. 2011 Dec;44(6):927-35. doi: 10.1016/j.jbi.2011.06.001. Epub 2011 Jun 12.

Dynamic categorization of clinical research eligibility criteria by hierarchical clustering.

Author information

  • Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States.

Abstract

OBJECTIVE:

To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity.

DESIGN:

The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers.
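
As a rough illustration of the clustering step described above, the following minimal sketch represents each criterion as a set of UMLS semantic type labels and merges criteria bottom-up. The distance measure (Jaccard), the linkage (average), the stopping threshold, and the toy feature sets are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of semantic-type labels."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, features):
    """Mean pairwise Jaccard distance between the members of two clusters."""
    total = sum(jaccard_distance(features[i], features[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_cluster(features, threshold):
    """Repeatedly merge the two closest clusters until the closest pair
    is farther apart than `threshold` (an assumed stopping rule)."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        (a, b), dist = min(
            (
                ((a, b), average_linkage(clusters[a], clusters[b], features))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters


# Hypothetical feature sets: one set of UMLS semantic types per criterion.
features = [
    {"Disease or Syndrome"},
    {"Disease or Syndrome", "Temporal Concept"},
    {"Pharmacologic Substance"},
    {"Pharmacologic Substance", "Temporal Concept"},
]

clusters = agglomerative_cluster(features, threshold=0.6)
print(sorted(sorted(c) for c in clusters))
```

With these toy inputs, the two disease-related criteria end up in one cluster and the two medication-related criteria in another. In practice each induced cluster would still need manual review and labeling, which matches the "semi-automatic" framing of the objective.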

MEASUREMENTS:

We induced 27 categories and measured their prevalence in 27,278 eligibility criteria from 1578 clinical trials. We then compared classification performance (precision, recall, and F1-score) between the UMLS-based feature representation and the "bag of words" feature representation across five common classifiers in Weka: J48, Bayesian Network, Naïve Bayesian, Nearest Neighbor, and an instance-based learning classifier.
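
For readers unfamiliar with the three metrics, the sketch below computes per-category precision, recall, and F1-score from gold and predicted labels using their standard definitions. The category names and label lists are hypothetical and are not data from the study.

```python
def precision_recall_f1(gold, predicted, category):
    """Compute precision, recall, and F1 for one category.

    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)
    fp = sum(1 for g, p in pairs if g != category and p == category)
    fn = sum(1 for g, p in pairs if g == category and p != category)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for four criteria (not taken from the paper).
gold = ["age", "age", "disease", "disease"]
predicted = ["age", "disease", "disease", "disease"]

p, r, f1 = precision_recall_f1(gold, predicted, "disease")
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Comparing two feature representations would then amount to running the same classifier on each and comparing these per-category scores.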

RESULTS:

The UMLS semantic feature representation outperforms the "bag of words" feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score and the Bayesian Network classifier achieved the best learning efficiency.

CONCLUSION:

The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text.

Copyright © 2011 Elsevier Inc. All rights reserved.

PMID: 21689783 [PubMed - indexed for MEDLINE]
PMCID: PMC3183114 (free PMC article)
