AMIA Jt Summits Transl Sci Proc. 2010 Mar 1;2010:26-30.

Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS.

Author information

  • Department of Biomedical Informatics, Columbia University.

Abstract

We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract and normalize UMLS-recognizable terms. When annotated with Semantic Network types, the corpus had a lexical ambiguity of 1.57 (= total types for unique lexemes / total unique lexemes) and a word occurrence ambiguity of 1.96 (= total type occurrences / total word occurrences). A set of semantic preference rules was developed and applied to completely eliminate ambiguity in semantic type assignment. The lexicon covered 95.95% of the UMLS-recognizable terms in our corpus. A total of 20 UMLS semantic types, representing about 17% of all the distinct semantic types assigned to corpus lexemes, covered about 80% of the vocabulary of our corpus.
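
The two ambiguity ratios defined in the abstract can be illustrated with a minimal Python sketch. The toy lexicon and token list below are hypothetical and are not taken from the paper's corpus. The function names are also illustrative, and the reading of "total type occurrences" as "each word occurrence contributes the number of semantic types of its lexeme" is an assumption based only on the abstract's wording.

```python
from collections import Counter


def lexical_ambiguity(lexeme_types):
    """Total semantic types over unique lexemes / total unique lexemes."""
    total_types = sum(len(types) for types in lexeme_types.values())
    return total_types / len(lexeme_types)


def occurrence_ambiguity(tokens, lexeme_types):
    """Total type occurrences / total word occurrences.

    Assumption: every occurrence of a lexeme counts once per semantic
    type assigned to that lexeme.
    """
    counts = Counter(tokens)
    type_occurrences = sum(n * len(lexeme_types[lex]) for lex, n in counts.items())
    return type_occurrences / sum(counts.values())


# Hypothetical toy lexicon: lexeme -> set of semantic types (illustrative only).
lexeme_types = {
    "pregnancy": {"Organism Function"},
    "cold": {"Disease or Syndrome", "Natural Phenomenon or Process"},
    "aspirin": {"Pharmacologic Substance", "Organic Chemical"},
    "age": {"Organism Attribute"},
}

# Hypothetical token stream drawn from that lexicon.
tokens = ["cold", "cold", "aspirin", "age", "pregnancy", "cold"]

print(lexical_ambiguity(lexeme_types))             # 6 types / 4 lexemes = 1.5
print(occurrence_ambiguity(tokens, lexeme_types))  # 10 type occurrences / 6 tokens
```

In this toy case the occurrence-level ratio exceeds the lexeme-level ratio because the ambiguous lexeme "cold" is also the most frequent one, which mirrors the direction of the reported figures (1.96 versus 1.57).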

PMID: 21347142 [PubMed]
PMCID: PMC3041551 (free PMC article)
