Display Settings:

Format

Send to:

Choose Destination
See comment in PubMed Commons below
J Am Med Inform Assoc. 2012 Sep-Oct;19(5):867-74. doi: 10.1136/amiajnl-2011-000766. Epub 2012 Jun 16.

Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules.

Author information

  • 1Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota 55905, USA. siddhartha@mayo.edu

Abstract

OBJECTIVE:

This paper describes the coreference resolution system submitted by Mayo Clinic for the 2011 i2b2/VA/Cincinnati shared task Track 1C. The goal of the task was to construct a system that links the markables corresponding to the same entity.

MATERIALS AND METHODS:

The task organizers provided progress notes and discharge summaries that were annotated with the markables of treatment, problem, test, person, and pronoun. We used a multi-pass sieve algorithm that applies deterministic rules in the order of preciseness and simultaneously gathers information about the entities in the documents. Our system, MedCoref, also uses a state-of-the-art machine learning framework as an alternative to the final, rule-based pronoun resolution sieve.

RESULTS:

The best system that uses a multi-pass sieve has an overall score of 0.836 (average of B(3), MUC, Blanc, and CEAF F score) for the training set and 0.843 for the test set.

DISCUSSION:

A supervised machine learning system that typically uses a single function to find coreferents cannot accommodate irregularities encountered in data especially given the insufficient number of examples. On the other hand, a completely deterministic system could lead to a decrease in recall (sensitivity) when the rules are not exhaustive. The sieve-based framework allows one to combine reliable machine learning components with rules designed by experts.

CONCLUSION:

Using relatively simple rules, part-of-speech information, and semantic type properties, an effective coreference resolution system could be designed. The source code of the system described is available at https://sourceforge.net/projects/ohnlp/files/MedCoref.

PMID:
22707745
[PubMed - indexed for MEDLINE]
PMCID:
PMC3422831
Free PMC Article

Images from this publication.See all images (2)Free text

Figure 1
Figure 2
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Icon for HighWire Icon for PubMed Central
    Loading ...
    Write to the Help Desk