Display Settings:

Format

Send to:

Choose Destination
See comment in PubMed Commons below
BMC Bioinformatics. 2007 Jan 25;8:24.

Benchmarking natural-language parsers for biological applications using dependency graphs.

Author information

  • 1School of Crystallography, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK. a.clegg@mail.cryst.bbk.ac.uk

Abstract

BACKGROUND:

Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria.

RESULTS:

Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations.

CONCLUSION:

Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques.

PMID:
17254351
[PubMed - indexed for MEDLINE]
PMCID:
PMC1797812
Free PMC Article

Images from this publication.See all images (8)Free text

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Icon for BioMed Central Icon for PubMed Central
    Loading ...
    Write to the Help Desk