    Database (Oxford). 2009;2009:bap019. Epub 2009 Nov 21.

    Integrating text mining into the MGI biocuration workflow.

    Source

    The Jackson Laboratory, Mouse Genome Informatics, 600 Main Street, Bar Harbor, ME 04609-1500, USA and University of Maine, Graduate School of Biomedical Sciences, Barrows Hall, Orono, ME 04469, USA.

    Abstract

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals.

    In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database.

    Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature.
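
    The gene indexing step described in the abstract (find the genes mentioned in an article, then attach their symbols to the article's reference number) can be illustrated with a minimal dictionary-based sketch. This is not MGI's pipeline, nor how ProMiner or the Open Biomedical Annotator work internally; the dictionary entries, the reference identifiers, and the function names tag_genes and index_articles are all hypothetical, and case-insensitive whole-token matching is a simplifying assumption that a real tagger would likely refine.

```python
import re

# Hypothetical toy dictionary mapping surface forms (symbols, synonyms)
# to an official gene symbol. A real dictionary would be far larger.
GENE_DICTIONARY = {
    "Pax6": "Pax6",
    "paired box 6": "Pax6",
    "Trp53": "Trp53",
    "p53": "Trp53",
}


def tag_genes(text, dictionary=GENE_DICTIONARY):
    """Return the set of gene symbols whose dictionary entries occur in text."""
    found = set()
    for surface, symbol in dictionary.items():
        # Require a non-alphanumeric boundary on both sides so that
        # "Trp53" does not match inside a longer token such as "Trp530".
        pattern = r"(?<![A-Za-z0-9])" + re.escape(surface) + r"(?![A-Za-z0-9])"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(symbol)
    return found


def index_articles(articles, dictionary=GENE_DICTIONARY):
    """Associate each article reference id with the sorted symbols found in it."""
    return {
        ref_id: sorted(tag_genes(text, dictionary))
        for ref_id, text in articles.items()
    }


if __name__ == "__main__":
    # Hypothetical reference ids and article snippets.
    articles = {
        "REF:0001": "We studied Pax6 and p53 in mouse embryos.",
        "REF:0002": "This snippet mentions only Trp530, which is not a gene here.",
    }
    for ref_id, symbols in index_articles(articles).items():
        print(ref_id, symbols)
```

    In a semi-automated workflow of the kind the abstract envisions, output like this would be a suggestion list for a curator to confirm or reject rather than a final annotation.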

    PMID: 20157492 [PubMed]
    PMCID: PMC2797454 (Free PMC Article)

