    BMC Bioinformatics. 2009 Oct 8;10:326.

    Text mining and manual curation of chemical-gene-disease networks for the comparative toxicogenomics database (CTD).

    Wiegers TC, Davis AP, Cohen KB, Hirschman L, Mattingly CJ.

    Department of Bioinformatics, The Mount Desert Island Biological Laboratory, Salisbury Cove, ME, USA. twiegers@mdibl.org

    BACKGROUND: The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding about the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, published literature. The goals of the research reported here were to establish a baseline analysis of current CTD curation, develop a text-mining prototype from readily available open source components, and evaluate its potential value in augmenting curation efficiency and increasing data coverage.

    RESULTS: Prototype text-mining applications were developed and evaluated using a CTD data set consisting of manually curated molecular interactions and relationships from 1,600 documents. Preliminary results indicated that the prototype found 80% of the gene, chemical, and disease terms appearing in curated interactions. These terms were used to re-rank documents for curation, resulting in increases in mean average precision (63% for the baseline vs. 73% for a rule-based re-ranking), and in the correlation coefficient of rank vs. number of curatable interactions per document (baseline 0.14 vs. 0.38 for the rule-based re-ranking).

    CONCLUSION: This text-mining project is unique in its integration of existing tools into a single workflow with direct application to CTD. We performed a baseline assessment of the inter-curator consistency and coverage in CTD, which allowed us to measure the potential of these integrated tools to improve prioritization of journal articles for manual curation. Our study presents a feasible and cost-effective approach for developing a text mining solution to enhance manual curation throughput and efficiency.
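For readers unfamiliar with the ranking metric cited in the RESULTS, the following is a minimal sketch of how mean average precision is commonly computed over ranked document lists. The abstract does not state the exact formulation the authors used, so the normalization (dividing by the number of relevant documents found in the list) is an assumption, and the function names and the toy relevance flags are hypothetical illustrations, not CTD data.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list.

    relevance[i] is True when the document at rank i + 1 is relevant
    (here: judged curatable). Assumption: normalize by the number of
    relevant documents that appear in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank = relevant documents so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-list average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy rankings: the same four documents in two orders.
baseline_order = [False, True, False, True]
reranked_order = [True, True, False, False]

print(average_precision(baseline_order))
print(average_precision(reranked_order))
```

Moving relevant documents toward the top of the list raises the score, which is the effect a re-ranking step aims for when prioritizing articles for manual curation.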

    PMID: 19814812 [PubMed - in process]

    PMCID: PMC2768719
