A method for extracting task-oriented information from biological text sources

Int J Data Min Bioinform. 2015;12(4):387-99. doi: 10.1504/ijdmb.2015.070072.

Abstract

This paper introduces an information extraction method that processes unstructured data from a document collection. A dynamic programming technique, adopted to find relevant genes from the longest and most accurate sequences, is used to find matching sequences and to identify the effects of various factors. The proposed method can handle complex information sequences whose meaning differs from one situation to another, and it eliminates irrelevant information. The text contents were pre-processed with a general-purpose method and then passed through an entity tagging component. Bottom-up scanning of key-value pairs improves content finding and generates sequences relevant to the task under test. The paper highlights a context-based method for extracting food safety information from articles, guideline documents and laboratory results. A graphical disease model verifies weak components using a development data set. This improves the accuracy of information retrieval in biological text analysis and reporting applications.
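
The abstract does not say which dynamic programming formulation is used. As an illustration only, the idea of finding the longest matching sequence can be sketched with the standard longest common subsequence (LCS) recurrence applied to tokenised text. The function name and the example sentences are hypothetical and are not taken from the paper.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of two token lists.

    Generic textbook LCS, used here as an assumed stand-in for the
    paper's unspecified dynamic programming technique.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right cell to recover the tokens
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Hypothetical example sentences, tokenised on whitespace
report = "salmonella detected in raw poultry samples".split()
guideline = "salmonella was detected in poultry".split()
matched = longest_common_subsequence(report, guideline)
```

Here `matched` contains the tokens shared by both sentences in their original order. A real pipeline would presumably run such matching on pre-processed, entity-tagged text rather than on raw whitespace tokens.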

MeSH terms

  • Data Mining / methods*
  • Databases, Factual*
  • Food Safety*
  • Periodicals as Topic