Clinic expert information extraction based on domain model and block importance model

Comput Biol Med. 2015 Nov 1:66:337-42. doi: 10.1016/j.compbiomed.2015.07.009. Epub 2015 Jul 18.

Abstract

Extracting clinic expert information from the Deep Web poses two challenges. The first is deciding which domain a query form belongs to. We propose a novel method based on a domain model, a tree structure constructed from the attributes of query interfaces. With this model, query interfaces can be classified into a domain and filled in with domain keywords. The second challenge is extracting information from the response Web pages returned by query interfaces. To filter out noisy information on a Web page, we propose a block importance model that takes both content and spatial features into account. The experimental results indicate that the domain model yields a precision 4.89% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
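The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tree contents, the matching rule, or the feature set, so the names DomainNode, domain_match_score, Block, and block_features, the 0.5 threshold, and the five features are all assumptions chosen for illustration. The first part matches the attribute labels of a query form against a domain tree; the second turns a page block into a vector of content and spatial features that could be passed to a classifier such as an SVM.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DomainNode:
    """One node of a hypothetical domain tree: a concept and its synonyms."""
    name: str
    synonyms: set[str] = field(default_factory=set)
    children: list[DomainNode] = field(default_factory=list)

    def labels(self) -> set[str]:
        """Collect every lowercase label found in this subtree."""
        found = {self.name.lower()} | {s.lower() for s in self.synonyms}
        for child in self.children:
            found |= child.labels()
        return found


def domain_match_score(form_attributes: list[str], root: DomainNode) -> float:
    """Fraction of form attribute labels that occur in the domain tree."""
    if not form_attributes:
        return 0.0
    known = root.labels()
    hits = sum(1 for a in form_attributes if a.strip().lower() in known)
    return hits / len(form_attributes)


def belongs_to_domain(form_attributes: list[str], root: DomainNode,
                      threshold: float = 0.5) -> bool:
    """Assumed decision rule: accept the form if enough labels match."""
    return domain_match_score(form_attributes, root) >= threshold


@dataclass
class Block:
    """A rectangular page block with its text and layout (assumed fields)."""
    text: str
    link_text_chars: int  # characters of the text that sit inside links
    x: float
    y: float
    width: float
    height: float


def block_features(block: Block, page_width: float,
                   page_height: float) -> list[float]:
    """Return assumed content features followed by spatial features."""
    text_len = len(block.text)
    link_density = block.link_text_chars / text_len if text_len else 0.0
    area_ratio = (block.width * block.height) / (page_width * page_height)
    center_x = (block.x + block.width / 2) / page_width
    center_y = (block.y + block.height / 2) / page_height
    return [float(text_len), link_density, area_ratio, center_x, center_y]


# Hypothetical domain tree for clinic expert query forms.
clinic_tree = DomainNode("clinic expert", children=[
    DomainNode("department", {"dept", "specialty"}),
    DomainNode("doctor", {"physician", "expert name"}),
    DomainNode("hospital"),
])
```

For example, belongs_to_domain(["Department", "Physician", "Zip code"], clinic_tree) accepts the form because two of its three labels occur in the tree. Exact string matching is the simplest possible choice here; a real system would likely need fuzzier label matching.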

Keywords: Block importance model; Clinic expert information; Domain model; Information extraction; SVM.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation*
  • Data Mining / methods
  • Hospitals
  • Information Storage and Retrieval / methods*
  • Internet
  • Medical Informatics / methods
  • Reproducibility of Results
  • Support Vector Machine