Text categorization models for identifying unproven cancer treatments on the web

Stud Health Technol Inform. 2007;129(Pt 2):968-72.

Abstract

The nature of the internet as a non-peer-reviewed (and largely unregulated) publication medium has allowed widespread promotion of inaccurate and unproven medical claims on an unprecedented scale. Patients with conditions that are not currently fully treatable are particularly susceptible to unproven and dangerous promises of miracle treatments. In extreme cases, fatal adverse outcomes have been documented. Most commonly, the costs are financial and psychological, together with delayed application of imperfect but proven scientific modalities. To help protect patients, who may be desperately ill and thus prone to exploitation, we explored the use of machine learning techniques to identify web pages that make unproven claims. This feasibility study shows that the resulting models can identify such web pages fully automatically, and substantially better than previous web tools and state-of-the-art search engine technology.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Artificial Intelligence*
  • Feasibility Studies
  • Humans
  • Information Services / standards
  • Information Storage and Retrieval
  • Internet*
  • Neoplasms / therapy*
  • Quackery*
  • ROC Curve