Detecting clinically related content in online patient posts

J Biomed Inform. 2017 Nov:75:96-106. doi: 10.1016/j.jbi.2017.09.015. Epub 2017 Oct 3.

Abstract

Patients with chronic health conditions use online health communities to seek support and information to help manage their condition. For clinically related topics, patients can benefit from getting opinions from clinical experts, and many are concerned about misinformation and biased information being spread online. However, the large volume of community posts makes it challenging for moderators and clinical experts, where they are available, to provide the necessary information. Automatically identifying forum posts that need validated clinical resources can help online health communities manage content exchange efficiently. This automation can also help patients who need clinical expertise get proper help. We present our results from testing text classification models that efficiently and accurately identify community posts containing clinical topics. We annotated 1817 posts comprising 4966 sentences from an existing online diabetes community. Our classifier performed best (F-measure: 0.83, Precision: 0.79, Recall: 0.86) when using the Naïve Bayes algorithm with unigrams, bigrams, trigrams, and MetaMap Semantic Types as features. Training took 5 s, and classification took a fraction of a second. When we applied the classifier to another online diabetes community, the results were F-measure: 0.63, Precision: 0.57, Recall: 0.71. Our results suggest that the model can feasibly scale to other forums for identifying posts containing clinical topics, provided common errors are properly addressed.
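As a rough illustration of the kind of classifier the abstract describes, the following is a minimal sketch of a multinomial Naïve Bayes model over unigram, bigram, and trigram features, together with the precision, recall, and F-measure calculation. It is not the authors' implementation: the toy posts, the label names, the whitespace tokenizer, the add-one smoothing, and all function and class names are assumptions made for illustration, and the MetaMap Semantic Type features are omitted because they require the external MetaMap tool.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, max_n=3):
    """Return unigram, bigram, and trigram features from a whitespace-tokenized post."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (an assumed choice)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        feats = [f for f in ngrams(text) if f in self.vocab]  # ignore unseen n-grams
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in feats:
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall, and F1 for one positive class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Invented toy posts, NOT data from the study.
train_texts = [
    "my insulin dose seems too high after meals",
    "should I change my metformin dose",
    "what blood sugar level is dangerous",
    "thanks everyone for the kind support",
    "happy birthday and welcome to the forum",
    "good morning everyone hope you have a great day",
]
train_labels = ["clinical"] * 3 + ["other"] * 3

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("is my insulin dose too high"))
print(model.predict("welcome everyone and thanks for the support"))
```

With the rounded cross-forum values reported above, `f_measure(0.57, 0.71)` evaluates to roughly 0.63, which agrees with the reported F-measure for the second community.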

Keywords: Classification; Clinical topic; Diabetes; Health information seeking; Human-computer interaction; Online health communities; Patient; Text mining.

MeSH terms

  • Algorithms
  • Chronic Disease*
  • Disease Management
  • Humans
  • Online Systems*
  • Patients*