J Biomed Inform. 2017 May;69:75-85. doi: 10.1016/j.jbi.2017.03.016. Epub 2017 Mar 27.

Enriching consumer health vocabulary through mining a social Q&A site: A similarity-based approach.

Author information

School of Information, Florida State University, Tallahassee, FL 32306, USA; Institute for Successful Longevity, Florida State University, Tallahassee, FL 32306, USA.
Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA.
Department of Library and Information Science, Chungnam National University, South Korea.
School of Communication, Florida State University, Tallahassee, FL 32306, USA.
Department of Health Outcomes and Policy, University of Florida, Gainesville, FL 32608, USA.


The widely known vocabulary gap between health consumers and healthcare professionals hinders consumers' information seeking and health dialogue in end-user health applications. The Open Access and Collaborative Consumer Health Vocabulary (OAC CHV), which contains health-related terms used by lay consumers, was created to bridge this gap. Specifically, the OAC CHV facilitates consumers' health information retrieval by enabling consumer-facing health applications to translate between professional language and consumer-friendly language. To keep up with constantly evolving medical knowledge and language use, new terms need to be identified and added to the OAC CHV. User-generated content on social media, including social question and answer (social Q&A) sites, affords an enormous opportunity for mining consumer health terms. Existing methods of identifying new consumer terms from text typically rely on ad hoc lexico-syntactic patterns and human review. Our study extends an existing method by extracting n-grams from a social Q&A textual corpus and representing them with a rich set of contextual and syntactic features. Using K-means clustering, our method, simiTerm, was able to identify terms that are both contextually and syntactically similar to the existing OAC CHV terms. We tested our method on social Q&A corpora in two disease domains: diabetes and cancer. Our method outperformed three baseline ranking methods. A post-hoc qualitative evaluation by human experts further validated that our method can effectively identify meaningful new consumer terms from social Q&A sites.
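
The general idea described above (extract n-grams, represent each as a feature vector, cluster, and keep candidates that land in the same cluster as known vocabulary terms) can be illustrated with a minimal sketch. This is not the simiTerm implementation: the function names, the window-based context counts used as the only feature, and the deterministic centroid seeding are all assumptions made for illustration, and the abstract describes a richer set of contextual and syntactic features than is shown here.

```python
from collections import Counter
import math


def extract_ngrams(tokens, max_n=2):
    """Return all contiguous n-grams (as strings) for n = 1..max_n."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def context_vector(term, sentences, vocab, window=2):
    """Count vocab words within `window` tokens of each occurrence of `term`.

    Hypothetical stand-in for the contextual features mentioned in the abstract.
    """
    term_toks = term.split()
    n = len(term_toks)
    counts = Counter()
    for toks in sentences:
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] == term_toks:
                counts.update(toks[max(0, i - window):i])      # left context
                counts.update(toks[i + n:i + n + window])      # right context
    return [float(counts[w]) for w in vocab]


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def candidates_near_known(terms, labels, known):
    """Return unknown terms that share a cluster with at least one known term."""
    known_clusters = {lab for t, lab in zip(terms, labels) if t in known}
    return [t for t, lab in zip(terms, labels)
            if lab in known_clusters and t not in known]
```

In use, each candidate n-gram from the corpus would be turned into a vector with `context_vector`, the vectors clustered with `kmeans`, and `candidates_near_known` called with the set of existing vocabulary terms to shortlist candidates for human review. A real pipeline would also need tokenization, frequency filtering, and feature scaling, none of which are shown.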


Consumer health information; Consumer health vocabulary; Controlled vocabularies; Ontology enrichment; Social Q&A

[Indexed for MEDLINE]
Free PMC Article
