Comput Biol Med. 2018 Mar 1;94:1-10. doi: 10.1016/j.compbiomed.2017.12.026. Epub 2018 Jan 3.

Prediction of venous thromboembolism using semantic and sentiment analyses of clinical narratives.

Author information

1. Computer Science and Engineering Department, Oakland University, 2200 N. Squirrel Rd, Rochester, MI 48309, USA. Electronic address: sabra@oakland.edu.
2. Computer Science and Engineering Department, Oakland University, 2200 N. Squirrel Rd, Rochester, MI 48309, USA. Electronic address: mahmood@oakland.edu.
3. Computer Science and Engineering Department, Oakland University, 2200 N. Squirrel Rd, Rochester, MI 48309, USA. Electronic address: malobaid@oakland.edu.

Abstract

Venous thromboembolism (VTE) is the third most common cardiovascular disorder. It affects people of both genders and can occur at ages as young as 20 years. The increasing number of VTE cases, combined with a high fatality rate of 25% at first occurrence, makes preventive measures essential. Clinical narratives are a rich source of knowledge and should be included in the diagnosis and treatment processes, as they may contain critical information on risk factors. It is therefore important to make such narrative blocks of information usable for searching, health analytics, and decision-making. This paper proposes a Semantic Extraction and Sentiment Assessment of Risk Factors (SESARF) framework, which uses a semantic- and sentiment-based approach to analyze the clinical narratives in electronic health records (EHR) and then predict a diagnosis of VTE. Unlike traditional machine-learning approaches, SESARF consists of two main algorithms, ExtractRiskFactor and FindSeverity, and prepares a feature vector as the input to a support vector machine (SVM) classifier that makes the diagnosis. SESARF matches and maps the concepts of VTE risk factors and finds the adjectives and adverbs that reflect their levels of severity. We use a dataset of 150 clinical narratives, 80% of which are used to train the SVM classifier, with the remaining 20% used for testing. Semantic extraction and sentiment analysis yielded precisions of 81% and 70%, respectively. Prediction of patients with VTE using the SVM yielded precision and recall values of 54.5% and 85.7%, respectively.
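
The feature-vector idea and the reported evaluation figures can be illustrated with a minimal Python sketch. It is not the authors' ExtractRiskFactor or FindSeverity implementation: the lexicons, the severity scores, the default score, and the function names `extract_features` and `precision_recall` are all hypothetical, and the confusion-matrix counts are chosen only because they reproduce the reported percentages, not because they appear in the paper.

```python
import re

# Hypothetical lexicons for illustration only; the abstract does not list
# the paper's actual concept mappings or severity terms.
RISK_FACTORS = {
    "surgery": ["surgery", "operation"],
    "immobility": ["immobility", "bed rest", "bedridden"],
    "obesity": ["obesity", "obese"],
}
SEVERITY = {"mild": 1, "moderate": 2, "severe": 3, "prolonged": 3}
DEFAULT_SEVERITY = 1  # assumed score when a risk factor has no modifier


def extract_features(narrative):
    """Return one severity score per risk factor (0 if the factor is absent)."""
    sentences = [s for s in re.split(r"[.;!?]", narrative.lower()) if s.strip()]
    features = []
    for terms in RISK_FACTORS.values():
        score = 0
        for sentence in sentences:
            # Match whole words or phrases so "obese" does not match inside other words.
            if any(re.search(r"\b" + re.escape(t) + r"\b", sentence) for t in terms):
                words = re.findall(r"[a-z]+", sentence)
                modifiers = [SEVERITY[w] for w in words if w in SEVERITY]
                score = max(score, max(modifiers, default=DEFAULT_SEVERITY))
        features.append(score)
    return features


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# An 80/20 split of 150 narratives gives 120 for training and 30 for testing.
n_train = 150 * 80 // 100
n_test = 150 - n_train

# Feature order follows RISK_FACTORS: surgery, immobility, obesity.
vector = extract_features("Moderate obesity noted. Underwent surgery last week.")

# Hypothetical counts (6 TP, 5 FP, 1 FN) that are consistent with the
# reported 54.5% precision and 85.7% recall; they are not from the paper.
precision, recall = precision_recall(tp=6, fp=5, fn=1)
```

In this sketch each narrative becomes a fixed-length vector with one slot per risk factor, which is the kind of input an SVM classifier expects. The gap between precision and recall in the reported results means the classifier missed few VTE patients but flagged a comparatively large number of patients without VTE.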

KEYWORDS:

Natural language processing; Prediction through classification; Risk factor assessment; Semantic enrichment; Sentiment analysis; Support vector machine; Venous thromboembolism

[Indexed for MEDLINE]