J Pain Symptom Manage. 2018 Jun;55(6):1492-1499. doi: 10.1016/j.jpainsymman.2018.02.016. Epub 2018 Feb 27.

Machine Learning Methods to Extract Documentation of Breast Cancer Symptoms From Electronic Health Records.

Author information

1 Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, Massachusetts.
2 Department of Surgical Oncology, Massachusetts General Hospital, Boston, Massachusetts.
3 Department of Medicine, Waitemata District Health Board, Auckland, New Zealand.
4 Department of Medicine, Primary Care and Population Health, Stanford School of Medicine, Stanford, California; VA Palo Alto Health Care System, Palo Alto, California.
5 Division of Population Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts; Division of Palliative Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts.
6 Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts; Division of Palliative Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts.
7 Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts; Division of Palliative Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts. Electronic address: Charlotta_lindvall@DFCI.harvard.edu.

Abstract

CONTEXT:

Clinicians document cancer patients' symptoms in free-text format within electronic health record visit notes. Although symptoms are critically important to quality of life and often herald clinical status changes, computational methods to assess the trajectory of symptoms over time are woefully underdeveloped.

OBJECTIVES:

To create machine learning algorithms capable of extracting patient-reported symptoms from free-text electronic health record notes.

METHODS:

The data set included 103,564 sentences obtained from the electronic clinical notes of 2695 breast cancer patients receiving paclitaxel-containing chemotherapy at two academic cancer centers between May 1996 and May 2015. We manually annotated 10,000 sentences and trained a conditional random field model to predict words indicating an active symptom (positive label), absence of a symptom (negative label), or no symptom at all (neutral label). Sentences labeled by a human coder were divided into training, validation, and test data sets. Final model performance was determined on a held-out 20% test set that was not used in model development or tuning.
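As a rough illustration of the labeling scheme and the data split described above, the following Python sketch shows one way word-level labels and a held-out test set could be represented. The example sentence, the function name `split_sentences`, the 10% validation fraction, and the random seed are assumptions made for illustration; the abstract reports only the three label types and the 20% test fraction.

```python
import random

# The three word-level labels named in the abstract.
LABELS = ("positive", "negative", "neutral")

# Hypothetical annotated sentence (not from the study data): one label per word.
example_sentence = [
    ("Patient", "neutral"),
    ("reports", "neutral"),
    ("nausea", "positive"),   # active symptom
    ("but", "neutral"),
    ("denies", "neutral"),
    ("pain", "negative"),     # symptom documented as absent
]


def split_sentences(sentences, test_frac=0.2, val_frac=0.1, seed=0):
    """Shuffle sentences and split them into train, validation, and test sets.

    test_frac=0.2 follows the abstract; val_frac and seed are assumptions.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Stand-in IDs for the 10,000 manually annotated sentences.
train, val, test = split_sentences(range(10_000))
print(len(train), len(val), len(test))
```

Splitting at the sentence level before any tuning keeps the test sentences out of model development, which is the property the abstract emphasizes.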

RESULTS:

The final model achieved precision of 0.82, 0.86, and 0.99 and recall of 0.56, 0.69, and 1.00 for positive, negative, and neutral symptom labels, respectively. The most common positive symptoms were pain, fatigue, and nausea. Machine-based labeling of 103,564 sentences took two minutes.
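To make the reported figures easier to interpret, the sketch below shows how per-label precision and recall are conventionally computed from gold and predicted word labels, and derives two quantities the abstract does not report: the F1 score for each label and an approximate labeling throughput. The function names and the toy label sequences are illustrative assumptions; the derived values follow arithmetically from the published numbers and are not results stated by the authors.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall for one label over aligned gold/predicted sequences."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy sequences (hypothetical) showing the computation on word labels.
toy_gold = ["positive", "neutral", "negative", "positive"]
toy_pred = ["positive", "neutral", "neutral", "neutral"]
toy_p, toy_r = precision_recall(toy_gold, toy_pred, "positive")

# (precision, recall) pairs as reported in the abstract.
reported = {
    "positive": (0.82, 0.56),
    "negative": (0.86, 0.69),
    "neutral": (0.99, 1.00),
}
for label, (p, r) in reported.items():
    print(f"{label}: derived F1 = {f1(p, r):.2f}")

# Approximate throughput, taking "two minutes" as exactly 120 seconds.
sentences_per_second = 103_564 / 120
print(f"about {sentences_per_second:.0f} sentences per second")
```

The gap between precision and recall for the positive label indicates that the model misses a substantial share of active-symptom mentions even though most of the mentions it does flag are correct.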

CONCLUSION:

We demonstrate the potential of machine learning to gather, track, and analyze symptoms experienced by cancer patients during chemotherapy. Although our initial model requires further optimization to improve its performance, continued model development may yield machine learning methods suitable for deployment in routine clinical care, quality improvement, and research applications.

KEYWORDS:

Machine learning; breast cancer; electronic health record; natural language processing; palliative care; patient-reported symptoms

[Indexed for MEDLINE]
