Predicting the Survival of Patients With Cancer From Their Initial Oncology Consultation Document Using Natural Language Processing

JAMA Netw Open. 2023 Feb 1;6(2):e230813. doi: 10.1001/jamanetworkopen.2023.0813.

Abstract

Importance: Predicting short- and long-term survival of patients with cancer may improve their care. Prior predictive models either use data with limited availability or predict the outcome of only 1 type of cancer.

Objective: To investigate whether natural language processing can predict the survival of patients with cancer, regardless of cancer type, from a patient's initial oncologist consultation document.

Design, setting, and participants: This retrospective prognostic study used data from 47 625 of 59 800 patients who started cancer care at any of the 6 BC Cancer sites located in the province of British Columbia between April 1, 2011, and December 31, 2016. Mortality data were updated until April 6, 2022, and data were analyzed from that update until September 30, 2022. All patients with a medical or radiation oncologist consultation document generated within 180 days of diagnosis were included; patients seen for multiple cancers were excluded.

Exposures: Initial oncologist consultation documents were analyzed using traditional and neural language models.
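As an illustration of what a "traditional" language model could look like here, the following minimal sketch turns documents into bag-of-words counts and fits a logistic regression by stochastic gradient descent. The abstract does not say which traditional model was used, so the choice of bag-of-words with logistic regression is an assumption, and the four toy documents, their labels, and all function names are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(docs):
    """Map every distinct token in the corpus to a column index."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(text, vocab, w, b):
    """Predicted probability of the positive class (1 = survived)."""
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented toy corpus (not study data); label 1 = survived the horizon.
docs = [
    "early stage disease plan curative surgery",
    "localized tumour curative radiation planned",
    "widespread metastatic disease palliative care",
    "metastatic progression palliative chemotherapy",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(docs)
X = [vectorize(d, vocab) for d in docs]
w, b = train_logreg(X, labels)

p_good = predict_proba("curative surgery planned", vocab, w, b)
p_poor = predict_proba("palliative metastatic", vocab, w, b)
print(p_good, p_poor)
```

In a model of this kind each learned weight belongs to a single word, so the weights themselves indicate which words push a prediction toward or away from survival.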

Main outcomes and measures: The primary outcome was the performance of the predictive models, including balanced accuracy and area under the receiver operating characteristic curve (AUC). The secondary outcome was an investigation of which words the models used.
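For readers unfamiliar with the two metrics, the sketch below computes them from their standard definitions for a binary outcome: balanced accuracy is the mean of sensitivity and specificity, and AUC is the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration, not values from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = survived)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy example: four survivors and two non-survivors.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(balanced_accuracy(y_true, y_pred))  # 0.625
print(roc_auc(y_true, scores))            # 0.875
```

Balanced accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are often reported together.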

Results: Of the 47 625 patients in the sample, 25 428 (53.4%) were female and 22 197 (46.6%) were male, with a mean (SD) age of 64.9 (13.7) years. A total of 41 447 patients (87.0%) survived 6 months, 31 143 (65.4%) survived 36 months, and 27 880 (58.5%) survived 60 months, calculated from their initial oncologist consultation. On a holdout test set, the best models achieved a balanced accuracy of 0.856 (AUC, 0.928) for predicting 6-month survival, 0.842 (AUC, 0.918) for 36-month survival, and 0.837 (AUC, 0.918) for 60-month survival. The words that were important for predicting 6-month survival differed from those that were important for predicting 60-month survival.
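The survival counts above imply unbalanced classes, especially at 6 months, which is one reason balanced accuracy is informative here. The short sketch below recomputes the reported percentages from the counts and shows what a trivial baseline that predicts "survived" for every patient would score. The counts come from the abstract; the baseline and the function names are illustrative assumptions, not part of the study.

```python
TOTAL_PATIENTS = 47625
# Survivors at each horizon in months, as reported in the abstract.
SURVIVORS = {6: 41447, 36: 31143, 60: 27880}


def survival_rate(months):
    """Fraction of the sample that survived the given horizon."""
    return SURVIVORS[months] / TOTAL_PATIENTS


def always_survive_baseline(months):
    """Accuracy and balanced accuracy of predicting 'survived' for everyone."""
    accuracy = survival_rate(months)   # every survivor is classified correctly
    sensitivity, specificity = 1.0, 0.0  # no death is ever predicted
    return accuracy, (sensitivity + specificity) / 2


for months in (6, 36, 60):
    acc, bal_acc = always_survive_baseline(months)
    print(f"{months:>2} months: survived {survival_rate(months):.1%}, "
          f"baseline accuracy {acc:.3f}, baseline balanced accuracy {bal_acc:.3f}")
```

Such a baseline reaches about 0.870 plain accuracy at 6 months but only 0.5 balanced accuracy, so the reported balanced accuracies of 0.837 to 0.856 reflect performance on both survivors and non-survivors.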

Conclusions and relevance: These findings suggest that the models performed comparably with or better than previous models predicting cancer survival and that they may be able to predict survival using readily available data without focusing on 1 cancer type.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Female
  • Humans
  • Male
  • Medical Oncology
  • Middle Aged
  • Natural Language Processing*
  • Neoplasms* / therapy
  • Referral and Consultation
  • Retrospective Studies