Comparison of Machine Learning Methods in the Study of Cancer Survivors' Return to Work: An Example of Breast Cancer Survivors with Work-Related Factors in the CONSTANCES Cohort

J Occup Rehabil. 2023 Dec;33(4):750-756. doi: 10.1007/s10926-023-10112-8. Epub 2023 Mar 20.

Abstract

Purpose: Machine learning (ML) methods have shown higher accuracy than classical methods (e.g., logistic regression models) in identifying individuals without cancer who were unable to return to work (RTW). We therefore aim to discuss the value of these methods for studying RTW among cancer survivors.

Methods: Breast cancer (BC) survivors who were working at diagnosis within the CONSTANCES cohort were included in the study. RTW was assessed five years after the BC diagnosis (early retirement was considered as non-RTW). Age and occupation at diagnosis, and physical occupational job exposures assessed using the Job Exposure Matrix, JEM-CONSTANCES, were evaluated as predictors of RTW five years after BC diagnosis. The following four ML methods were used: (i) k-nearest neighbors; (ii) random forest; (iii) neural network; and (iv) elastic net.
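The abstract does not describe how the sample was divided into training and test sets. As a minimal sketch, assuming a stratified split of roughly 80/20 (an assumption, not the authors' stated procedure), the partition could look like the following. The class counts are back-calculated from the rounded percentages reported in the Results and are illustrative only; `stratified_split` is a hypothetical helper.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction of each class for the test set."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within each class
        n_test = round(test_fraction * len(indices))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Assumed counts (back-calculated from rounded percentages, not reported directly):
# 731 survivors with RTW (label 1) and 123 without RTW (label 0), 854 in total.
labels = [1] * 731 + [0] * 123

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 683 171
```

Stratifying keeps the RTW proportion nearly identical in both samples, which matters when one outcome is much rarer than the other.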

Results: The training sample included 683 BC survivors (RTW: 85.7%), and the test sample included 171 (RTW: 85.4%). The elastic net method gave the best overall results despite low sensitivity (accuracy = 76.6%; sensitivity = 31.7%; specificity = 90.8%), whereas the random forest model was the most accurate (accuracy = 79.5%) but also the least sensitive (sensitivity = 14.3%).
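To make the three reported metrics concrete, the sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts. It assumes that non-RTW is treated as the positive class (the abstract does not say so explicitly) and that the test sample contains about 146 RTW and 25 non-RTW survivors, figures back-calculated from the rounded 85.4%. The function `classification_metrics` is hypothetical, and the example is a trivial baseline, not one of the study's models.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: share of true positives (here, assumed non-RTW) correctly flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of true negatives (here, assumed RTW) correctly identified.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Assumed test-sample composition (illustrative, derived from rounded percentages).
N_RTW = 146
N_NON_RTW = 25

# Trivial baseline that predicts RTW for every survivor.
baseline = classification_metrics(tp=0, fn=N_NON_RTW, tn=N_RTW, fp=0)
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, such a baseline reaches about 85.4% accuracy with 0% sensitivity, which illustrates why sensitivity and specificity need to be read alongside accuracy when most survivors return to work.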

Conclusion: This study takes a first step towards opening up new possibilities for identifying the occupational determinants of cancer survivors' RTW. Further work, including a larger sample size and more predictor variables, is now needed.

Keywords: Breast cancer; Machine learning; Methods; Prediction; Return to work; Survivors.

MeSH terms

  • Breast Neoplasms*
  • Cancer Survivors*
  • Female
  • Humans
  • Occupations
  • Return to Work
  • Survivors