A Deep Learning Mammography-based Model for Improved Breast Cancer Risk Prediction

Adam Yala; Constance Lehman; Tal Schuster; Tally Portnoi; Regina Barzilay

doi:10.1148/radiol.2019182716

Radiology. 2019 Jul;292(1):60-66. doi: 10.1148/radiol.2019182716. Epub 2019 May 7.

Authors

Adam Yala¹, Constance Lehman¹, Tal Schuster¹, Tally Portnoi¹, Regina Barzilay¹

Affiliation

¹ From the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 32 Vassar St, 32-G484, Cambridge, MA 02139 (A.Y., T.S., T.P., R.B.); and Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (C.L.).

PMID: 31063083

Abstract

Background Mammographic density improves the accuracy of breast cancer risk models. However, the use of breast density is limited by subjective assessment, variation across radiologists, and restricted data. A mammography-based deep learning (DL) model may provide more accurate risk prediction.

Purpose To develop a mammography-based DL breast cancer risk model that is more accurate than established clinical breast cancer risk models.

Materials and Methods This retrospective study included 88 994 consecutive screening mammograms in 39 571 women between January 1, 2009, and December 31, 2012. For each patient, all examinations were assigned to either training, validation, or test sets, resulting in 71 689, 8554, and 8751 examinations, respectively. Cancer outcomes were obtained through linkage to a regional tumor registry. By using risk factor information from patient questionnaires and electronic medical records review, three models were developed to assess breast cancer risk within 5 years: a risk-factor-based logistic regression model (RF-LR) that used traditional risk factors, a DL model (image-only DL) that used mammograms alone, and a hybrid DL model that used both traditional risk factors and mammograms. Comparisons were made to an established breast cancer risk model that included breast density (Tyrer-Cuzick model, version 8 [TC]). Model performance was compared by using areas under the receiver operating characteristic curve (AUCs) with DeLong test (P < .05).

Results The test set included 3937 women, aged 56.20 years ± 10.04. Hybrid DL and image-only DL showed AUCs of 0.70 (95% confidence interval [CI]: 0.66, 0.75) and 0.68 (95% CI: 0.64, 0.73), respectively. RF-LR and TC showed AUCs of 0.67 (95% CI: 0.62, 0.72) and 0.62 (95% CI: 0.57, 0.66), respectively. Hybrid DL showed a significantly higher AUC (0.70) than TC (0.62; P < .001) and RF-LR (0.67; P = .01).

Conclusion Deep learning models that use full-field mammograms yield substantially improved risk discrimination compared with the Tyrer-Cuzick (version 8) model. © RSNA, 2019 Online supplemental material is available for this article. See also the editorial by Sitek and Wolfe in this issue.
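
Two steps of the evaluation described above lend themselves to a short illustration: keeping all examinations of a patient in the same data set, and scoring a model by its AUC. The following is a minimal Python sketch under stated assumptions, not the authors' code. The function names, the 80/10/10 patient fractions, and the random seed are hypothetical; the abstract reports only the resulting examination counts. The AUC is computed directly from its pairwise definition, and the DeLong test used in the study is not implemented here.

```python
import random
from collections import defaultdict


def split_by_patient(exam_to_patient, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign every examination of a patient to the same split.

    exam_to_patient maps an exam id to a patient id. The fractions are
    illustrative assumptions, not values taken from the study.
    """
    patients = sorted(set(exam_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))

    # Decide the split once per patient, never per examination.
    split_of = {}
    for i, patient in enumerate(patients):
        if i < n_train:
            split_of[patient] = "train"
        elif i < n_train + n_val:
            split_of[patient] = "val"
        else:
            split_of[patient] = "test"

    splits = defaultdict(list)
    for exam, patient in exam_to_patient.items():
        splits[split_of[patient]].append(exam)
    return dict(splits)


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win. This pairwise form is quadratic in the
    number of cases and is meant for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 10 hypothetical patients with 2 examinations each.
    exams = {f"exam{p}_{k}": f"patient{p}" for p in range(10) for k in range(2)}
    parts = split_by_patient(exams)
    print({name: len(ids) for name, ids in parts.items()})
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Splitting at the patient level rather than the examination level prevents a woman's earlier mammogram from appearing in training while a later one appears in the test set, which would otherwise inflate the measured AUC.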

MeSH terms

Adult
Aged
Aged, 80 and over
Breast / diagnostic imaging
Breast Neoplasms / diagnostic imaging*
Deep Learning*
Female
Humans
Mammography / methods*
Middle Aged
Radiographic Image Interpretation, Computer-Assisted / methods*
Reproducibility of Results
Retrospective Studies
Risk Assessment