Automated Breast Density Assessment in MRI Using Deep Learning and Radiomics: Strategies for Reducing Inter-Observer Variability

Xueping Jing; Mirjam Wielema; Andrea G Monroy-Gonzalez; Thom R G Stams; Shekar V K Mahesh; Matthijs Oudkerk; Paul E Sijens; Monique D Dorrius; Peter M A van Ooijen

doi:10.1002/jmri.29058

J Magn Reson Imaging. 2023 Oct 17. doi: 10.1002/jmri.29058. Online ahead of print.

Affiliations

¹ Department of Radiation Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
² Machine Learning Lab, Data Science Center in Health (DASH), University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
³ Department of Radiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
⁴ Faculty of Medical Sciences, University of Groningen, Groningen, The Netherlands.
⁵ Institute of Diagnostic Accuracy Research B.V., Groningen, The Netherlands.

PMID: 37846440
DOI: 10.1002/jmri.29058

Abstract

Background: Accurate breast density evaluation allows for more precise risk estimation but suffers from high inter-observer variability.

Purpose: To evaluate the feasibility of reducing inter-observer variability of breast density assessment through artificial intelligence (AI) assisted interpretation.

Study type: Retrospective.

Population: Six hundred and twenty-one patients without breast prostheses or reconstructions were randomly divided into training (N = 377), validation (N = 98), and independent test (N = 146) datasets.

Field strength/sequence: 1.5 T and 3.0 T; T1-weighted spectral attenuated inversion recovery.

Assessment: Five radiologists independently assessed each scan in the independent test set to establish the inter-observer variability baseline and to reach a reference standard. Deep learning and three radiomics models were developed for three classification tasks: (i) four Breast Imaging-Reporting and Data System (BI-RADS) breast composition categories (A-D), (ii) dense (categories C, D) vs. non-dense (categories A, B), and (iii) extremely dense (category D) vs. moderately dense (categories A-C). The models were tested against the reference standard on the independent test set. AI-assisted interpretation was performed by majority voting between the models and each radiologist's assessment.
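
The voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the number of model votes passed in, and the tie-breaking rule (keep the reader's own label on any tie) are assumptions that the abstract does not specify.

```python
from collections import Counter


def to_dense(category):
    """Task (ii): dense (categories C, D) vs. non-dense (categories A, B)."""
    return category in ("C", "D")


def to_extremely_dense(category):
    """Task (iii): category D vs. categories A-C."""
    return category == "D"


def ai_assisted_label(reader_label, model_labels):
    """Majority vote over one reader's label and the models' labels.

    Assumption: if no single label has the most votes, the reader's
    own label is kept. The source does not state a tie-breaking rule.
    """
    votes = Counter([reader_label, *model_labels])
    top_count = max(votes.values())
    winners = [label for label, count in votes.items() if count == top_count]
    if len(winners) == 1:
        return winners[0]
    return reader_label


# Hypothetical votes: one reader plus four model predictions for one scan.
assisted = ai_assisted_label("B", ["C", "C", "C", "B"])
print(assisted, to_dense(assisted), to_extremely_dense(assisted))
```

The same function applies to the binary tasks if the labels are first mapped with to_dense or to_extremely_dense.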

Statistical tests: Inter-observer variability was assessed using linear-weighted kappa (κ) statistics. Kappa statistics, accuracy, and area under the receiver operating characteristic curve (AUC) were used to assess the models against the reference standard.
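
For readers unfamiliar with the statistic, linear-weighted kappa between two raters can be computed as in the sketch below. It uses the standard definition with disagreement weights |i - j| over ordered categories. Averaging over all reader pairs to get a mean κ is an assumption about how the reported means were formed, and the function names are illustrative.

```python
from itertools import combinations


def linear_weighted_kappa(ratings_a, ratings_b, categories=("A", "B", "C", "D")):
    """Cohen's kappa with linear weights for two raters on ordered categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of scans")
    index = {category: i for i, category in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted disagreement: observed, and expected under independence.
    disagree_obs = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    disagree_exp = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    if disagree_exp == 0:
        return float("nan")  # undefined when both raters use one identical category
    return 1.0 - disagree_obs / disagree_exp


def mean_pairwise_kappa(readers):
    """Mean kappa over all reader pairs (assumed aggregation, see lead-in)."""
    pairs = list(combinations(readers, 2))
    return sum(linear_weighted_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ratings for four scans by two readers.
print(linear_weighted_kappa(["A", "B", "C", "D"], ["B", "B", "C", "D"]))
```

For the two binary tasks, passing categories=(False, True) reduces this to unweighted Cohen's kappa, since with two categories the only disagreement weight is 1.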

Results: In the independent test set, the five readers showed overall substantial agreement on tasks (i) and (ii), but moderate agreement on task (iii). The best-performing model showed substantial agreement with the reference standard for tasks (i) and (ii), but moderate agreement for task (iii). With the assistance of the AI models, almost perfect inter-observer agreement was obtained for tasks (i) (mean κ = 0.86), (ii) (mean κ = 0.94), and (iii) (mean κ = 0.94).
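
The qualitative terms used above (moderate, substantial, almost perfect) match the commonly used Landis and Koch bands for kappa. The sketch below maps a κ value to those bands. That the authors applied exactly these cut-offs is an assumption, and the function name is illustrative.

```python
def agreement_label(kappa):
    """Map a kappa value to the Landis and Koch agreement bands."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    raise ValueError("kappa cannot exceed 1")


# The mean kappa values reported for AI-assisted interpretation.
for task, kappa in (("i", 0.86), ("ii", 0.94), ("iii", 0.94)):
    print(task, kappa, agreement_label(kappa))
```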

Data conclusion: Deep learning and radiomics models have the potential to help reduce inter-observer variability of breast density assessment.

Level of evidence: 3.

Technical efficacy: Stage 1.

Keywords: breast density; breast imaging; deep learning; inter-observer variability; radiomics.