Evaluation of deep learning-based multiparametric MRI oropharyngeal primary tumor auto-segmentation and investigation of input channel effects: Results from a prospective imaging registry

Kareem A Wahid; Sara Ahmed; Renjie He; Lisanne V van Dijk; Jonas Teuwen; Brigid A McDonald; Vivian Salama; Abdallah S R Mohamed; Travis Salzillo; Cem Dede; Nicolette Taku; Stephen Y Lai; Clifton D Fuller; Mohamed A Naser

doi:10.1016/j.ctro.2021.10.003

Evaluation of deep learning-based multiparametric MRI oropharyngeal primary tumor auto-segmentation and investigation of input channel effects: Results from a prospective imaging registry

Clin Transl Radiat Oncol. 2021 Oct 16:32:6-14. doi: 10.1016/j.ctro.2021.10.003. eCollection 2022 Jan.

Affiliations

¹ Department of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX USA.
² Department of Medical Imaging, Radboud University Medical Centre, Nijmegen, The Netherlands.
³ Department of Head and Neck Surgery, University of Texas MD Anderson Cancer Center, Houston, TX USA.

Abstract

Background/purpose: Oropharyngeal cancer (OPC) primary gross tumor volume (GTVp) segmentation is crucial for radiotherapy. Multiparametric MRI (mpMRI) is increasingly used for OPC adaptive radiotherapy but relies on manual segmentation. Therefore, we constructed mpMRI deep learning (DL) OPC GTVp auto-segmentation models and determined the impact of input channels on segmentation performance.

Materials/methods: GTVp ground truth segmentations were manually generated for 30 OPC patients from a clinical trial. We evaluated five mpMRI input channels (T2, T1, ADC, Ktrans, Ve). 3D Residual U-net models were developed and assessed using leave-one-out cross-validation. A baseline T2 model was compared to mpMRI models (T2 + T1, T2 + ADC, T2 + Ktrans, T2 + Ve, all five channels [ALL]) primarily using the Dice similarity coefficient (DSC). False-negative DSC (FND), false-positive DSC, sensitivity, positive predictive value, surface DSC, Hausdorff distance (HD), 95% HD, and mean surface distance were also assessed. For the best model, ground truth and DL-generated segmentations were compared through a blinded Turing test using three physician observers.

Results: Models yielded mean DSCs from 0.71 ± 0.12 (ALL) to 0.73 ± 0.12 (T2 + T1). Compared to the T2 model, performance was significantly improved for FND, sensitivity, surface DSC, HD, and 95% HD for the T2 + T1 model (p < 0.05) and for FND for the T2 + Ve and ALL models (p < 0.05). No model demonstrated significant correlations between tumor size and DSC (p > 0.05). Most models demonstrated significant correlations between tumor size and HD or Surface DSC (p < 0.05), except those that included ADC or Ve as input channels (p > 0.05). On average, there were no significant differences between ground truth and DL-generated segmentations for all observers (p > 0.05).

Conclusion: DL using mpMRI provides reasonably accurate segmentations of OPC GTVp that may be comparable to ground truth segmentations generated by clinical experts. Incorporating additional mpMRI channels may increase the performance of FND, sensitivity, surface DSC, HD, and 95% HD, and improve model robustness to tumor size.

Evaluation of deep learning-based multiparametric MRI oropharyngeal primary tumor auto-segmentation and investigation of input channel effects: Results from a prospective imaging registry

Authors

Affiliations

Abstract

Grants and funding