Cascaded cross-attention transformers and convolutional neural networks for multi-organ segmentation in male pelvic computed tomography

Rahul Pemmaraju; Gayoung Kim; Lina Mekki; Daniel Y Song; Junghoon Lee

doi:10.1117/1.JMI.11.2.024009

J Med Imaging (Bellingham). 2024 Mar;11(2):024009. doi: 10.1117/1.JMI.11.2.024009. Epub 2024 Apr 8.

Authors

Rahul Pemmaraju¹, Gayoung Kim¹, Lina Mekki², Daniel Y Song¹, Junghoon Lee¹

Affiliations

¹ Johns Hopkins University, Department of Radiation Oncology and Molecular Radiation Sciences, Baltimore, Maryland, United States.
² Johns Hopkins University, Department of Biomedical Engineering, Baltimore, Maryland, United States.

PMID: 38595327
PMCID: PMC11001270 (available on 2025-04-08)

Abstract

Purpose: Segmentation of the prostate and surrounding organs at risk from computed tomography is required for radiation therapy treatment planning. We propose an automatic two-step deep learning-based segmentation pipeline that consists of an initial multi-organ segmentation network for organ localization followed by organ-specific fine segmentation.

Approach: Initial segmentation of all target organs is performed using a hybrid convolutional-transformer model, the axial cross-attention UNet. The output of this model is used to compute a region of interest for each organ and to crop tightly around individual organs for organ-specific fine segmentation. Information from the initial network is also propagated to the fine segmentation stage through an image enhancement module, which highlights regions of interest in the original image that might be difficult to segment. Organ-specific fine segmentation is then performed on these cropped and enhanced images to produce the final output segmentation.
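The region-of-interest step described above can be illustrated with a minimal sketch. It assumes the initial network outputs an integer label map with one ID per organ, and it crops the image to that organ's bounding box plus a padding margin. The function name `crop_to_organ`, the `margin` value, and the toy volume are hypothetical and not taken from the paper, and the image enhancement module is not modeled here.

```python
import numpy as np

def crop_to_organ(image, labels, organ_id, margin=8):
    """Crop `image` to the bounding box of one organ in a coarse label map.

    `margin` (in voxels) is a hypothetical padding value, not from the paper.
    Returns (cropped_image, slices), or (None, None) if the organ is absent
    from the coarse segmentation.
    """
    coords = np.argwhere(labels == organ_id)
    if coords.size == 0:
        return None, None
    # Expand the tight bounding box by `margin` and clip to the volume bounds.
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, labels.shape)
    roi = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return image[roi], roi

# Toy volume standing in for a CT scan and its coarse multi-organ labels.
image = np.random.default_rng(0).normal(size=(32, 64, 64))
labels = np.zeros(image.shape, dtype=np.int64)
labels[10:20, 20:40, 25:45] = 1  # pretend label 1 is one organ

patch, roi = crop_to_organ(image, labels, organ_id=1, margin=4)
print(patch.shape)  # (18, 28, 28)
```

Returning the slices alongside the crop makes it straightforward to paste an organ-specific prediction back into the full-size volume afterwards.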

Results: We apply the proposed approach to segment the prostate, bladder, rectum, seminal vesicles, and femoral heads from male pelvic computed tomography (CT). When tested on a held-out test set of 30 images, our two-step pipeline outperformed other deep learning-based multi-organ segmentation algorithms, achieving an average Dice similarity coefficient (DSC) of $0.836 \pm 0.071$ (prostate), $0.947 \pm 0.038$ (bladder), $0.828 \pm 0.057$ (rectum), $0.724 \pm 0.101$ (seminal vesicles), and $0.933 \pm 0.020$ (femoral heads).
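For reference, the DSC between a predicted mask $A$ and a ground-truth mask $B$ is conventionally defined as $\mathrm{DSC} = 2|A \cap B| / (|A| + |B|)$, ranging from 0 (no overlap) to 1 (perfect agreement). The sketch below computes it for binary masks. The function name `dice_score` and the choice to return 1.0 when both masks are empty are assumptions made for illustration, not details from the paper.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)

# Toy 2D example: a 4-pixel ground truth and a 6-pixel prediction that covers it.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True

print(dice_score(pred, truth))  # 0.8, i.e. 2*4 / (6 + 4)
```

Per-organ scores like those reported above would be obtained by evaluating one binary mask per organ and then averaging over the test images.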

Conclusions: Our results demonstrate that a two-step segmentation pipeline with initial multi-organ segmentation and additional fine segmentation can delineate male pelvic CT organs well. The benefit of the additional fine segmentation stage is most apparent in challenging cases, where our two-step pipeline produces more accurate results with fewer errors than other state-of-the-art methods.

Keywords: convolutional neural networks; deep learning; image segmentation; transformers.