Cascaded cross-attention transformers and convolutional neural networks for multi-organ segmentation in male pelvic computed tomography

J Med Imaging (Bellingham). 2024 Mar;11(2):024009. doi: 10.1117/1.JMI.11.2.024009. Epub 2024 Apr 8.

Abstract

Purpose: Segmentation of the prostate and surrounding organs at risk from computed tomography is required for radiation therapy treatment planning. We propose an automatic two-step deep learning-based segmentation pipeline that consists of an initial multi-organ segmentation network for organ localization followed by organ-specific fine segmentation.

Approach: Initial segmentation of all target organs is performed using a hybrid convolutional-transformer model, the axial cross-attention UNet. The output of this model is used to compute a region of interest for each organ and to crop tightly around individual organs for organ-specific fine segmentation. Information from this network is also propagated to the fine segmentation stage through an image enhancement module, which highlights regions of interest in the original image that might be difficult to segment. Organ-specific fine segmentation is then performed on these cropped and enhanced images to produce the final output segmentation.
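The cropping step described above can be illustrated with a minimal sketch. It assumes the coarse network outputs an integer label map with the same shape as the CT volume. The function name `crop_to_organ`, the `margin` parameter, and its default value are hypothetical and are not taken from the paper, and the image enhancement module is not modeled here.

```python
import numpy as np

def crop_to_organ(image, coarse_labels, organ_label, margin=8):
    """Crop `image` to the bounding box of one organ in a coarse label map.

    Hypothetical helper: `margin` (in voxels) pads the box on every side
    and is clipped at the volume borders. Returns (cropped_image, slices),
    or (None, None) if the organ is absent from the coarse segmentation.
    """
    mask = coarse_labels == organ_label
    if not mask.any():
        return None, None

    # Coordinates of all voxels assigned to this organ, one row per voxel.
    coords = np.argwhere(mask)

    # Inclusive lower corner and exclusive upper corner, padded and clipped.
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)

    roi = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return image[roi], roi
```

Returning the slices alongside the crop lets the fine segmentation output be pasted back into the full-size volume at the same location.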

Results: We apply the proposed approach to segment the prostate, bladder, rectum, seminal vesicles, and femoral heads from male pelvic computed tomography (CT). When tested on a held-out test set of 30 images, our two-step pipeline outperformed other deep learning-based multi-organ segmentation algorithms, achieving average Dice similarity coefficients (DSC) of 0.836±0.071 (prostate), 0.947±0.038 (bladder), 0.828±0.057 (rectum), 0.724±0.101 (seminal vesicles), and 0.933±0.020 (femoral heads).
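For reference, the DSC between a predicted mask A and a reference mask B is 2|A∩B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The sketch below computes it for binary masks. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions for illustration and do not reflect the authors' evaluation code.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    # Total number of foreground voxels across both masks.
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0

    # Twice the overlap divided by the total foreground size.
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)
```

A per-organ score for a multi-label segmentation can be obtained by comparing each label in turn, for example `dice_coefficient(pred_labels == k, truth_labels == k)` for organ label `k`.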

Conclusions: Our results demonstrate that a two-step segmentation pipeline with initial multi-organ segmentation and additional fine segmentation can accurately delineate organs in male pelvic CT. The benefit of the additional fine segmentation stage is most evident in challenging cases, where our two-step pipeline produces more accurate results than other state-of-the-art methods.

Keywords: convolutional neural networks; deep learning; image segmentation; transformers.