Abdomen CT multi-organ segmentation using token-based MLP-Mixer

Med Phys. 2023 May;50(5):3027-3038. doi: 10.1002/mp.16135. Epub 2022 Dec 20.

Abstract

Background: Manual contouring is labor-intensive, time-consuming, and subject to intra- and inter-observer variability. An automated deep-learning approach that contours and segments quickly and accurately is desirable for radiotherapy treatment planning.

Purpose: This work investigates an efficient deep-learning-based segmentation algorithm in abdomen computed tomography (CT) to facilitate radiation treatment planning.

Methods: We propose a novel deep-learning model that combines a U-shaped multi-layer perceptron mixer (MLP-Mixer) with a convolutional neural network (CNN) for multi-organ segmentation in abdomen CT images. The model has a structure similar to V-net, except that each convolutional block is replaced by a proposed MLP-Convolutional block. The MLP-Convolutional block consists of three components: an early convolutional block for local feature extraction and feature resampling, a token-based MLP-Mixer layer for capturing global features with high efficiency, and a token projector for pixel-level detail recovery. We evaluated the proposed network using (1) an institutional dataset with 60 patient cases and (2) a public dataset (BTCV) with 30 patient cases. Network performance was quantitatively evaluated in three domains: (1) volume similarity between the ground truth contours and the network predictions, using the Dice similarity coefficient (DSC), sensitivity, and precision; (2) surface similarity, using Hausdorff distance (HD), mean surface distance (MSD), and residual mean square distance (RMS); and (3) computational complexity, reported as the number of network parameters, training time, and inference time. The performance of the proposed network was compared with that of other state-of-the-art networks.
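The volume and surface similarity measures named in the Methods can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code: the function names `overlap_metrics` and `surface_metrics` are hypothetical, the surface measures are assumed to be computed symmetrically over two sets of surface points given in mm, and RMS is assumed here to be the root of the mean squared surface distance. Definitions of these surface measures vary between papers.

```python
import numpy as np


def overlap_metrics(pred, truth):
    """Return (DSC, sensitivity, precision) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # voxels marked as organ in both masks
    fp = np.count_nonzero(pred & ~truth)   # predicted organ, truth background
    fn = np.count_nonzero(~pred & truth)   # missed organ voxels
    # Convention assumed here: two empty masks count as perfect agreement.
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return dsc, sensitivity, precision


def surface_metrics(points_a, points_b):
    """Return (HD, MSD, RMS) between two surface point sets, coordinates in mm."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = dist.min(axis=1)  # nearest distance from each point of a to b
    b_to_a = dist.min(axis=0)  # nearest distance from each point of b to a
    both = np.concatenate([a_to_b, b_to_a])
    hd = float(both.max())                     # symmetric Hausdorff distance
    msd = float(both.mean())                   # mean surface distance
    rms = float(np.sqrt(np.mean(both ** 2)))   # root of mean squared distance
    return hd, msd, rms
```

The brute-force pairwise distance matrix keeps the sketch short but grows with the product of the two point counts, so a spatial index or distance transform would be the usual choice for full-resolution organ surfaces.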

Results: On the institutional dataset, the proposed network achieved the following volume similarity measures, averaged over all organs: DSC = 0.912, sensitivity = 0.917, and precision = 0.917. The average surface similarity measures were HD = 11.95 mm, MSD = 1.90 mm, and RMS = 3.86 mm. On the public dataset, the proposed network achieved DSC = 0.786 and HD = 9.04 mm. A two-tailed Wilcoxon Mann-Whitney U test also showed statistically significant improvement over all competing networks for the right lung (MSD, maximum p-value 0.001), spinal cord (sensitivity, precision, HD, and RMS, with p-values ranging from 0.001 to 0.039), and stomach (DSC, maximum p-value 0.01). On the public dataset, the Wilcoxon Mann-Whitney test showed statistically significant improvement for the pancreas (HD, maximum p-value 0.006), left adrenal gland (HD, maximum p-value 0.022), and right adrenal gland (DSC, maximum p-value 0.026). On both datasets, the proposed method generates contours in less than 5 s. Overall, the proposed MLP-Vnet demonstrates comparable or better performance than competing methods with much lower memory complexity and higher speed.
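A two-tailed Wilcoxon Mann-Whitney U comparison of per-case scores from two networks might be run as in the following sketch, which uses `scipy.stats.mannwhitneyu`. The helper name `compare_scores`, the 0.05 threshold, and the score lists are illustrative assumptions and are not values from the study.

```python
from scipy.stats import mannwhitneyu


def compare_scores(proposed, baseline, alpha=0.05):
    """Two-sided Mann-Whitney U test on two independent samples of scores.

    Returns (p_value, is_significant); alpha is an assumed threshold.
    """
    result = mannwhitneyu(proposed, baseline, alternative="two-sided")
    p_value = float(result.pvalue)
    return p_value, p_value < alpha


# Made-up per-case DSC values, for illustration only.
proposed_dsc = [0.91, 0.93, 0.90, 0.94, 0.92, 0.95]
baseline_dsc = [0.85, 0.88, 0.84, 0.87, 0.86, 0.83]
p_value, is_significant = compare_scores(proposed_dsc, baseline_dsc)
print(f"p = {p_value:.4f}, significant at 0.05: {is_significant}")
```

When one network is compared against several competitors, this test is repeated once per competitor, and the largest resulting p-value is the one to report as the maximum.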

Conclusions: The proposed MLP-Vnet demonstrates superior segmentation performance, in terms of accuracy and efficiency, relative to state-of-the-art methods. This reliable and efficient method demonstrates potential to streamline clinical workflows in abdominal radiotherapy, which may be especially important for online adaptive treatments.

Keywords: CT image; MLP-Mixer; abdomen organ segmentation; efficient segmentation network.

MeSH terms

  • Abdomen / diagnostic imaging
  • Algorithms
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Lung
  • Neural Networks, Computer*
  • Tomography, X-Ray Computed*