A Bayesian approach to tissue-fraction estimation for oncological PET segmentation

Tumor segmentation in oncological PET is challenging, a major reason being the partial-volume effects (PVEs) that arise due to low system resolution and finite voxel size. The latter results in tissue-fraction effects (TFEs), i.e. voxels contain a mixture of tissue classes. Conventional segmentation methods are typically designed to assign each image voxel as belonging to a certain tissue class. Thus, these methods are inherently limited in modeling TFEs. To address the challenge of accounting for PVEs, and in particular, TFEs, we propose a Bayesian approach to tissue-fraction estimation for oncological PET segmentation. Specifically, this Bayesian approach estimates the posterior mean of the fractional volume that the tumor occupies within each image voxel. The proposed method, implemented using a deep-learning-based technique, was first evaluated using clinically realistic 2D simulation studies with known ground truth, in the context of segmenting the primary tumor in PET images of patients with lung cancer. The evaluation studies demonstrated that the method accurately estimated the tumor-fraction areas and significantly outperformed widely used conventional PET segmentation methods, including a U-net-based method, on the task of segmenting the tumor. In addition, the proposed method was relatively insensitive to PVEs and yielded reliable tumor segmentation for different clinical-scanner configurations. The method was then evaluated using clinical images of patients with stage IIB/III non-small cell lung cancer from ACRIN 6668/RTOG 0235 multi-center clinical trial. Here, the results showed that the proposed method significantly outperformed all other considered methods and yielded accurate tumor segmentation on patient images with Dice similarity coefficient (DSC) of 0.82 (95% CI: 0.78, 0.86). In particular, the method accurately segmented relatively small tumors, yielding a high DSC of 0.77 for the smallest segmented cross-section of 1.30 cm2. Overall, this study demonstrates the efficacy of the proposed method to accurately segment tumors in PET images.


Introduction
Reliable segmentation of oncological PET images is Required for tasks such as PET-based radiotherapy planning and quantification of radiomic and volumetric features from PET images (Zaidi et al 2009, Cook et al 2018. However, tumor segmentation in PET is challenging for several reasons such as partial-volume effects (PVEs), system noise, and large variabilities in the shape, texture, and location of tumors (Foster et al 2014). Tumor segmentation can be performed by having trained readers delineate the tumors manually. However, manual delineation is both labor-and time-intensive, and suffers from intra-and inter-reader variability (Foster et al 2014). To address these issues, computer-aided segmentation methods have been developed. These include methods based on thresholding, region growing, boundary detection, and stochastic modeling (Kass et al 1988, Foster et al 2014, Sridhar et al 2014, Layer et al 2015. However, these methods suffer from limitations, such as requiring user inputs, sensitivity to model assumptions (Belhassen and Zaidi 2010), and limited ability to account for PVEs. Learningbased methods (Blanc-Durand et al 2018, Zhao et al 2018 have been developed to address these issues. While these methods have demonstrated promise, they typically require manual delineations for training, which are likely affected by PVEs. Thus, accounting for PVEs remains an important challenge in accurate delineation of PET images. The PVEs in PET arise from two sources, namely the limited spatial resolution of PET system and the finite voxel size in the reconstructed image (Soret et al 2007). The limited spatial resolution leads to blurred tumor boundaries. The finite voxel size results in voxels containing a mixture of tumor and normal tissue. This phenomenon is referred to as tissue-fraction effects (TFEs) (Rousset et al 2007). A recently developed deep-learning (DL)-based technique (Leung et al 2020) has attempted to account for PVEs arising due to the low system resolution. However, this method is not able to account for the TFEs. This shortcoming arises because this method, similar to conventional classification-based segmentation methods, is not designed or trained to model TFEs. Instead, this method is designed and trained on the task of classifying each voxel in an image as belonging to a single region. Note that while these learning-based methods can output a probabilistic measure of a voxel belonging to a region, that probability is unrelated to TFEs. Similarly, other probabilistic techniques, such as simultaneous truth and performance level estimation technique (Dewalle-Vignion et al 2015), can yield a probabilistic estimate of the true To address the challenge of accounting for PVEs, and in particular, TFEs, while performing tumor segmentation in PET, in this manuscript, we propose a Bayesian approach to tissuefraction estimation. Specifically, the segmentation problem is posed as a task of estimating the fractional volume that the tumor occupies within each voxel of an image. Through this strategy, we are able to explicitly model TFEs. The proposed method was developed in the context of segmenting the primary tumor in [ 18 F]fluorodeoxyglucose (FDG)-PET images of patients with lung cancer.
In the next section, we develop a theoretical formalism for the proposed method. Our evaluation of the method using both clinically realistic simulations and clinical images of patients with stage IIB/III non-small cell lung cancer (NSCLC) from ACRIN 6668/RTOG 0235 multi-center clinical trial, is then presented in section 3, followed by the results of this evaluation, discussions, and conclusions.

Theory
Consider a PET system imaging a radiotracer distribution, described by a vector f(r), where r = (x, y, z) denotes the spatial coordinates. We denote the tracer uptake in the tumor by f s (r). The rest of the regions are referred to as background, and uptake in the background is denoted as f b (r). Thus, the tracer uptake can be represented mathematically as follows: We define a support function for the tumor region as s(r), i.e.

s(r) =
1, if f s (r) > 0. 0, otherwise. (2) The radiotracer emits photons that are detected by the PET system, yielding projection data. Reconstruction with the projection data yields the reconstructed image, denoted by an M-dimensional vector f. Thus, the mapping from the tracer distribution to the reconstructed image is given by the operator Θ: L 2 ℝ 3 E M . Denote the PET system by a linear continuous-to-discrete operator H, and let the vector n describe the Poisson-distributed noise. Denote the reconstruction operator, quite possibly nonlinear, by R. The eventual reconstructed image is given in operator notation as follows: In the reconstructed image, denote the volume of each voxel by V and define the voxel function as ϕ m (r), i.e. ϕ m (r) = 1, if r lies within the m th voxel of the PET image. 0, otherwise. Estimating v from the reconstructed image is an ill-posed problem due to the null spaces of the H and ℛ operator. Thus, we take a Bayesian approach to estimate v. We first need to define a cost function that penalizes deviation of v from v. A common cost function is the ensemble mean squared error (EMSE), which is the mean squared error averaged over noise realizations and the true values v. However, in our case, the variable v m is constrained to lie within 0 and 1. The EMSE loss does not directly incorporate this constraint. In contrast, using the binary cross-entropy (BCE) loss as the penalizer allows us to incorporate this constraint on v m directly, as also suggested in Creswell et al (2017). The BCE loss between v m and v m , denoted by l BCE v m , v m , is given by We define our cost function C(v, v), as the negative of aggregate BCE loss over all voxels averaged over the joint distribution of true values v and noise realizations f. The cost function is then given by where in the second step we have expanded pr(f, v) using the conditional probability. Inserting equation (6) into (7), we obtain To estimate the point at which this cost function is minimized, we differentiate the cost function with respect to the vector v and set that equal to zero. Because pr f is always nonnegative, the cost function is minimized by setting the derivative of inner integral in equation (8) equal to zero, i.e.
This is then equivalent to performing component-wise differentiation and setting each differentiated component to 0 (Barrett and Myers 2013). For the mth component of equation Since ∫ dv m pr v m | f = 1, the solution to equation (10), denoted by v m * is given by v m * = ∫ dv m pr v m | f v m .
(11) Equivalently, in vector notation, we get which is simply the posterior-mean estimate of v. Note that the same estimator is obtained when the cost function is the EMSE between v and v. Thus, by minimizing the cost function in equation (8), we obtain an optimal estimator that achieves the lowest mean squared error among all possible estimators. We can further show that this estimator is unbiased in a Bayesian sense (proof provided in appendix B).
In summary, we have shown that by developing an optimization procedure that minimizes the cost function defined in equation (8), we obtain a posterior-mean estimate of the tumorfraction volumes in each voxel of the reconstructed image. This estimator yields the lowest mean squared error among all possible estimators. Further, this estimator is unbiased in a Bayesian sense.

Implementation of the proposed technique
While we have developed the theoretical formalism in 3D, in this manuscript, the method was implemented and evaluated on a simplified per-slice basis. Thus, for each pixel in the 2D reconstructed image, the optimizer was designed to yield the posterior mean estimate a* of the true tumor-fraction area (TFA), which we denote by a. We now describe the procedure to implement this optimizer.
Estimating the posterior mean a* requires sampling from the posterior distribution pr(a | f). Sampling from this distribution is challenging as this distribution is high-dimensional and does not have a known analytical form. To address this issue, the proposed method was implemented using a supervised learning-based approach. Specifically, an encoder-decoder network was constructed, as shown in figure 1. During the training phase, this network is provided with a population of PET images, and the corresponding ground-truth TFA map, i.e. the vector a for each image, as described in section 2.1. The network, by minimizing the cost function defined in equation (8) over this population of images, becomes trained to yield the posterior-mean estimate of a given the input PET image.
The network architecture is similar to those for estimation tasks, such as image denoising (Creswell et al 2017) and image reconstruction (Nath et al 2020). To summarize, the network is partitioned into a contracting and an expansive path. The contracting path learns the spatial information from the input PET images and the expansive path maps the learned information to the estimated TFA map for each input image. Skip connections with elementwise addition were applied to feed the features extracted in the contracting path into the expansive path to stabilize the training and improve the learning performance (Mao et al 2016). In the final layer, the network yields the estimate of the TFAs. A detailed description of the network architecture is provided in appendix A (table A1).
As outlined in section 1, the goal of the proposed method is to explicitly model the TFEs while performing tumor segmentation. Our training strategy and network architecture are specifically designed for this goal by defining the ground truth as the TFAs for each image. We contrast this to the conventional DL-based segmentation methods, where, in the ground truth, each pixel is exclusively assigned to the tumor or the normal tissue class and the network is trained to classify each pixel as either tumor or background. Further, as mentioned above, while the conventional DL-based methods can output a probabilistic estimate for each image pixel, this estimate is only a measure of classification uncertainty, and thus has no relation to TFEs, unlike the proposed method.
The network was trained via the Adam optimization algorithm (Kingma and Ba 2014).
In the various experiments mentioned later, the network hyperparameters were optimized on a training set via five-fold cross validation. The network training was implemented in Python 3.6.9, Tensorflow 1.14.0, and Keras 2.2.4. Experiments were performed on a Linux operating system with two NVIDIA Titan RTX graphics processing unit cards.

Evaluation
Evaluating the proposed method requires access to ground truth where either the groundtruth TFA map or a surrogate for the true TFA map, such as tumor delineations defined by trained readers, are known. In section 3.2, we first evaluated the proposed method using clinically realistic simulation studies, where the ground-truth TFA map was known. In these studies, the support of tumor can be described at a very high resolution, simulating s(r) in equation (2). From this high-resolution description, the true TFA within each image pixel can be computed using equation (5), thus providing the TFA map. Realistic simulation studies also model imaging physics and variability in patient populations. Thus, these studies provide a rigorous mechanism to evaluate the method. However, we recognize that simulation studies may have limitations in modeling all aspects of system instrumentation, patient physiology, and patient-population variability, especially in multi-center settings, accurately. Thus, it is important to assess the performance of the method using patient studies, ideally with multi-center trial data. For this purpose, in section 3.3, we evaluated the proposed method on clinical images from the ACRIN 6668/RTOG 0235 multi-center clinical trial, where trained-reader-defined segmentations were used as the surrogate ground truth. We first describe the performance metrics used to quantitatively evaluate the proposed method.

Evaluation metrics
Since the proposed method is an estimation-based segmentation approach, our evaluation used performance metrics for both the task of estimating the true TFA map and of segmenting the tumor.
3.1.1. Evaluation on estimation performance-Performance on the estimation task was evaluated using the EMSE between the true and estimated TFA maps. EMSE provides a combined measure of bias and variance over the distribution of true values and noise realizations, and is thus considered as a comprehensive figure of merit for estimation tasks (Barrett and Myers 2013). The error in the estimate of the TFA maps and the tumor area was quantified using the pixel-wise EMSE and normalized area EMSE, respectively. Denote 〈…〉 X as the expected value of the quantity in the brackets when averaged over the random variable X. The pixel-wise EMSE is given by The normalized area EMSE denotes the EMSE between the true and estimated areas of each tumor, normalized by the true areas. The true and estimated areas, denoted by A and A, are given by the L 1 norms of a and a, respectively. The normalized area EMSE is then given by We have shown (equation (B4) in appendix B) that the proposed method yields an unbiased estimate of a in a Bayesian sense. To verify this, the ensemble-average bias was computed. This term, denoted by b, is an M-dimensional vector b 1 , b 2 , …, b M , with the mth element of the vector quantifying the average bias of the estimated TFA within the mth pixel. Consider a total of P tumor images and N noise realizations for each tumor image. Let a mnp and a mnp denote the true and estimated TFA within the mth pixel for the nth noise realization in the pth tumor image. The mth component of ensemble-average bias, b m , is then given by The proximity of the elements of b to 0 would indicate that the estimator was unbiased in a Bayesian sense.

Evaluation on segmentation performance-
The proposed method estimates the TFA within each pixel, which is a continuous-valued output. For evaluation of segmentation methods that yield such non-binary output, as in Taha and Hanbury (2015), The spatial-overlap metric of Dice similarity coefficient (DSC) and Jaccard similarity coefficient (JSC) were used to measure the agreement between the true and estimated segmentation. The DSC and JSC are defined as Higher values of DSC and JSC indicate higher segmentation accuracy. These metrics were reported as mean values with 95% confidence intervals (CIs). Statistical significance was assessed via a paired sample t-test, with a p-value < 0.01 inferring statistically significant difference.
We also qualitatively evaluated the performance of the proposed method on the task of estimating the TFA map. For this purpose, ground-truth and estimated tumor topographic maps were first constructed from the true and estimated TFA maps using the contour function in MATLAB (MathWorks, Natick, Mass). Specifically, the tumor topographic map shows the topography of the TFA map by means of isocontours. Then, isocontours corresponding to the true and estimated TFA maps were plotted for the TFA values of 0, 1/3, 2/3, and 1. A TFA of 0 implies that no area within that pixel contains the tumor, while a TFA of 1 implies that the entire pixel area is the tumor.

Evaluation of the proposed method using clinically realistic simulation studies
This evaluation study was conducted in the context of segmenting the primary tumor in FDG-PET images of patients with lung cancer. The study quantitatively evaluated the accuracy of the method, compared the method to existing techniques, studied the sensitivity of the method to PVEs, and also studied the performance of the method for different clinical-scanner configurations. In each evaluation, clinically realistic simulated PET images with known ground-truth tumor properties were generated, as described in section 3.2.1. The generated data was split into training and test sets. The proposed method was trained and cross-validated using the training set. The performance of the method was then evaluated using the independent test set. The evaluation study used clinical images, was retrospective, IRB-approved, and HIPAA-compliant with a waiver of informed consent.

Generating realistic simulated PET images-
The simulation strategy advances on a previously proposed approach to simulate PET images (Leung et al 2020).
Briefly, in the first step, realistic tumor-tracer distribution was simulated at a very high resolution, so that the simulated tumor can be described potentially as a continuous object, equivalent to f s (r) in equation (1), except that r = (x, y) is a 2D vector. Specifically, the pixel size in the simulated tumor image was 0.13 mm. This was 1/32 of the resolution in the patient image. The shapes, sizes, and intensities of simulated tumors were sampled from the corresponding distribution derived from clinical images, so that the simulated tumors had variabilities as observed in patient populations. An advancement on the approach proposed in Leung et al (2020) was to simulate intra-tumor heterogeneity using a stochastic lumpy object model (Rolland and Barrett 1992). Existing clinical PET images containing the lung region but with no tumor present were selected as templates to ensure tumor-background realism and account for inter-patient variability. The projection data for the simulated tumor and background were generated using a PET simulation software (Leung et al 2020). Since the simulated tumor had higher resolution compared to the background, we had different projection models for the tumor and background separately. The projection data for the tumor and background were then added, enabling the impact of image reconstruction on the tumor appearance and noise texture to be inherently incorporated .
Reconstruction was performed using a 2D ordered subset expectation maximization (OSEM) algorithm. We have validated the realism of the images simulated using this approach (Liu et al 2021). Detailed simulation and reconstruction parameters will be provided for each of the studies mentioned below.

Evaluating accuracy of the proposed method and comparing to other segmentation methods-
We quantitatively compared the proposed method to several commonly used semi-automated PET segmentation methods, including 40% SUV-max thresholding (Sridhar et al 2014), active-contour-based Snakes (Kass et al 1988), and Markov random fields-Gaussian mixture model (Jha et al 2010, Layer et al 2015. The method was also compared to a fuzzy segmentation method, namely the fuzzy local information C-Means (FLICM) clustering algorithm (Krinidis and Chatzis 2010). Further, the method was compared to a U-net-based PET segmentation method (Leung et al 2020).
The ground truth for training this U-net-based method was defined such that each voxel was classified as either tumor or background. For all the semi-automated segmentation methods, the tumor location was provided by manually generating a rectangular region of interest containing the tumor. In contrast, the proposed and U-net-based method did not require any manual input and were fully automated.
To generate the simulated images for this study, following the procedure in section 3.2.1, we used 318 2D slices from 32 patients for the background portion of the image. The simulated PET system had a spatial resolution of 5 mm full width at half maximum (FWHM). The projection data were reconstructed using the OSEM algorithm with 21 subsets and 2 iterations, similar to the PET reconstruction protocol for the patient images. The reconstructed pixel size was 4.07 mm × 4.07 mm. The network was trained and crossvalidated using 9540 images with five-fold cross validation. Evaluation was then performed on 2070 completely independent test images, which were generated using 69 2D slices from 7 patients. (2011) and Leung et al (2020), we studied the performance of the method as a function of tumor area. For this purpose, all test images were grouped based on the range of the true tumor area. Specifically, the tumor areas were binned with a bin width of 2 cm 2 . For each test image, PVEs-affected tumor masks were generated by applying a rectangular filter to the ground-truth tumor mask, following the strategy in Leung et al (2020). This filter modeled the resolution degradation due to the forward projection and the reconstruction process. The tumor area measured using the proposed method and the PVEs-affected tumor area in all the test images were obtained and divided by the true tumor area. A ratio of unity would indicate that the output was insensitive to PVEs. A ratio lower or higher than unity would indicate an underestimation or overestimation of the true tumor area, respectively, showing that the segmentation output was affected by PVEs ( (table A2). Clinical PET images of patients with lung cancer were obtained from these scanners. Using these clinical scans and following the simulation procedure described in section 3.2.1, a total of 5520 and 6120 simulated PET images were generated for each scanner, respectively. These were used for optimizing and training the network. Next, the trained network was tested on 1200 and 1320 independent simulated images, respectively. The performance of the proposed method was also compared to the U-net-based method.

Evaluation of the proposed method using clinical multi-center PET images
We next evaluated the proposed method using clinical PET images. For this purpose, we used de-identified patient data from the ACRIN 6668/RTOG 0235 multi-center clinical trial (Machtay et al 2013), (Kinahan et al 2019), available from The Cancer Imaging Archive (Clark et al 2013). In this evaluation study, FDG-PET images of 78 patients with inoperable stage IIB/III NSCLC were included. Detailed patient demographics with clinical characteristics are provided in appendix A ( Evaluation of the proposed method would require the knowledge of true TFA maps. For this purpose, a board-certified nuclear-medicine physician (J.C.M) with more than 10 years of experience in reading PET scans identified the primary tumor of each patient by reviewing the PET, CT, and fused PET/CT images along axial, sagittal, and coronal planes using MIM Encore (MIM Software, version 6.9.3). The radiologist was asked to delineate a continuous (un-pixelated) boundary for each identified tumor. For each tumor, the radiologist drew an external tumor boundary and considered the whole volume within that boundary as belonging to the tumor class. A workflow was created in MIM to assist the radiologist with this delineation task. The radiologist used a MIM-based edge-detection tool to segment the tumor in 3D on the fused PET/CT image, by placing the cursor at the center of the tumor and dragging it out until the three orthogonal guiding lines reached the tumor boundary. The radiologist then examined and adjusted the segmentation to make it more accurate and also account for PVEs. This manual segmentation was continuous and allowed for a voxel to consist of a mixture of tumor and normal tissues. The segmentation was saved at a higher resolution than that of the PET image.
From this manual segmentation, we obtained a discrete version of the tumor mask, s(r), as defined in equation (2), for each 2D PET slice and at a higher resolution than the PET image. Specifically, the pixel size in the tumor mask was 1/8 of that in the PET image. This resolution was chosen since more fine sampling did not cause changes in the definition of the tumor mask. Let this high-resolution manual segmentation be an N-dimensional vector (N > M), where we recall that M was the dimension of the PET image. Denote the pixel function in this high-resolution space by ϕ n manual (r), following the similar definition in equation (4). Define an N-dimensional vector ψ(r) with each element of this vector defined as ψ n (r) = 1 if pixel n in the manual segmentation is assigned to tumor class. 0 otherwise.
Denote the pixel area of the PET image by A. We computed the ground-truth TFA within each image pixel as follows: a m = 1 A ∑ n = 1 N ψ n (r) ∫ d 2 rϕ n manual (r)ϕ m (r), where the integral computes the fractional area that nth pixel in the manual segmentation occupies within the mth pixel of the PET image. The network was then trained to estimate the posterior mean of a m for the mth image pixel, following the training strategy described in section 2.2.
The network was trained and cross-validated using 565 2D slices from 61 out of 78 patients. The trained network was then evaluated on 140 completely independent 2D slices from the remaining 17 patients. The performance of the proposed method was compared to the other segmentation methods, described in section 3.2.2, both quantitatively and qualitatively, using the procedure and metrics described in section 3.1.2.

Evaluating accuracy of the proposed method and comparing to
other segmentation methods-Quantitatively, the proposed method significantly outperformed (p < 0.01) all other considered methods, including the U-net-based method, on the basis of the pixel-wise EMSE, normalized area EMSE, DSC, and JSC (figure 2, table A5 in appendix A). The proposed method yielded the lowest pixel-wise EMSE, the lowest normalized area EMSE of 0.02, the highest DSC of 0.90 (95% CI: 0.90, 0.91), and the highest JSC of 0.83 (95% CI: 0.83, 0.84). In addition, all the elements of the ensemble-average bias map were close to 0, providing the evidence that the method yielded an unbiased Bayesian estimate of the TFA map, as shown in section 2.1. Further, the proposed method accurately segmented relatively small tumors, and in particular, yielded high DSC of 0.84 for the smallest segmented tumor axial cross-section of 0.88 cm 2 in area. The diameter of this tumor was approximately twice the FWHM of the system resolution.
We next qualitatively show the performance of the proposed method on the task of estimating the TFA map, following the procedure described in section 3.1.2. We first illustrate the procedure to obtain the isocontours from the ground-truth and estimated TFA maps for a representative tumor (figure 3). We then followed this procedure to obtain the isocontours from the TFA maps for different cases. In figure 4, the comparisons between the true and estimated isocontours for representative slices at four different TFA values are shown. We observe that the proposed method yielded isocontours close to the true isocontours at different considered TFA values. In addition, the method yielded accurate segmentation for different tumor types, including those with substantial intra-tumor heterogeneity as best observed in figures 4(b)-(d). Figure 5 shows that the method yielded percent area overlap close to 100% for all considered tumor sizes, including small tumors with axial cross-section less than 2 cm 2 . For these smaller tumors, the diameter was approximately less than 3 times the FWHM of the system resolution. This was unlike the PVEs-affected tumor areas, which, as expected, were significantly overestimated for smaller tumors. In addition, the proposed method yielded high DSC and JSC for these small tumors, indicating accurate segmentation performance. Further, the proposed method significantly outperformed the U-net-based method. Overall, these results demonstrate the relative insensitivity of the proposed method to PVEs when segmenting relatively small tumors. Further, figure 6 shows that the proposed method consistently yielded lower pixel-wise EMSE and lower area EMSE normalized by the true tumor areas, compared to the U-net-based method. The proposed method also yielded higher DSC and JSC for all tumor sizes. Figure 7 shows the comparison of the segmentation accuracy between the proposed and the U-net-based method for two different clinicalscanner configurations, as described in section 3.2.4. The proposed method significantly outperformed the U-net-based method for both clinical settings, on the basis of pixel-wise EMSE, normalized area EMSE, DSC, and JSC.

Evaluation of the proposed method using clinical multi-center PET images
Quantitatively, the proposed method yielded reliable segmentation with DSC of 0.82 (95% CI: 0.78, 0.86). For 16 out of 17 test patients (94.2%), both the proposed and U-net-based method yielded correct tumor localization in all 2D slices. When considering the patient cases with correct tumor localization, as shown in figure 8 (with details provided in table A6 in appendix A), the proposed method significantly outperformed (p < 0.01) all other considered methods, yielding the lowest pixel-wise EMSE, the lowest normalized area EMSE of 0.14, the highest DSC of 0.87 (95% CI: 0.85, 0.89), and the highest JSC of 0.74 (95% CI: 0.70, 0.78). In addition, the proposed method accurately segmented relatively small tumors and yielded high DSC of 0.77 for the smallest segmented tumor axial crosssection of 1.30 cm 2 in area.
Qualitatively, we observe in figure 9 that the proposed method yielded an accurate match to the true isocontours defined at different considered TFA levels, following the strategy in section 3.1.2 with illustration in figure 3. Further, figure 10 shows that the method accurately segmented tumors with small sizes (a), (e), tumors with convex shape (b), (f), tumors surrounded by regions with high uptake (c), (d), (g), (h), and tumors with substantial intra-tumor heterogeneity (b), (d), (f), (h).

Discussion
In this manuscript, we proposed a Bayesian approach to tissue-fraction estimation for segmentation in oncological PET. Conventional segmentation methods are typically classification-based, i.e. classifying each voxel in the image as belonging to a certain tissue class. Thus, these methods are inherently limited in modeling TFEs. While probabilistic techniques can provide estimates of probabilities that each image voxel belongs to a tissue class, these probabilistic estimates are unrelated to TFEs. We address this inherent limitation by framing the segmentation task as an estimation problem, where the fractional volume that the tumor occupies in each voxel is estimated. Through this strategy, we are able to explicitly model the TFEs while performing segmentation.
Quantitatively, the proposed method yielded accurate performance on estimation of the ground-truth TFA maps and on segmentation tasks, and significantly outperformed the considered segmentation methods, yielding the lowest pixel-wise EMSE and normalized area EMSE, and the highest DSC and JSC, as evaluated using both clinically realistic simulation studies (figure 2) and clinical images from multi-center trial data (figure 8). With clinical images, the method yielded a DSC of 0.82 (95% CI: 0.78, 0.86). Qualitatively, the method yielded isocontours of close match to the ground-truth isocontours defined at different considered TFA values, as we observe from the results in figures 4 and 9. Additionally, as shown in figure 3 for a representative tumor with substantial intra-tumor heterogeneity, the proposed method correctly estimates the TFA value as unity for pixels that are within the tumor boundary but have relatively low intensity. This observation was consistent across different heterogeneous tumors, showing the reliable performance of the proposed method even with heterogeneous tumors. We believe that the method is reliable in this scenario because the method estimates the TFA by computing the conditional expectation of the TFA in that pixel given the entire reconstructed PET image, and not just the intensity of that pixel ((equation 11)). All these results demonstrate the ability of the method to accurately estimate the TFA within each image pixel and yield accurate tumor segmentations.
The isocontours defined based on certain choices of TFA values were shown only to visually illustrate the performance of the proposed method on the task of estimating the TFA map. The proposed method yields the estimated TFA map as the final output. This allows the method to provide the end user, such as a physician or a radiation oncologist, the ability to visualize the TFAs within each PET-image pixel, which they can use to make a decision based on their clinical use-case scenario.
Further, the proposed method demonstrated the ability to accurately segment relatively small tumors. In realistic simulation-based evaluation studies, the method yielded a high DSC of 0.84 for the smallest segmented tumor, with an axial cross-section of 0.88 cm 2 and a diameter approximately twice the FWHM of the system resolution. With clinical images, for the smallest tumor axial cross-section of 1.30 cm 2 , the method yielded a DSC of 0.77. This accuracy in segmenting small tumors is especially important for clinical tasks such as radiotherapy planning, where an accurate segmentation for small tumors is crucial to protect normal organs from radiations.
While the U-net-based method had demonstrated the ability to account for PVEs arising due to the low system resolution (Leung et al 2020), the proposed method significantly outperformed this method, emphasizing the significance of modeling the TFEs in PET segmentation. This need to model TFEs was also demonstrated in the results of evaluation using clinically realistic simulation studies, where the performance of the method was assessed for different clinical-scanner configurations (section 4.1.3). For example, for the higher-resolution Biograph Vision scanner, the TFEs may be more dominant compared to system-resolution-related blur. We observed in figure 7 that the proposed method was more accurate compared to the U-net-based method for this scanner. Further, for both clinicalscanner configurations, the proposed method yielded similar performance in estimating the TFAs and segmenting the tumor, indicating that the method was relatively insensitive to the changes in voxel size.
Our evaluation of the proposed method with clinical images of patients with stage IIB/III NSCLC shows that the method, when trained with 61 patients, yielded a reliable segmentation performance with DSC of 0.82. When considering patient cases where the tumor was localized correctly by the method (94.2%), the DSC further improved to 0.87. These results demonstrate the accuracy of the method in clinical settings and motivate further clinical evaluation of the method with even larger datasets and with delineations defined by multiple readers. Further, the method is general, and the results motivate the evaluation of the method for segmenting tumors other than the primary tumors, including infiltrating tumors, and segmenting tumors at other stages of the disease, including metastasis. In all these cases, the method would require the corresponding definition of the ground-truth TFAs, or a surrogate for the ground truth, such as those from manual delineations performed by trained readers.
The results obtained with the proposed method also motivate further evaluation of this method for PET-based clinical applications that require tumor delineation such as PET-based radiotherapy planning (El Naqa et  which are being evaluated as prognostic and predictive markers of therapy response. Such evaluation can be performed using task-specific evaluation frameworks (Kupinski et al 2006, Jha et al 2012, 2017, Barrett et al 2010. In this context, our initial results in both clinically realistic simulation ( figure 2(b)) and patient studies ( figure 8(b)) on estimating the tumor area indicate the promise of the proposed method on the task of quantifying MTV more accurately than conventional methods.
Our study has some limitations. First, while the theory of the proposed method was developed in the context of 3D imaging, our evaluation studies were conducted on a per-slice basis. This helped to increase the size of training data and was computationally less expensive (Leung et al 2020). However, implementing the method to 3D segmentation is relatively straightforward and would require only slight modifications to our network architecture, such as the ability to be input 3D images and output 3D tumor-fraction volume maps. Thus, the 2D convolutional layers in the network would be replaced by 3D convolutional layers. The overall network design would remain similar. In fact, in the ongoing study on using an extended version of this method for segmenting 3D single-photon emission computed tomography (SPECT) images, we have seen that a similar design was sufficient to perform 3D segmentation (Moon et al 2020, Liu et al 2021. The results shown here and in the SPECT study suggest that the proposed method will yield reliable performance for 3D tumor segmentation in PET, and this is an area of further research. Additionally, in this study, the proposed method was used to segment the image into only two regions. However, the method is general, and in the ongoing study of 3D SPECT segmentation, we are applying this method to segment the images into seven different regions. Another limitation is that our evaluation studies currently consider cases where only the primary tumor is present in an image. However, again, the method could be generalized to potentially segment multiple tumors present in the same image slice. Confirming this though would require additional evaluation studies. Further, respiratory motion of the lung, which may also cause blurring of the tumor mask, was not considered in the proposed method. Extending the method to account for lung motion is also an important research area. Finally, the method does not incorporate tumor information from CT images while segmenting PET images. Incorporating information from CT images can provide a prior distribution of the TFAs for the estimation task. Thus, investigating the incorporation of CT images into the proposed method is another important research direction.
We evaluated our method in the context of segmenting oncological PET images of patients with lung cancer and demonstrated accurate tumor segmentation performance. The method is general and thus, these results motivate the evaluation of the method for other cancer types. However, segmenting tumors in the lung region could be easier due to the scarce FDG uptake in the lung. In other cancer types, tumor-to-background intensity ratios may be lower, which may make the segmentation task challenging. For example, renal tumors often have similar FDG uptake as the normal renal cortex. Further, there may be situations where the FDG uptake in tumor is lower than the background, such as photon-deficient tumors on the liver. Thus, before application to other cancers, corresponding validation studies would be needed. Additionally, the method can be extended to segment PET images for other applications, such as those in cardiology and neurology. Further, the method can be extended to segment images from other imaging modalities that have low resolution, such as SPECT and optical imaging, with ongoing efforts in SPECT (Moon et al 2020, Liu et al 2021.

Conclusion
In this manuscript, we proposed a Bayesian approach to tissue-fraction estimation for oncological PET segmentation. We theoretically demonstrated that the proposed method yields a posterior-mean estimate of the tumor-fraction volume for each voxel in the PET image. Evaluation of the method using clinically realistic 2D simulation studies demonstrated the capability of the method to explicitly model TFEs by accurately estimating the TFAs. The method significantly outperformed the considered commonly used PET segmentation methods, including a U-net-based method. In addition, the method was relatively insensitive to PVEs and demonstrated accurate segmentation performance for different clinical-scanner configurations. Further, the proposed method demonstrated accurate performance in segmenting clinical images of patients with stage IIB/III NSCLC, obtained from the ACRIN 6668/RTOG 0235 multi-center clinical trial data. For this dataset, the method yielded DSC of 0.82 (95% CI: 0.78, 0.86). In conclusion, this study demonstrates the efficacy of the proposed method for tumor segmentation in PET. Open-source codes for the proposed method and supplementary data is available at https://github.com/ziping-liu/A-Bayesian-approach-to-tissue-fractionestimation-for-oncological-PET-segmentation.git.
Evaluation results of the proposed method using clinically realistic simulation studies and clinical images from multi-center clinical trial are given in tables A5 and A6, respectively.
where in the second step we have expanded pr(f, v) using the conditional probability, and in the third step we have inserted equation (12). By using the Bayes' theorem and changing the order of integration, the above equation becomes Thus, the average value of the estimate is equal to the average true value, so that the estimator is unbiased in a Bayesian sense. Evaluation result using clinically realistic simulation studies: (a) the pixel-wise EMSE between the true and estimated tumor-fraction areas; (b) the normalized area EMSE between the measured and true tumor areas (plot displayed in log scale on y-axis for better visualization); (c) the ensemble-average bias of the proposed method; the (d) Dice similarity coefficient and (e) Jaccard similarity coefficient between the true and estimated segmentations. Illustration of the procedure to obtain isocontours from the ground-truth TFA map and the TFA map estimated by the proposed method. Evaluation result using clinically realistic simulation studies: comparison between the estimated isocontours using the proposed method (green) and the ground-truth isocontours (red), defined from set of points at four TFA values (0, 1/3, 2/3, 1).

Figure 5.
Evaluation result using clinically realistic simulation studies: (a) qualitative comparison between the isocontours generated from the PVEs-affected TFA maps and the isocontours generated from the estimated TFA maps using the proposed method. The isocontours were defined as the set of points with TFA equal to 0.5. (b) Quantitative evaluation of the sensitivity of the proposed method to PVEs. Results obtained using the U-net-based method are also shown. Evaluation result using clinically realistic simulation studies: evaluation of the segmentation performance for different clinical-scanner configurations on the basis of (a) pixel-wise EMSE, (b) normalized area EMSE, (c) Dice similarity coefficient, and (d) Jaccard similarity coefficient. Evaluation result using clinical multi-center PET images: (a) the pixel-wise EMSE between the true and estimated tumor-fraction areas; (b) the normalized area EMSE between the measured and true tumor areas (plot displayed in log scale on y-axis for better visualization); the (c) Dice similarity coefficient and (d) Jaccard similarity coefficient between the true and estimated segmentations. Evaluation result using clinical multi-center PET images: comparison between the estimated isocontours using the proposed method (green) and the ground-truth isocontours (red), defined as set of points at four TFA values (0, 1/3, 2/3, 1).

Figure 10.
Evaluation result using clinical multi-center PET images: qualitative assessment of the performance of the proposed method in estimating the TFA maps for small tumors (a), (e), for tumors with convex shape (b), (f), for tumors surrounded by regions with high uptake (c)-(h), and for tumors with substantial intra-tumor heterogeneity (b), (d), (f), (h). Isocontours were defined as set of points at TFA = 0.5.