We propose random-effects models to summarize and quantify the accuracy of the diagnosis of multiple lesions on a single image without assuming independence between lesions. The number of false-positive lesions was assumed to be distributed as a Poisson mixture, and the proportion of true-positive lesions was assumed to be distributed as a binomial mixture. We considered univariate and bivariate, both parametric and nonparametric mixture models. We applied our tools to simulated data and data of a study assessing diagnostic accuracy of virtual colonography with computed tomography in 200 patients suspected of having one or more polyps.

Diagnostic accuracy is usually summarized with the true-positive rate (TPR), or sensitivity, and the false-positive rate (FPR), or one minus specificity. Some diagnostic tasks are more complicated than simple detection of a single occurrence of the (abnormal) condition, and in such cases calculating TPR and FPR may prove difficult. In this paper, we will discuss as an example the diagnostic accuracy of virtual colonography in locating colonic polyps and documenting their size or severity. Accurate, sensitive diagnosis is essential because polyps may develop into tumor tissue, but diagnosis must also be sufficiently specific to minimize invasive procedures. Patients often have more than one polyp, any of which may be seen or missed. Other examples of such a diagnostic situation are the detection of multiple lesions with mammography, or of multiple infarcts with computed tomography (CT) or magnetic resonance imaging in a patient suspected of having a stroke.

In this paper, we propose random-effects models to summarize and quantify the accuracy of the diagnosis of multiple lesions on a single image without the independence assumption. In Section 2, we describe the statistical model underlying our approach. In Section 3, we illustrate the approach with simulated data. In Section 4, we apply our tools to the data from a study assessing the diagnostic accuracy of virtual colonography with CT in 200 patients suspected of having one or more polyps.

We consider the situation where

Since the exact locations of the FPs are of lesser importance, we consider the number of FP lesions in patient $i$, $x_i$ say, and assume that $x_i$ follows the Poisson distribution with expectation $\mu$, so that the probability of zero FP lesions equals $e^{-\mu}$. Given a sample of observations $x_1,\dots,x_N$, the maximum likelihood estimator of the parameter $\mu$ is $\hat{\mu}=\sum_i x_i/N$.
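As a brief worked derivation of the estimator just mentioned, the following sketch writes $x_i$ for the number of FP lesions in patient $i$ and $N$ for the number of patients (labels assumed here for illustration), and uses the definition of patient-specific specificity as the probability of zero FP lesions.

```latex
\begin{align*}
\ell(\mu) &= \sum_{i=1}^{N}\bigl(x_i\log\mu-\mu-\log x_i!\bigr),\\
\frac{\partial\ell}{\partial\mu} &= \frac{1}{\mu}\sum_{i=1}^{N}x_i-N=0
\quad\Longrightarrow\quad
\hat{\mu}=\frac{1}{N}\sum_{i=1}^{N}x_i,\\
\widehat{\text{specificity}} &= P(x_i=0\mid\hat{\mu})=e^{-\hat{\mu}}.
\end{align*}
```

Under the simple Poisson model the estimated specificity therefore depends on the data only through the mean number of FP lesions per patient.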

The assumption of a common FPR across patients is often too restrictive, and often the variance of the number of FPs is larger than the mean, thus violating the Poisson assumption. A common method to relax this independence assumption is to assume that the intensity parameter varies randomly between patients, according to a distribution with density

The density function

Maximum likelihood estimates of (

Other parametric functions for _{1},…,_{J}) with probabilities _{1},…,_{J}), where _{1},_{2},…,_{N} is given by

In contrast to FPs, the exact locations of the TP lesions are of primary importance, and we therefore consider the outcome of the diagnostic test for each lesion in patient $i$. We denote by $y_{ij}$ the result of the diagnostic test at the location of the $j$th lesion in patient $i$: $y_{ij}=1$ when the diagnostic test is positive and $y_{ij}=0$ if it is negative.

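Sensitivity is defined as the probability of a TP test result. As with the specificity, we allow sensitivity $\pi_i$ to vary between patients. Given $\pi_i$, the outcomes of the diagnostic testing in patient $i$, $y_{ij}$ and $y_{il}$, are assumed to be independent for all $j\neq l$. The probability of the outcomes for the $k_i$ lesions, $y_{i1},\dots,y_{ik_i}$, therefore equals $P(y_{i1},\dots,y_{ik_i}\mid\pi_i)=\prod_{j=1}^{k_i}\pi_i^{y_{ij}}(1-\pi_i)^{1-y_{ij}}$. When $k_i$ is small, the maximum likelihood estimate of $\pi_i$ is very unstable. For this reason, we again assume that in the population of patients $\pi_i$ varies according to a distribution with density $g(\pi_i)$, so that the marginal probability of the outcomes given $k_i$ is $\int_0^1 P(y_{i1},\dots,y_{ik_i}\mid\pi_i)\,g(\pi_i)\,d\pi_i$.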

Again there is usually little evidence to prefer a particular distribution, and therefore one might again prefer a nonparametric specification. As in case of the specificity, we use a discrete distribution with _{1},…,_{J}) and probabilities _{1},…,_{J}), where _{i} lesions in patient

Given the maximum likelihood estimates

The model can be generalized in several ways. First, both specificity and sensitivity may depend on observed patient characteristics. Including this dependence in the model may explain the variation between patients. For the FP lesions, the logarithm of the Poisson intensity parameter $\mu_i$ in patient $i$ can be modeled as $\log(\mu_i)=\beta_0+\beta_1 z_{i1}+\dots+\varepsilon_i$, where $\varepsilon_i$ is a residual component. The lesion-specific sensitivity of patient $i$ can be modeled analogously as $\operatorname{logit}(\pi_i)=\alpha_0+\alpha_1 z_{i1}+\dots+\delta_i$.

Instead of modeling the correlations between (false) lesions in the same patient with a random effect, these correlations may be ignored. The analysis then boils down to simple Poisson and binomial regressions, and the associated standard errors can be corrected with a generalized estimation equation approach (

Multiple-lesion data allow estimation of the relation between the per-patient specificity and the per-lesion sensitivity, “even with only one cutpoint”. The relation can be directly modeled in a bivariate random-effects model in order to obtain the ROC curve. In such a model, the Poisson parameter _{i}=log(_{i}) and the logit-transformed parameter _{i}=logit(_{i}) are assumed to have a bivariate distribution _{i} as _{i}=_{i}+_{i}. An obvious choice for

Instead of the bivariate normal distribution, a nonparametric distribution can be used which is defined by a 2-dimensional grid of points _{rs} with weights _{rs}(

We simulated data loosely modeled on the example of diagnosing colon polyps (lesions) with virtual colonography, which we will discuss in Section 4. We performed several simulations, and we show here an extreme and slightly pathological example. To simplify the simulation, lesions (and nonlesions) were assumed to be characterized by one variable only, which we might call the lesion thickness, and each patient was assumed to consist of 50 locations. Data for 50 patients at the 50 locations were simulated as follows. First, the number of true lesions per patient was sampled from the Poisson distribution with an expectation of 5 lesions. The thickness of each lesion per patient was sampled from a normal distribution; the means and variances of this distribution varied between patients, and these patient-specific parameters were themselves sampled from a normal distribution (mean) and a gamma distribution (variance) with fixed parameters. The thickness variable at the nonlesion locations was also sampled from a normal distribution with means and variances sampled from normal and gamma distributions. Finally, a threshold was chosen such that a fixed percentage of all true lesions (30%, 50%, or 70%) was identified; afterwards, the number of FP lesions and the number of TP lesions were counted for each patient.
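The simulation scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the reported results: the hyperparameters of the normal and gamma distributions are not given in the text, so the values below are hypothetical, as are the function names `sample_poisson` and `simulate`. A location counts as test-positive when its thickness is at or above the threshold.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's method (fine for small means)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate(n_patients=50, n_locations=50, mean_lesions=5.0,
             frac_identified=0.5, seed=1):
    """Return per-patient counts of true lesions, TP lesions and FP lesions."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n_patients):
        k = min(sample_poisson(rng, mean_lesions), n_locations)
        # Patient-specific means and variances (hypothetical hyperparameters).
        m_les, v_les = rng.gauss(2.0, 1.0), rng.gammavariate(2.0, 0.5)
        m_non, v_non = rng.gauss(0.0, 1.0), rng.gammavariate(2.0, 0.5)
        lesions = [rng.gauss(m_les, math.sqrt(v_les)) for _ in range(k)]
        nonlesions = [rng.gauss(m_non, math.sqrt(v_non))
                      for _ in range(n_locations - k)]
        patients.append((lesions, nonlesions))

    # Choose the threshold so that the requested fraction of all true
    # lesions is identified.
    ranked = sorted((t for les, _ in patients for t in les), reverse=True)
    n_hit = round(frac_identified * len(ranked))
    threshold = ranked[n_hit - 1] if n_hit > 0 else math.inf

    results = []
    for lesions, nonlesions in patients:
        results.append({
            "lesions": len(lesions),
            "tp": sum(t >= threshold for t in lesions),
            "fp": sum(t >= threshold for t in nonlesions),
        })
    return results


if __name__ == "__main__":
    data = simulate()
    print(sum(r["tp"] for r in data), "of",
          sum(r["lesions"] for r in data), "lesions identified")
    print(sum(r["fp"] for r in data), "FP lesions in total")
```

Because the patient-specific means and variances differ, the FP counts produced this way tend to be strongly overdispersed relative to a Poisson distribution, which is the feature the simulation is meant to exhibit.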

We discuss results from one simulation with a threshold chosen such that 50% of all lesions were identified. In the 50 patients, the number of lesions varied between 0 (in 1 patient) and 9 with a mean of 4.8 and standard deviation 2.0. There were 1124 FP lesions, and the number of FP lesions varied between 0 and 49 per patient with mean 22.48 and variance 388.87, see

Histogram of the number of FP lesions per patient in the simulated data.

There were 14 patients without FP lesions, and the specificity could therefore be calculated as 14/50=28%. According to the simple Poisson model (log-likelihood ^{−22.48}≈0%, which is clearly much too low. The generalized estimation equation (GEE) approach, implemented in SAS proc genmod, does not correct this bias but does increase the standard error of the estimate to 2.76. The parameters of the gamma–Poisson model (log-likelihood ^{log(a)−log(b)}=22.42 and specificity 18%, which is still too low. The log-normal–Poisson mixture model had a similar estimate of the specificity (15%). The nonparametric Poisson mixture (log-likelihood

A total of 234 lesions were present in 49 out of the 50 simulated patients, and the number varied between 1 and 9 with mean 4.8 (standard deviation 2.0). By choosing our threshold, 117 lesions were identified yielding lesion sensitivity of 117/234=50%. The log-likelihood of this binomial model was ^{4}=94%. The observed proportion of identified lesions per patient varied between 0 and 1 with mean 0.49 and standard deviation 0.44. This variation cannot be explained by sampling only since the beta-binomial model (

The log-likelihood of the nonparametric bivariate model was −234.17, which was slightly larger than the sum of the log-likelihoods of the 2 nonparametric mixture models, (−164.63)+(−76.29)=−240.92. The correlation between the logit-transformed sensitivity and the Poisson parameter was estimated as −0.20.

We now apply our methods to data from a study to evaluate the test characteristics of CT colonography at different levels of radiation dose. In this study, 200 patients at risk for colorectal cancer were evaluated for the presence of one or more polyps. Colonoscopy was used as the reference standard. Herein only lesions > 6 mm will be considered. Colonographic lesions were identified by 3D display mode, and the lesion was defined as a TP if it was located in the same segment and had similar size and appearance to a lesion identified by colonoscopy. Detailed information can be found in

Of the 200 patients, 174 had no polyps and 26 had between 1 and 7 lesions; in total there were 44 lesions. Of these 44 lesions, 32 were identified by virtual colonography (73%); the percentage of lesions identified by virtual colonography varied in the 26 patients between 0% and 100% (mean 60%, standard deviation 47%).
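As an illustration of the simple binomial calculation that ignores the clustering of lesions within patients, the sketch below computes the pooled lesion sensitivity from the counts above, with a Wald interval formed on the logit scale. The choice of interval is an assumption made here for illustration, and the function name `logit_wald_interval` is hypothetical.

```python
import math


def logit_wald_interval(successes, trials, z=1.96):
    """Proportion with a Wald confidence interval computed on the logit scale."""
    failures = trials - successes
    proportion = successes / trials
    logit = math.log(successes / failures)
    # Standard error of the log odds of a binomial proportion.
    se = math.sqrt(1.0 / successes + 1.0 / failures)

    def expit(value):
        return 1.0 / (1.0 + math.exp(-value))

    return proportion, expit(logit - z * se), expit(logit + z * se)


sens, low, high = logit_wald_interval(32, 44)
print(f"{sens:.2f} ({low:.2f}-{high:.2f})")  # prints 0.73 (0.58-0.84)
```

This interval treats the 44 lesions as independent observations, so it is likely to be too narrow when detection is correlated within patients.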

FP lesions were observed in 68 patients. There were 93 FP lesions in total, and the number of FP lesions in these patients varied between 1 and 3; over all 200 patients, the mean was 0.5 and the variance 0.6. Note that the variance is about the same as the mean, which suggests that the differences between patients in the number of FPs are largely due to chance and not to systematic differences between patients.
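Under the simple Poisson model, the specificity implied by these counts is the probability of zero FP lesions at the estimated mean. The sketch below works through that arithmetic with a Wald interval for the Poisson mean; the function name `poisson_specificity` is hypothetical, and the interval construction is one common choice rather than a statement of the exact procedure used.

```python
import math


def poisson_specificity(total_fp, n_patients, z=1.96):
    """Estimate P(zero FPs) = exp(-mu) with a Wald interval for mu."""
    mu_hat = total_fp / n_patients           # ML estimate of the Poisson mean
    se = math.sqrt(mu_hat / n_patients)      # standard error of mu_hat
    spec = math.exp(-mu_hat)
    # A larger mean implies a lower specificity, so the bounds swap.
    lower = math.exp(-(mu_hat + z * se))
    upper = math.exp(-(mu_hat - z * se))
    return mu_hat, se, spec, lower, upper


mu_hat, se, spec, lower, upper = poisson_specificity(93, 200)
print(f"mu = {mu_hat:.3f} (SE {se:.3f}); "
      f"specificity = {spec:.2f} ({lower:.2f}-{upper:.2f})")
```

With 93 FP lesions in 200 patients this gives a mean of about 0.47 and a specificity of about 63%, somewhat below the observed proportion of patients without FP lesions (132/200 = 66%).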

The different estimates of the specificity are given in ^{−0.47}=63% with 95% confidence interval e^{−0.47±1.96×0.048}, that is, 57−69%. Using a GEE approach, assuming exchangeable covariance between the FPs, the standard error of

Results of the different models to estimate patient-specific specificity values

| Model | Log-likelihood | AIC | Specificity (95% CI) |
| --- | --- | --- | --- |
| Poisson | −183.97 | 369.95 | 0.63 (0.57–0.69) |
| Poisson + GEE | — | — | 0.63 (0.56–0.70) |
| Gamma–Poisson mixture | −182.17 | 368.34 | 0.66 (0.59–0.74) |
| Nonparametric Poisson mixture | −181.93 | 369.86 | 0.66 (0.58–0.73) |

Both random-effect models have higher likelihood values than the Poisson model, pointing to (small) systematic differences between patients with respect to the number of FP lesions. The estimated mixing distribution of the nonparametric mixture model was reduced to 2 support points (at almost 0 with weight 0.56 and at 1.41 with weight 0.44).
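For a discrete mixing distribution, the marginal probability of zero FP lesions is the weighted average of the Poisson zero probabilities at the support points. The sketch below applies this to the two reported points, treating "almost 0" as exactly 0, which is an approximation made here; the function name `mixture_specificity` is hypothetical.

```python
import math


def mixture_specificity(support, weights):
    """Marginal P(zero FPs) under a discrete mixture of Poisson means."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * math.exp(-mu) for mu, w in zip(support, weights))


# Support points and weights as reported; the first point is rounded to 0.
spec = mixture_specificity([0.0, 1.41], [0.56, 0.44])
print(f"specificity = {spec:.3f}")
```

The result is roughly 0.67, close to the specificity reported for the nonparametric Poisson mixture given the rounding of the support points and weights.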

The different estimates of the sensitivity are given in

Results of the different models to estimate lesion-specific sensitivity values

| Model | Log-likelihood | AIC | Lesion sensitivity (95% CI) |
| --- | --- | --- | --- |
| Binomial | −24.40 | 50.79 | 0.73 (0.58–0.84) |
| Binomial + GEE | — | — | 0.73 (0.57–0.85) |
| Beta-binomial | −20.99 | 45.98 | 0.63 (0.50–0.77) |
| Nonparametric binomial mixture | −20.89 | 51.78 | 0.65 (0.47–0.79) |

The deviance of the model with a bivariate normal random effects

Estimated regression line between sensitivity and specificity derived from the analysis using the bivariate normal random-effects model.

Although the association is very weak in the present case, the figure illustrates the classical inverse relationship between sensitivity and specificity, and the model can be used to evaluate specific choices for sensitivity/specificity.

There are 2 statistical problems with multiple-lesion diagnostic data. The first is the correlation between the multiple lesions in the same patient. If the correlation is ignored, then the diagnostic yield is often overestimated, and the estimated standard errors and confidence intervals are almost certainly too small. There are 2 general statistical methods to take this correlation into account in the statistical analysis, either with a marginal model with a generalized estimation equation approach or with random-effect models. We chose the latter approach.

The second problem is that in principle an infinite number of FP lesions might be detected, which means that a lesion-specific specificity parameter cannot be assessed. Our approach therefore models the “lesion-specific” sensitivity, because it is often important to diagnose all true lesions, together with the “patient-specific” specificity. We defined the patient-specific specificity as the probability of having zero FP lesions.

We modeled the number of FP lesions and the number of TP lesions with the Poisson and the binomial distributions, respectively. The associated parameters were considered to be patient-specific random effects sampled from some distribution

The primary effect of the random-effect models is that the model-based estimates of the marginal patient-specificity and patient-sensitivity values are much closer to the observed values. This effect is seen best in the simulated example. Of 50 patients, there were 14 patients without FPs, suggesting a specificity of 28%. Using the simple Poisson model, the probability of having zero FPs was estimated to be 0%. Thus, assuming independence between FPs leads to underestimation of the specificity. According to the nonparametric random-effect model, the specificity was estimated as 28%, and this close correspondence was seen in almost all simulations. The Poisson–gamma random-effects model also has this effect, but its size depends on the goodness-of-fit of the assumed gamma distribution. A similar effect was seen with the probability of identifying at least one true lesion in a patient. If we define this as the “patient-specific” sensitivity, then the random-effects models correspond much better to the observed rate, whereas this probability is overestimated when independence between lesions is assumed.
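The size of these discrepancies in the simulated example can be checked with a few lines of arithmetic. The sketch below contrasts the observed specificity with the value implied by a single Poisson mean of 22.48 FP lesions per patient, and shows the probability of at least one detected lesion when lesions are treated as independent. The choice of 4 lesions per patient is an illustrative assumption, and the helper name `prob_at_least_one_hit` is hypothetical.

```python
import math

# Observed proportion of simulated patients without FP lesions.
observed_specificity = 14 / 50

# Specificity implied by one common Poisson mean of 22.48 FPs per patient.
poisson_specificity = math.exp(-22.48)


def prob_at_least_one_hit(sensitivity, n_lesions):
    """P(at least one lesion detected) if lesions are detected independently."""
    return 1.0 - (1.0 - sensitivity) ** n_lesions


# Illustrative case: lesion sensitivity 0.5 and 4 true lesions (assumed).
independent_patient_sensitivity = prob_at_least_one_hit(0.5, 4)

print(f"observed specificity:          {observed_specificity:.2f}")
print(f"Poisson-implied specificity:   {poisson_specificity:.1e}")
print(f"P(>=1 hit) under independence: {independent_patient_sensitivity:.4f}")
```

The Poisson-implied specificity is essentially zero against an observed 28%, while the independence calculation gives about 94% for the probability of at least one detected lesion.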

Multiple-lesion data allow estimation of the relationship between patient-specific specificity and lesion-specific sensitivity directly using a bivariate random-effects model in a similar fashion as was described by