Display Settings:

Format

Send to:

Choose Destination
See comment in PubMed Commons below
Med Phys. 2010 Apr;37(4):1788-95.

Assessing operating characteristics of CAD algorithms in the absence of a gold standard.

Author information

  • 1Department of Statistics, University College Cork, Cork, Ireland. kingshuk@ucc.ie

Abstract

PURPOSE:

The authors examine potential bias when using a reference reader panel as "gold standard" for estimating operating characteristics of CAD algorithms for detecting lesions. As an alternative, the authors propose latent class analysis (LCA), which does not require an external gold standard to evaluate diagnostic accuracy.

METHODS:

A binomial model for multiple reader detections using different diagnostic protocols was constructed, assuming conditional independence of readings given true lesion status. Operating characteristics of all protocols were estimated by maximum likelihood LCA. Reader panel and LCA based estimates were compared using data simulated from the binomial model for a range of operating characteristics. LCA was applied to 36 thin section thoracic computed tomography data sets from the Lung Image Database Consortium (LIDC): Free search markings of four radiologists were compared to markings from four different CAD assisted radiologists. For real data, bootstrap-based resampling methods, which accommodate dependence in reader detections, are proposed to test of hypotheses of differences between detection protocols.

RESULTS:

In simulation studies, reader panel based sensitivity estimates had an average relative bias (ARB) of -23% to -27%, significantly higher (p-value < 0.0001) than LCA (ARB--2% to -6%). Specificity was well estimated by both reader panel (ARB -0.6% to -0.5%) and LCA (ARB 1.4%-0.5%). Among 1145 lesion candidates LIDC considered, LCA estimated sensitivity of reference readers (55%) was significantly lower (p-value 0.006) than CAD assisted readers' (68%). Average false positives per patient for reference readers (0.95) was not significantly lower (p-value 0.28) than CAD assisted readers' (1.27).

CONCLUSIONS:

Whereas a gold standard based on a consensus of readers may substantially bias sensitivity estimates, LCA may be a significantly more accurate and consistent means for evaluating diagnostic accuracy.

PMID:
20443501
[PubMed - indexed for MEDLINE]
PMCID:
PMC2864671
Free PMC Article

Images from this publication.See all images (3)Free text

Figure 1
Figure 2
Figure 3
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for PubMed Central
    Loading ...
    Write to the Help Desk