Stat Med. 2007 Mar 30;26(7):1532-51.

Performance assessment for radiologists interpreting screening mammography.

Author information

Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27708-0251, USA. dawn@stat.duke.edu

Abstract

When interpreting screening mammograms, radiologists decide whether suspicious abnormalities exist that warrant the recall of the patient for further testing. Previous work has found significant differences in interpretation among radiologists; their false-positive and false-negative rates have been shown to vary widely. Performance assessments of individual radiologists have been mandated by the U.S. government, but concern exists about the adequacy of current assessment techniques. We use hierarchical modelling techniques to draw inferences about the interpretive performance of individual radiologists in screening mammography. In doing so, we account for differences due to patient mix and radiologist attributes (for instance, years of experience or interpretive volume). We model at the mammogram level, and then use these models to assess radiologist performance. Our approach is demonstrated with data from mammography registries and radiologist surveys. For each mammogram, the registries record whether or not the woman was found to have breast cancer within one year of the mammogram; this criterion is used to determine whether the recall decision was correct. We model the false-positive rate and the false-negative rate separately using logistic regression on patient risk factors and radiologist random effects. The radiologist random effects are, in turn, regressed on radiologist attributes such as the number of years in practice. Using these Bayesian hierarchical models, we examine several radiologist performance metrics. The first is the difference between the false-positive or false-negative rate of a particular radiologist and that of a hypothetical 'standard' radiologist with the same attributes and the same patient mix. A second metric predicts the performance of each radiologist on hypothetical mammography exams with particular combinations of patient risk factors (which we characterize as 'typical', 'high-risk', or 'low-risk').
The second metric can be used to compare one radiologist to another, while the first metric addresses how the radiologist is performing compared to an appropriate standard. Interval estimates are given for the metrics, thereby addressing uncertainty. The particular novelty of our contribution is the estimation of multiple performance rates (sensitivity and specificity). One can even estimate a continuum of performance rates, such as a performance curve or ROC curve, using our models, and we describe how this may be done. In addition to assessing radiologists in the original data set, we also show how to draw inferences about the performance of a new radiologist with new case mix, new outcome data, and new attributes without having to refit the model.
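
The first metric described above can be illustrated with a minimal sketch. It assumes a logistic model for the false-positive rate in which the log-odds of recall are an intercept plus patient risk-factor terms plus a radiologist effect, and in which the 'standard' radiologist's effect is the value predicted from radiologist attributes alone. The abstract does not give the paper's parametrisation, so every function name, coefficient, and data value below is hypothetical and chosen only for illustration.

```python
import math


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def recall_probability(intercept, coefs, risk_factors, radiologist_effect):
    """Probability of recall for one cancer-free mammogram (a false positive)."""
    log_odds = intercept + radiologist_effect
    log_odds += sum(b * x for b, x in zip(coefs, risk_factors))
    return inv_logit(log_odds)


def mean_rate(intercept, coefs, patient_mix, radiologist_effect):
    """Average false-positive probability over a set of mammograms."""
    probs = [
        recall_probability(intercept, coefs, x, radiologist_effect)
        for x in patient_mix
    ]
    return sum(probs) / len(probs)


def excess_rate(intercept, coefs, patient_mix, attributes, attr_coefs, deviation):
    """Radiologist's rate minus the rate of a 'standard' radiologist.

    The standard radiologist has the same attributes and patient mix;
    `deviation` is the radiologist's departure from the attribute-based
    prediction (an assumed decomposition, not taken from the paper).
    """
    standard_effect = sum(g * z for g, z in zip(attr_coefs, attributes))
    own = mean_rate(intercept, coefs, patient_mix, standard_effect + deviation)
    standard = mean_rate(intercept, coefs, patient_mix, standard_effect)
    return own - standard


# Hypothetical inputs: three mammograms with two binary risk factors each,
# one radiologist attribute (years in practice), and made-up coefficients.
patient_mix = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
excess = excess_rate(
    intercept=-2.0,
    coefs=(0.3, 0.5),
    patient_mix=patient_mix,
    attributes=(10.0,),
    attr_coefs=(-0.02,),
    deviation=0.4,
)
print(round(excess, 3))
```

In a Bayesian fit, one would evaluate `excess_rate` once per posterior draw of the coefficients and the deviation, then summarise the resulting values with quantiles to obtain an interval estimate of the kind the abstract mentions.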

PMID: 16847870
PMCID: PMC3152258
DOI: 10.1002/sim.2633
[Indexed for MEDLINE]