Nat Commun. 2018 Dec 6;9(1):5217. doi: 10.1038/s41467-018-07619-7.

Why rankings of biomedical image analysis competitions should be interpreted with care.

Author information

1. Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany. l.maier-hein@dkfz.de.
2. Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany.
3. Centre for Intelligent Machines, McGill University, Montreal, QC, H3A0G4, Canada.
4. Christian Doppler Laboratory for Ophthalmic Image Analysis, Department of Ophthalmology, Medical University Vienna, 1090, Vienna, Austria.
5. Science and Engineering Faculty, Queensland University of Technology, Brisbane, QLD, 4001, Australia.
6. Department of Electrical and Computer Engineering, Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA.
7. CISTIB - Center for Computational Imaging & Simulation Technologies in Biomedicine, The University of Leeds, Leeds, Yorkshire, LS2 9JT, UK.
8. Department of Radiology and Nuclear Medicine, Medical Image Analysis, Radboud University Medical Center, 6525 GA, Nijmegen, The Netherlands.
9. Institute of Information Systems Engineering, TU Wien, 1040, Vienna, Austria.
10. Complexity Science Hub Vienna, 1080, Vienna, Austria.
11. Heidelberg Collaboratory for Image Processing (HCI), Heidelberg University, 69120, Heidelberg, Germany.
12. Centre for Biomedical Image Analysis, Masaryk University, 60200, Brno, Czech Republic.
13. Electrical Engineering, Vanderbilt University, Nashville, TN, 37235-1679, USA.
14. Institute of Medical Informatics, Universität zu Lübeck, 23562, Lübeck, Germany.
15. Division of Medical Image Computing (MIC), German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany.
16. Institute for Advanced Studies, Department of Informatics, Technical University of Munich, 80333, Munich, Germany.
17. Information System Institute, HES-SO, Sierre, 3960, Switzerland.
18. Departments of Radiology, Nuclear Medicine and Medical Informatics, Erasmus MC, 3015 GD, Rotterdam, The Netherlands.
19. Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK.
20. Department of Radiation Oncology, Massachusetts General Hospital, Boston, MA, 02114, USA.
21. Institute of Biomedical Engineering, University of Oxford, Oxford, OX3 7DQ, UK.
22. Division of Translational Surgical Oncology (TCO), National Center for Tumor Diseases Dresden, 01307, Dresden, Germany.
23. Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany.
24. Centre for Medical Image Computing (CMIC) & Department of Computer Science, University College London, London, W1W 7TS, UK.
25. Data Science Studio, Research Studios Austria FG, 1090, Vienna, Austria.
26. Department of Electrical Engineering, Eindhoven University of Technology, 5600 MB, Eindhoven, The Netherlands.
27. AIExplore, NTUST Center of Computer Vision and Medical Imaging, Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, 106, Taiwan.
28. Institute of Diagnostic and Interventional Radiology, University Medical Center Rostock, 18051, Rostock, Germany.
29. Institute for Surgical Technology and Biomechanics, University of Bern, Bern, 3014, Switzerland.
30. Univ Rennes, Inserm, LTSI (Laboratoire Traitement du Signal et de l'Image) - UMR_S 1099, Rennes, 35043 Cedex, France.
31. Division of Biostatistics, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany.

Abstract

International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted to date. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results are often hampered, as only a fraction of the relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables, such as the test data used for validation, the ranking scheme applied, and the observers who generate the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future.
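To illustrate the second point, the short Python sketch below shows how two commonly used ranking schemes, aggregate-then-rank (rank algorithms by their mean metric value) and rank-then-aggregate (rank algorithms per test case, then average the ranks), can crown different winners on the same per-case metric values. The algorithm names and Dice scores are hypothetical and chosen purely for illustration; this is not the paper's analysis code.

    # Illustrative sketch (hypothetical data): the same per-case Dice scores ranked
    # with two common schemes can yield different winners.
    import numpy as np

    # Hypothetical per-test-case Dice scores for three algorithms (one row of values per algorithm).
    scores = {
        "algo_A": np.array([0.95, 0.94, 0.20, 0.93, 0.92]),  # strong overall, one failure case
        "algo_B": np.array([0.85, 0.86, 0.84, 0.85, 0.86]),  # consistently mediocre
        "algo_C": np.array([0.90, 0.89, 0.70, 0.88, 0.87]),
    }

    def aggregate_then_rank(scores):
        """Rank by mean metric value across cases (higher is better)."""
        means = {name: vals.mean() for name, vals in scores.items()}
        return sorted(means, key=means.get, reverse=True)

    def rank_then_aggregate(scores):
        """Rank algorithms within each case, then rank by mean rank (lower is better)."""
        names = list(scores)
        mat = np.stack([scores[n] for n in names])            # shape: (algorithms, cases)
        case_ranks = (-mat).argsort(axis=0).argsort(axis=0) + 1  # rank 1 = best per case
        mean_ranks = dict(zip(names, case_ranks.mean(axis=1)))
        return sorted(mean_ranks, key=mean_ranks.get)

    print("aggregate-then-rank:", aggregate_then_rank(scores))  # algo_B wins (highest mean Dice)
    print("rank-then-aggregate:", rank_then_aggregate(scores))  # algo_A wins (best mean rank)

In this toy example, algo_A wins most cases but has one severe failure, so it loses under mean-metric aggregation yet wins under mean-rank aggregation; this is the kind of scheme-dependence the abstract refers to.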

PMID: 30523263
PMCID: PMC6284017
DOI: 10.1038/s41467-018-07619-7
[Indexed for MEDLINE]