Methods for Clinical Evaluation of Artificial Intelligence Algorithms for Medical Diagnosis

Seong Ho Park; Kyunghwa Han; Hye Young Jang; Ji Eun Park; June-Goo Lee; Dong Wook Kim; Jaesoon Choi

doi:10.1148/radiol.220182

Radiology. 2023 Jan;306(1):20-31. doi: 10.1148/radiol.220182. Epub 2022 Nov 8.

Authors

Seong Ho Park¹, Kyunghwa Han¹, Hye Young Jang¹, Ji Eun Park¹, June-Goo Lee¹, Dong Wook Kim¹, Jaesoon Choi¹

Affiliation

¹ From the Department of Radiology and Research Institute of Radiology (S.H.P., J.E.P., D.W.K.) and Department of Biomedical Engineering (J.C.), Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-ro 43-gil, Songpa-gu, Seoul 05505, South Korea; Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, South Korea (K.H.); Department of Radiology, National Cancer Center, Goyang, South Korea (H.Y.J.); and Biomedical Engineering Research Center, Asan Institute for Life Sciences, University of Ulsan College of Medicine, Seoul, South Korea (J.G.L.).

PMID: 36346314

Abstract

Adequate clinical evaluation of artificial intelligence (AI) algorithms before adoption in practice is critical. Clinical evaluation aims to confirm acceptable AI performance through adequate external testing and to confirm the benefits of AI-assisted care compared with conventional care through appropriately designed and conducted studies, for which prospective designs are desirable. This article explains some of the fundamental methodological points that should be considered when designing and appraising the clinical evaluation of AI algorithms for medical diagnosis. The specific topics addressed include the following: (a) the importance of external testing of AI algorithms and strategies for conducting external testing effectively, (b) the various metrics and graphical methods for evaluating AI performance, as well as essential methodological points to note in using and interpreting them, (c) paired study designs, primarily for comparative performance evaluation of conventional and AI-assisted diagnoses, (d) parallel study designs, primarily for evaluating the effect of AI intervention, with an emphasis on randomized clinical trials, and (e) up-to-date guidelines for reporting clinical studies on AI, with an emphasis on guidelines registered in the EQUATOR Network library. Sound methodological knowledge of these topics will aid the design, execution, reporting, and appraisal of clinical evaluation of AI.

Publication types

Review
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Artificial Intelligence*
Humans
Prospective Studies
Randomized Controlled Trials as Topic
Research Design