Deep learning for classifying fibrotic lung disease on high-resolution computed tomography: a case-cohort study

Simon L F Walsh; Lucio Calandriello; Mario Silva; Nicola Sverzellati

doi:10.1016/S2213-2600(18)30286-8

Lancet Respir Med. 2018 Nov;6(11):837-845. doi: 10.1016/S2213-2600(18)30286-8. Epub 2018 Sep 16.

Authors

Simon L F Walsh¹, Lucio Calandriello², Mario Silva³, Nicola Sverzellati³

Affiliations

¹ Department of Radiology, King's College Hospital Foundation Trust, London, UK. Electronic address: slfwalsh@gmail.com.
² Department of Radiology, Fondazione Policlinico Universitario A Gemelli IRCCS, Rome, Italy.
³ Department of Medicine and Surgery, University of Parma, Parma, Italy.

PMID: 30232049
DOI: 10.1016/S2213-2600(18)30286-8

Abstract

Background: Based on international diagnostic guidelines, high-resolution CT plays a central part in the diagnosis of fibrotic lung disease. In the correct clinical context, when high-resolution CT appearances are those of usual interstitial pneumonia, a diagnosis of idiopathic pulmonary fibrosis can be made without surgical lung biopsy. We investigated the use of a deep learning algorithm for provision of automated classification of fibrotic lung disease on high-resolution CT according to criteria specified in two international diagnostic guideline statements: the 2011 American Thoracic Society (ATS)/European Respiratory Society (ERS)/Japanese Respiratory Society (JRS)/Latin American Thoracic Association (ALAT) guidelines for diagnosis and management of idiopathic pulmonary fibrosis and the Fleischner Society diagnostic criteria for idiopathic pulmonary fibrosis.

Methods: In this case-cohort study, for algorithm development and testing, a database of 1157 anonymised high-resolution CT scans showing evidence of diffuse fibrotic lung disease was generated from two institutions. We separated the scans into three non-overlapping cohorts (training set, n=929; validation set, n=89; and test set A, n=139) and classified them using 2011 ATS/ERS/JRS/ALAT idiopathic pulmonary fibrosis diagnostic guidelines. For each scan, the lungs were segmented and resampled to create a maximum of 500 unique four-slice combinations, which we converted into image montages. The final training dataset consisted of 420 096 unique montages for algorithm training. We evaluated algorithm performance, reported as accuracy, prognostic accuracy, and weighted κ coefficient (κw) of interobserver agreement, on test set A and on a cohort of 150 high-resolution CT scans with fibrotic lung disease (test set B), for which the algorithm was compared with the majority vote of 91 specialist thoracic radiologists drawn from multiple international thoracic imaging societies. We then reclassified high-resolution CT scans according to Fleischner Society diagnostic criteria for idiopathic pulmonary fibrosis. We retrained the algorithm using these criteria and evaluated its performance on 75 fibrotic lung disease-specific high-resolution CT scans compared with four specialist thoracic radiologists using weighted κ coefficient of interobserver agreement.
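
The montage step described in the Methods can be illustrated with a minimal sketch. The abstract does not state how slices were selected or arranged, so the choices here are assumptions: four distinct slice indices are drawn at random from an already segmented volume, and each combination is tiled into a 2×2 image. The function name sample_montages and its parameters are hypothetical.

```python
import itertools
from math import comb

import numpy as np


def sample_montages(volume, max_montages=500, seed=0):
    """Build up to `max_montages` unique four-slice montages.

    volume: array of shape (n_slices, height, width), assumed to be
            already segmented and resampled (not done here).
    Returns (montages, combos): montages has shape (k, 2*height, 2*width);
    combos lists the four slice indices used for each montage.
    """
    n_slices = volume.shape[0]
    if n_slices < 4:
        raise ValueError("need at least four slices")

    if comb(n_slices, 4) <= max_montages:
        # Few enough combinations: enumerate all of them.
        combos = list(itertools.combinations(range(n_slices), 4))
    else:
        # Assumption: uniform random sampling without repeats.
        rng = np.random.default_rng(seed)
        seen = set()
        while len(seen) < max_montages:
            idx = rng.choice(n_slices, size=4, replace=False)
            seen.add(tuple(sorted(idx.tolist())))
        combos = sorted(seen)

    montages = []
    for a, b, c, d in combos:
        top = np.hstack([volume[a], volume[b]])
        bottom = np.hstack([volume[c], volume[d]])
        montages.append(np.vstack([top, bottom]))  # 2x2 tiling
    return np.stack(montages), combos
```

Under these assumptions, a scan with at least 13 slices has more than 500 possible four-slice combinations, so it would contribute exactly 500 montages; scans with fewer slices contribute every combination available.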

Findings: The accuracy of the algorithm on test set A was 76·4%, with 92·7% of diagnoses within one category. The algorithm took 2·31 s to evaluate 150 four-slice montages (each montage representing a single case from test set B). The median accuracy of the thoracic radiologists on test set B was 70·7% (IQR 65·3-74·7), and the accuracy of the algorithm was 73·3% (93·3% were within one category), outperforming 60 (66%) of 91 thoracic radiologists. Median interobserver agreement between each of the thoracic radiologists and the radiologists' majority opinion was good (κw=0·67 [IQR 0·58-0·72]). Interobserver agreement between the algorithm and the radiologists' majority opinion was good (κw=0·69), outperforming 56 (62%) of 91 thoracic radiologists. The algorithm provided prognostic discrimination between usual interstitial pneumonia and non-usual interstitial pneumonia diagnoses (hazard ratio 2·88, 95% CI 1·79-4·61, p<0·0001) equal to that provided by the majority opinion of the thoracic radiologists (2·74, 1·67-4·48, p<0·0001). For Fleischner Society high-resolution CT criteria for usual interstitial pneumonia, median interobserver agreement between the radiologists was moderate (κw=0·56 [IQR 0·55-0·58]), but was good between the algorithm and the radiologists (κw=0·64 [0·55-0·72]).
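
The agreement metrics reported in the Findings (accuracy, proportion within one category, and κw) can be computed as in the following sketch. It assumes the diagnostic categories are coded as ordinal integers 0 to k-1 and that κw uses linear weights; the abstract does not state the weighting scheme, and the function name agreement_metrics is hypothetical.

```python
import numpy as np


def agreement_metrics(rater_a, rater_b, n_categories):
    """Return (accuracy, within_one, weighted_kappa) for two raters.

    rater_a, rater_b: sequences of ordinal category codes in
    0..n_categories-1. Linear weights are an assumption.
    """
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    diff = np.abs(a - b)
    accuracy = float(np.mean(diff == 0))
    within_one = float(np.mean(diff <= 1))

    # Observed joint proportions (rows: rater_a, columns: rater_b).
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()

    # Proportions expected by chance from the marginal distributions.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Linear disagreement weights: 0 on the diagonal, 1 at the extremes.
    idx = np.arange(n_categories)
    weights = np.abs(idx[:, None] - idx[None, :]) / (n_categories - 1)

    kappa_w = 1.0 - (weights * observed).sum() / (weights * expected).sum()
    return accuracy, within_one, float(kappa_w)
```

For example, agreement_metrics(algorithm_labels, majority_labels, 3) would compare hypothetical algorithm outputs against a majority-vote reference across three ordinal categories.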

Interpretation: High-resolution CT evaluation by a deep learning algorithm might provide low-cost, reproducible, near-instantaneous classification of fibrotic lung disease with human-level accuracy. These methods could be of benefit to centres at which thoracic imaging expertise is scarce, as well as for stratification of patients in clinical trials.

Funding: None.

Publication types

Multicenter Study
Validation Study

MeSH terms

Cohort Studies
Deep Learning / standards*
Humans
Idiopathic Pulmonary Fibrosis / classification*
Idiopathic Pulmonary Fibrosis / diagnostic imaging
Idiopathic Pulmonary Fibrosis / pathology
Lung Diseases, Interstitial / classification*
Lung Diseases, Interstitial / diagnostic imaging
Lung Diseases, Interstitial / pathology
Practice Guidelines as Topic
Proportional Hazards Models
Reproducibility of Results
Tomography, X-Ray Computed / methods*