Learning from small data: Classifying sex from retinal images via deep learning

PLoS One. 2023 Aug 3;18(8):e0289211. doi: 10.1371/journal.pone.0289211. eCollection 2023.

Abstract

Deep learning (DL) techniques have seen tremendous interest in medical imaging, particularly in the use of convolutional neural networks (CNNs) for the development of automated diagnostic tools. Because retinal fundus images can be acquired non-invasively, fundus imaging is particularly amenable to such automated approaches. Recent work in the analysis of fundus images using CNNs relies on access to massive datasets for training and validation, composed of hundreds of thousands of images. However, data residency and data privacy restrictions limit the applicability of this approach in medical settings where patient confidentiality is a mandate. Here, we showcase the performance of DL on small datasets for classifying patient sex from fundus images, a trait thought, until recently, not to be present or quantifiable in such images. Specifically, we fine-tune a ResNet-152 model whose last layer has been modified to a fully-connected layer for binary classification. We carried out several experiments to assess performance in the small-dataset context using one private (DOVS) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieved test AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This corresponds to only a 25% decrease in performance despite a nearly 1000-fold decrease in dataset size compared to prior results in the literature. Our results show that binary classification, even for a task as hard as sex categorization from retinal fundus images, is possible with very small datasets. Our domain adaptation results show that models trained on one distribution of images may generalize well to an independent external source, as in the case of models trained on DOVS and tested on ODIR. Our results also show that eliminating poor-quality images may hamper training of the CNN by further shrinking an already small dataset. Nevertheless, image quality may be an important factor, as evidenced by the superior generalizability observed in the domain adaptation experiments. Finally, our work shows that ensembling is an important tool for maximizing the performance of deep CNNs when development datasets are small.
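The setup described above, a pretrained ResNet-152 with its final layer replaced by a fully-connected binary classification head, can be expressed compactly in PyTorch. The sketch below is a minimal illustration assuming torchvision's ImageNet-pretrained weights; the single-logit head, the helper names, and the probability-averaging ensemble are assumptions made for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
from torchvision import models


def build_sex_classifier() -> nn.Module:
    """ResNet-152 backbone with the last layer swapped for a binary head.

    Assumes torchvision >= 0.13 (weights enum API). A single-logit head
    paired with BCEWithLogitsLoss is one common choice for binary
    classification, not necessarily the paper's exact configuration.
    """
    model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
    in_features = model.fc.in_features   # 2048 for ResNet-152
    model.fc = nn.Linear(in_features, 1)  # single logit: male vs. female
    return model


def ensemble_predict(members: list, images: torch.Tensor) -> torch.Tensor:
    """Average sigmoid probabilities across fine-tuned ensemble members."""
    with torch.no_grad():
        probs = []
        for m in members:
            m.eval()  # disable dropout, use running batch-norm statistics
            probs.append(torch.sigmoid(m(images)))
    return torch.stack(probs).mean(dim=0)  # one averaged probability per image
```

Fine-tuning would then proceed with a standard binary cross-entropy objective (e.g. nn.BCEWithLogitsLoss) on the roughly 2500 labelled fundus images, with the ensemble formed from models trained on different splits or random seeds.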

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Deep Learning*
  • Fundus Oculi
  • Humans
  • Neural Networks, Computer

Grants and funding

  • (AB) IVADO PostDoc-2022-4083608672
  • (AB) CRM Applied Math Lab postdoctoral funding (no number)
  • (OY) NSERC Discovery Grant (22R82411)
  • (OY) Pacific Institute for the Mathematical Sciences (PIMS) CRG 33
  • (IO) NSERC Discovery Grant (RGPIN-2019-05554)
  • (IO) NSERC Accelerator Supplement (RGPAS-2019-00026)
  • (IO & OY) UBC DSI Grant (no number)
  • (IO) UBC Faculty of Science STAIR grant
  • (IO & OY) UBC DMCBH Kickstart grant
  • (IO & OY) UBC Health VPR HiFi grant

Funder links: Institut de valorisation des données: https://ivado.ca; Centres de recherches en mathématiques: http://www.crm.umontreal.ca/labo/mathappli; Natural Sciences and Engineering Research Council of Canada: https://www.nserc-crsng.gc.ca; Pacific Institute for the Mathematical Sciences: https://www.pims.math.ca; UBC DSI: https://dsi.ubc.ca/; UBC STAIR: https://science.ubc.ca/research/stair; UBC DMCBH: https://www.centreforbrainhealth.ca/; UBC Health: https://health.ubc.ca/

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.