Diversity in Machine Learning: A Systematic Review of Text-Based Diagnostic Applications

Appl Clin Inform. 2022 May;13(3):569-582. doi: 10.1055/s-0042-1749119. Epub 2022 May 25.

Abstract

Objective: As the storage of clinical data has transitioned into electronic formats, medical informatics has become increasingly relevant in providing diagnostic aid. The purpose of this review is to evaluate machine learning models that use text data for diagnosis and to assess the diversity of the included study populations.

Methods: We conducted a systematic literature review on three public databases. Two authors reviewed every abstract for inclusion. Articles were included if they used or developed machine learning algorithms to aid in diagnosis. Articles focusing on imaging informatics were excluded.

Results: From 2,260 identified papers, we included 78. Of the machine learning models used, neural networks were relied upon most frequently (44.9%). Studies had a median population of 661.5 patients, and diseases and disorders of 10 different body systems were studied. Of the 35.9% (N = 28) of papers that included race data, 57.1% (N = 16) of study populations were majority White, 14.3% were majority Asian, and 7.1% were majority Black. In 75% (N = 21) of papers, White was the largest racial group represented. Of the papers included, 43.6% (N = 34) included the sex ratio of the patient population.

Discussion: With the power to build robust algorithms supported by massive quantities of clinical data, machine learning is shaping the future of diagnostics. Limitations of the underlying data create potential biases, especially if patient demographics are unknown or not included in the training.

Conclusion: As the movement toward clinical reliance on machine learning accelerates, both recording demographic information and using diverse training sets should be emphasized. Extrapolating algorithms to demographics beyond the original study population leaves large gaps for potential biases.

Publication types

  • Systematic Review

MeSH terms

  • Algorithms*
  • Humans
  • Machine Learning*