Machine Learning Provides an Accurate Classification of Diffuse Large B-Cell Lymphoma from Immunohistochemical Data

J Pathol Inform. 2018 Jun 13:9:21. doi: 10.4103/jpi.jpi_14_18. eCollection 2018.

Abstract

Background: The classification of diffuse large B-cell lymphomas into germinal center B-cell-like (GCB) and non-germinal center (non-GC) subtypes defines disease subgroups that differ in both gene expression and prognosis. Given the clinical significance of these subtypes, several classification algorithms have been designed, some of which use widely available immunohistochemical techniques. Despite their high concordance with gene expression profiles (GEP) and their prognostic value, these algorithms rest on technical and biological assumptions that leave room for improvement in classification performance.

Methods: To overcome this limitation, a new algorithm was derived by applying an automatic classification tree method to a previously published dataset of 475 patients.
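
As an illustration of how a classification tree can be learned from binary immunohistochemical results, the following is a minimal sketch rather than the method used in the study. It assumes a CART-style tree that splits on Gini impurity; the abstract does not specify the tree algorithm, the markers, or the cutoffs. The marker names (marker_a, marker_b, marker_c) and the eight toy cases are hypothetical and are not taken from the 475-patient dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the binary feature with the lowest weighted Gini impurity,
    or None if no feature improves on the unsplit node."""
    best, best_score = None, gini(labels)
    for f in features:
        pos = [y for r, y in zip(rows, labels) if r[f]]
        neg = [y for r, y in zip(rows, labels) if not r[f]]
        if not pos or not neg:
            continue  # this feature does not separate the cases
        score = (len(pos) * gini(pos) + len(neg) * gini(neg)) / len(labels)
        if score < best_score:
            best, best_score = f, score
    return best

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Recursively grow a tree; leaves are class labels, nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    f = best_split(rows, labels, features) if depth < max_depth else None
    if f is None:
        return majority
    rest = [g for g in features if g != f]
    pos = [(r, y) for r, y in zip(rows, labels) if r[f]]
    neg = [(r, y) for r, y in zip(rows, labels) if not r[f]]
    return {
        "feature": f,
        "pos": build_tree([r for r, _ in pos], [y for _, y in pos],
                          rest, depth + 1, max_depth),
        "neg": build_tree([r for r, _ in neg], [y for _, y in neg],
                          rest, depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the tree from the root to a leaf for one case."""
    while isinstance(tree, dict):
        tree = tree["pos"] if row[tree["feature"]] else tree["neg"]
    return tree

# Hypothetical toy data: 1 = marker positive, 0 = marker negative.
features = ["marker_a", "marker_b", "marker_c"]
rows = [dict(zip(features, (a, b, c)))
        for a in (0, 1) for b in (0, 1) for c in (0, 1)]
# Toy labelling rule, invented for illustration only.
labels = ["GCB" if r["marker_a"] or (r["marker_b"] and not r["marker_c"])
          else "non-GC" for r in rows]

tree = build_tree(rows, labels, features)
predictions = [predict(tree, r) for r in rows]
print(tree)
```

In this toy setting the tree is evaluated on its own training cases, so it says nothing about generalization; a real analysis would need held-out or cross-validated cases.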

Results: The resulting algorithm correctly classifies 91.6% of the cases when compared with GEP, with an area under the receiver operating characteristic (ROC) curve of 0.934. Noteworthy features of this algorithm include the ability to classify GEP-unclassifiable cases and a significant prognostic value in terms of both overall survival (60 months for non-GC vs not reached for GCB, P = 0.007) and progression-free survival (61.9 months vs not reached, P = 0.017).
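
For readers unfamiliar with the two performance measures reported here, the sketch below shows one common way to compute them: accuracy as the fraction of cases matching the reference label, and the area under the ROC curve via its rank-based (Mann-Whitney) formulation. The six cases, their scores, and the 0.5 threshold are invented for illustration and are unrelated to the study data; the abstract does not state how the study computed these values.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores, positive):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case,
    counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical reference labels and classifier scores for GCB.
y_true = ["GCB", "GCB", "GCB", "non-GC", "non-GC", "non-GC"]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5, chosen only for this example.
y_pred = ["GCB" if s >= 0.5 else "non-GC" for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores, positive="GCB")
print(f"accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is why the two figures are reported separately.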

Conclusion: The novel algorithm, obtained with a machine learning classification method that avoids most prior assumptions, is accurate and retains features relevant for clinical implementation.

Keywords: Cell of origin; immunohistochemistry; lymphoma; machine learning; prognostic.