Optimal approach for classification of acute leukemia subtypes based on gene expression data

Biotechnol Prog. 2002 Jul-Aug;18(4):847-54. doi: 10.1021/bp025517o.

Abstract

The classification of cancer subtypes, which is critical for successful treatment, has been studied extensively with the use of gene expression profiles from oligonucleotide chips or cDNA microarrays. Various pattern recognition methods have been successfully applied to gene expression data. However, these methods are not optimal; rather, they are high-performance classifiers that emphasize only classification accuracy. In this paper, we propose an approach for the construction of the optimal linear classifier using gene expression data. Two linear classification methods, linear discriminant analysis (LDA) and discriminant partial least-squares (DPLS), are applied to distinguish acute leukemia subtypes. These methods are shown to give satisfactory accuracy. Moreover, we determined the optimal number of genes participating in the classification (a remarkably small number compared to previous results) on the basis of a statistical significance test. Thus, the proposed method constructs an optimal classifier that is composed of a small predictor set and provides high accuracy.
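The general idea of pairing a significance-based gene filter with a linear classifier can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not specify which significance test or cutoff was used, so the Welch t-statistic, the cutoff of 4.0, the toy expression values, and all function names below are assumptions made for illustration. The classifier is a diagonal linear discriminant, a simplification of LDA that ignores gene-gene covariance; DPLS is not shown.

```python
from math import sqrt
from statistics import mean, variance


def column(X, y, gene, label):
    """Expression values of one gene for all samples with the given class label."""
    return [row[gene] for row, lab in zip(X, y) if lab == label]


def welch_t(a, b):
    """Welch t-statistic comparing one gene's expression between two classes."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def select_genes(X, y, threshold):
    """Keep genes whose |t| exceeds a cutoff (the cutoff value is an assumption)."""
    return [g for g in range(len(X[0]))
            if abs(welch_t(column(X, y, g, 0), column(X, y, g, 1))) > threshold]


def fit_diagonal_lda(X, y, genes):
    """Fit score(x) = w.x + b using pooled per-gene variances (diagonal covariance)."""
    weights, bias = [], 0.0
    for g in genes:
        a, c = column(X, y, g, 0), column(X, y, g, 1)
        pooled = ((len(a) - 1) * variance(a) + (len(c) - 1) * variance(c)) \
            / (len(a) + len(c) - 2)
        w = (mean(c) - mean(a)) / pooled
        weights.append(w)
        bias -= w * (mean(a) + mean(c)) / 2  # boundary at the class midpoint
    return weights, bias


def predict(model, genes, row):
    """Return class 1 if the linear score is positive, otherwise class 0."""
    weights, bias = model
    score = sum(w * row[g] for w, g in zip(weights, genes)) + bias
    return 1 if score > 0 else 0


# Toy data (invented): 8 samples x 3 genes; genes 0 and 2 differ between classes.
X = [
    [1.0, 5.0, 9.0], [1.2, 4.0, 8.5], [0.8, 6.0, 9.2], [1.1, 5.5, 8.8],
    [3.0, 5.2, 6.0], [3.3, 4.5, 6.4], [2.9, 5.8, 5.9], [3.1, 5.1, 6.2],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

genes = select_genes(X, y, threshold=4.0)
model = fit_diagonal_lda(X, y, genes)
predictions = [predict(model, genes, row) for row in X]
```

On this toy data the filter discards the uninformative gene 1, and the classifier is then built only from the remaining genes. Predictions here are made on the training samples; a real evaluation would use held-out samples or cross-validation.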

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Discriminant Analysis
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Leukemia, Myeloid, Acute / classification*
  • Leukemia, Myeloid, Acute / genetics*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / classification*
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics*
  • Probability
  • Reproducibility of Results
  • Sensitivity and Specificity