Cancer Gene Ther. 2019 May 29. doi: 10.1038/s41417-019-0105-y. [Epub ahead of print]

Identification of leukemia stem cell expression signatures through Monte Carlo feature selection strategy and support vector machine.

Author information

1. Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China.
2. School of Life Sciences, Shanghai University, Shanghai, 200444, P. R. China.
3. Department of Radiology, Columbia University Medical Center, New York, NY, 10032, USA.
4. Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China.
5. College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, P. R. China.
6. Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, 510507, P. R. China.
7. Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, 200241, P. R. China.
8. Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China. xykong@sibs.ac.cn.
9. Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China. tohuangtao@126.com.
10. School of Life Sciences, Shanghai University, Shanghai, 200444, P. R. China. cai_yud@126.com.

Abstract

Acute myeloid leukemia (AML) is a type of blood cancer characterized by the rapid growth of immature white blood cells from the bone marrow. Therapy resistance resulting from the persistence of leukemia stem cells (LSCs) is found in numerous patients. Comparative transcriptome studies have previously been conducted to analyze differentially expressed genes between LSC+ and LSC- cells. However, these studies mainly focused on a limited number of genes with the most obvious expression differences between the two cell types. We developed a computational approach incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), support vector machine (SVM), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER), to identify gene expression features specific to LSCs. We first identified 1159 features (genes) that can be used to build the optimal SVM classifier for distinguishing LSC+ and LSC- cells. Among these 1159 genes, the top 17 were identified as LSC-specific biomarkers. In addition, six classification rules were produced by the RIPPER algorithm. A subsequent literature review of these genes and classification rules, together with functional enrichment analyses of the 1159 genes, confirmed the relevance of the extracted genes and rules to the characteristics of LSCs.
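
The incremental feature selection step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the kernel, step size, or cross-validation scheme, so the values below are assumptions. The data are synthetic, the helper name incremental_feature_selection is hypothetical, and an ANOVA F-score ranking from scikit-learn stands in for MCFS, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def incremental_feature_selection(X, y, ranked_idx, step=5, cv=5):
    """Evaluate an SVM on growing prefixes of a ranked feature list.

    Returns (best_k, best_score, scores), where scores maps each
    prefix length k to the mean cross-validated accuracy.
    """
    scores = {}
    for k in range(step, len(ranked_idx) + 1, step):
        subset = ranked_idx[:k]  # top-k features in ranked order
        clf = SVC(kernel="rbf", gamma="scale")  # assumed kernel settings
        scores[k] = cross_val_score(clf, X[:, subset], y, cv=cv).mean()
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k], scores


# Synthetic stand-in for an expression matrix (samples x genes).
X, y = make_classification(n_samples=120, n_features=60, n_informative=8,
                           n_redundant=0, random_state=0)

# Stand-in ranking: ANOVA F-score, NOT Monte Carlo feature selection.
f_scores, _ = f_classif(X, y)
ranked_idx = np.argsort(f_scores)[::-1]  # highest score first

best_k, best_score, scores = incremental_feature_selection(X, y, ranked_idx)
print(f"best feature count: {best_k}, mean CV accuracy: {best_score:.3f}")
```

In this sketch the feature count with the highest cross-validated accuracy plays the role of the 1159-gene optimum reported above. Ranking on the full data set before cross-validation is a simplification that can inflate accuracy estimates.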

PMID: 31138902
DOI: 10.1038/s41417-019-0105-y
