Format

Send to

Choose Destination
BMC Bioinformatics. 2005 Jun 15;6:148.

Feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes.

Author information

1
School of Informatics, The University of Edinburgh, Edinburgh EH8 9LE, United Kingdom. thanya@eng.cmu.ac.th

Abstract

BACKGROUND:

In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. 1 and the NCI60 dataset of Ross et al. 2 present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed.

RESULTS:

In the absence of feature selection, classification accuracy on the training data is typically good, but not replicated on the testing data. Gene selection using the RankGene software 3 is shown to significantly improve performance on the testing data. Further, we show that the choice of feature selection criteria can have a significant effect on accuracy. The evolutionary algorithm is shown to perform stably across the space of possible parameter settings - indicating the robustness of the approach. We assess performance using a low variance estimation technique, and present an analysis of the genes most often selected as predictors.

CONCLUSION:

The computational methods we have developed perform robustly and accurately, and yield results in accord with clinical knowledge: A Z-score analysis of the genes most frequently selected identifies genes known to discriminate AML and Pre-T ALL leukemia. This study also confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined.

PMID:
15958165
PMCID:
PMC1181625
DOI:
10.1186/1471-2105-6-148
[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for BioMed Central Icon for PubMed Central
Loading ...
Support Center