    BMC Bioinformatics. 2005 Jun 15;6:148.

    Feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes.

    Jirapech-Umpai T, Aitken S.

    School of Informatics, The University of Edinburgh, Edinburgh EH8 9LE, United Kingdom. thanya@eng.cmu.ac.th

    BACKGROUND: In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed.

    RESULTS: In the absence of feature selection, classification accuracy on the training data is typically good, but not replicated on the testing data. Gene selection using the RankGene software [3] is shown to significantly improve performance on the testing data. Further, we show that the choice of feature selection criteria can have a significant effect on accuracy. The evolutionary algorithm is shown to perform stably across the space of possible parameter settings, indicating the robustness of the approach. We assess performance using a low variance estimation technique, and present an analysis of the genes most often selected as predictors.

    CONCLUSION: The computational methods we have developed perform robustly and accurately, and yield results in accord with clinical knowledge: a Z-score analysis of the genes most frequently selected identifies genes known to discriminate AML and Pre-T ALL leukemia. This study also confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined.
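
The abstract does not state how the Z-score of gene selection frequency is computed. The following is a minimal sketch under one plausible assumption: if every run of the evolutionary algorithm picked a fixed-size subset of genes uniformly at random, the number of times a given gene is selected would follow a binomial distribution, and the Z-score measures how far the observed count departs from that expectation. The function name `selection_z_scores` and its parameters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def selection_z_scores(selected_sets, n_genes, subset_size):
    """Z-score of each gene's selection count against a random-selection null.

    Assumption (not from the abstract): under the null, each run selects
    `subset_size` distinct genes uniformly at random from `n_genes`, so a
    gene's selection count over all runs is Binomial(runs, subset_size / n_genes).
    """
    runs = len(selected_sets)
    p = subset_size / n_genes                # chance a gene is picked in one run
    expected = runs * p                      # binomial mean
    sd = math.sqrt(runs * p * (1 - p))       # binomial standard deviation
    # Count each gene at most once per run.
    counts = Counter(g for genes in selected_sets for g in set(genes))
    return {g: (counts.get(g, 0) - expected) / sd for g in range(n_genes)}


# Toy input: 4 runs, each selecting 2 of 10 genes (indices are illustrative).
toy_runs = [[0, 1], [0, 2], [0, 3], [0, 1]]
z = selection_z_scores(toy_runs, n_genes=10, subset_size=2)
top_genes = sorted(z, key=z.get, reverse=True)[:3]
```

In this toy input, gene 0 is selected in every run and therefore receives the highest score. Genes with large positive scores are the candidates one would compare against known clinical markers.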

    PMID: 15958165 [PubMed - indexed for MEDLINE]

    PMCID: PMC1181625
