A comprehensive evaluation of machine learning techniques for cancer class prediction based on microarray data

Int J Bioinform Res Appl. 2015;11(5):397-416. doi: 10.1504/ijbra.2015.071940.

Abstract

Prostate cancer is among the most common cancers in males, and its heterogeneity is well known. Genomic-level changes can be detected in gene expression data, and those changes may serve as a standard model for class prediction on any other cancer data set. Various techniques, including machine learning techniques, have been applied to prostate cancer data sets in order to predict cancer class accurately. The large number of attributes and the small number of samples in microarray data lead to poor training; therefore, the most challenging part is attribute reduction, that is, the removal of non-significant genes. In this work, a combination of interquartile range and t-test is used for attribute reduction. Further, ten state-of-the-art machine learning techniques are comprehensively evaluated for their accuracy in class prediction of prostate cancer. Of these techniques, Bayes Network performed best with an accuracy of 94.11%, followed by Naïve Bayes with an accuracy of 91.17%.
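The attribute-reduction step described above (interquartile range combined with a t-test) can be illustrated with a minimal sketch. The abstract does not state how the two filters are ordered or which cutoffs are used, so everything below is an assumption made for illustration: the function names (`iqr`, `welch_t`, `select_genes`), the thresholds `min_iqr` and `min_abs_t`, the use of Welch's unequal-variance t-statistic, and the toy expression values are all hypothetical and are not taken from the paper.

```python
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) of one gene's expression values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def welch_t(a, b):
    """Welch's t-statistic for two independent groups (assumed variant)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    if se == 0:
        return 0.0  # no variation in either group: treat as non-significant
    return (statistics.mean(a) - statistics.mean(b)) / se


def select_genes(expr, labels, min_iqr=0.5, min_abs_t=2.0):
    """Keep genes that pass both filters.

    expr   : dict mapping gene name -> list of expression values, one per sample
    labels : list of 0 (normal) / 1 (tumor), aligned with the samples
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    selected = []
    for gene, values in expr.items():
        # Filter 1: drop genes whose expression barely varies across samples.
        if iqr(values) < min_iqr:
            continue
        # Filter 2: keep genes whose group means differ strongly.
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        if abs(welch_t(tumor, normal)) >= min_abs_t:
            selected.append(gene)
    return selected


# Toy data (hypothetical): 4 normal samples followed by 4 tumor samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = {
    "gene_flat": [5.0, 5.1, 5.0, 5.1, 5.0, 5.1, 5.0, 5.1],
    "gene_noisy": [2.0, 6.0, 3.0, 7.0, 2.5, 6.5, 3.5, 6.0],
    "gene_diff": [2.0, 2.2, 1.9, 2.1, 6.0, 6.3, 5.8, 6.1],
}
print(select_genes(expr, labels))
```

On this toy input only `gene_diff` survives: `gene_flat` is removed by the interquartile-range filter, and `gene_noisy` varies widely but does not differ between the two classes. The sketch thresholds the t-statistic directly to stay within the standard library; a p-value cutoff would need the t-distribution, for example via `scipy.stats.ttest_ind`.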

Keywords: Bayes network; bioinformatics; cancer class prediction; machine learning; microarray analysis; prostate cancer.