Pac Symp Biocomput. 2013:212-23.

Spectral clustering strategies for heterogeneous disease expression data.

Author information

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.


Clustering of gene expression data simplifies subsequent analyses and forms the basis of numerous approaches to biomarker identification, prediction of clinical outcome, and personalized therapeutic strategies. The most popular clustering methods, such as K-means and hierarchical clustering, are intuitive and easy to use, but they require arbitrary parameter choices (the number of clusters for K-means, and a threshold at which to cut the tree for hierarchical clustering). Human disease gene expression data are generally more difficult to cluster efficiently because background (genotype) heterogeneity, differences in disease stage and progression, and disease subtyping all make the datasets more heterogeneous. Spectral clustering has recently been introduced in many fields as a promising alternative to standard clustering methods. The idea is that pairwise comparisons can help reveal global features through eigendecomposition techniques. In this paper, we developed a new recursive K-means spectral clustering method (ReKS) for disease gene expression data. We benchmarked ReKS on three large-scale cancer datasets and compared it to other clustering methods with respect to execution time, background models, and external biological knowledge. We found ReKS to be superior to the hierarchical methods and as good as K-means, but much faster than both and without requiring a priori knowledge of K. Overall, ReKS offers an attractive alternative for efficient clustering of human disease data.
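
The general idea of recursive spectral partitioning can be illustrated with a minimal sketch. This is not the authors' ReKS implementation, whose details are not given in the abstract. It assumes a symmetric, non-negative pairwise similarity matrix with positive row sums, splits each group in two by running a two-cluster K-means on the second eigenvector of the normalized graph Laplacian, and recurses. The function names, the `min_size` parameter, and the normalized-cut stopping threshold `max_ncut` are hypothetical choices made for illustration only.

```python
import numpy as np

def fiedler_split(S):
    """Label items 0/1 using the second eigenvector of the normalized Laplacian.

    Assumes S is symmetric, non-negative, and has positive row sums.
    """
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    f = vecs[:, 1] * d_inv_sqrt          # embedding used for the split
    centers = np.array([f.min(), f.max()])
    for _ in range(50):                  # plain 1-D two-means
        labels = np.abs(f[:, None] - centers[None, :]).argmin(axis=1)
        if labels.min() == labels.max():
            break                        # degenerate: everything in one group
        new_centers = np.array([f[labels == k].mean() for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def recursive_spectral_clusters(S, idx=None, min_size=3, max_ncut=0.5):
    """Recursively bipartition items; stopping rule is an illustrative assumption."""
    if idx is None:
        idx = np.arange(len(S))
    if len(idx) < 2 * min_size:
        return [idx]
    labels = fiedler_split(S[np.ix_(idx, idx)])
    a, b = idx[labels == 0], idx[labels == 1]
    if min(len(a), len(b)) < min_size:
        return [idx]
    # Normalized cut of the proposed split, measured within the current group.
    cut = S[np.ix_(a, b)].sum()
    ncut = cut / S[np.ix_(a, idx)].sum() + cut / S[np.ix_(b, idx)].sum()
    if ncut > max_ncut:
        return [idx]                     # the split is not clean enough; stop
    return (recursive_spectral_clusters(S, a, min_size, max_ncut)
            + recursive_spectral_clusters(S, b, min_size, max_ncut))

# Toy similarity matrix: two groups of six items, similar within, dissimilar between.
n = 6
S = np.full((2 * n, 2 * n), 0.05)
S[:n, :n] = 0.9
S[n:, n:] = 0.9
np.fill_diagonal(S, 1.0)

clusters = recursive_spectral_clusters(S)
print([c.tolist() for c in clusters])
```

In this sketch the number of clusters is not supplied in advance; it emerges from where the recursion stops, which is the property the abstract highlights. For real expression data the similarity matrix would have to be derived from the profiles (for example from correlations), and that choice is also left open here.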

[Indexed for MEDLINE]
Free PMC Article
