CLICK and EXPANDER: a system for clustering and visualizing gene expression data

Bioinformatics. 2003 Sep 22;19(14):1787-99. doi: 10.1093/bioinformatics/btg232.

Abstract

Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns.

Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all of these applications, it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.
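
The two-phase idea described above (find tight kernels, then expand them into full clusters) can be illustrated with a deliberately simplified sketch. This is not the CLICK algorithm: the abstract does not specify its graph-theoretic or statistical machinery, so the sketch substitutes connected components of a thresholded Pearson-correlation graph for kernel detection, and a single nearest-centroid adoption pass for the expansion heuristics. All function names and the thresholds `tight` and `adopt` are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / sqrt(vx * vy)


def find_kernels(profiles, tight=0.99, min_size=2):
    """Stand-in for kernel detection (an assumption, not CLICK's method):
    connected components of the graph that links every pair of genes
    whose correlation is at least `tight`."""
    n = len(profiles)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if pearson(profiles[i], profiles[j]) >= tight:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Components below min_size are treated as unclustered singletons.
    return [g for g in groups.values() if len(g) >= min_size]


def centroid(profiles, members):
    """Mean expression profile of the given gene indices."""
    length = len(profiles[0])
    return [sum(profiles[m][k] for m in members) / len(members)
            for k in range(length)]


def expand_kernels(profiles, kernels, adopt=0.7):
    """Stand-in for the expansion step (an assumption): each unassigned
    gene joins the kernel whose centroid it correlates with best,
    provided that correlation is at least `adopt`."""
    clusters = [list(k) for k in kernels]
    if not clusters:
        return clusters
    assigned = {i for k in kernels for i in k}
    centroids = [centroid(profiles, k) for k in kernels]
    for i in range(len(profiles)):
        if i in assigned:
            continue
        scores = [pearson(profiles[i], c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= adopt:
            clusters[best].append(i)
    return clusters
```

With made-up profiles such as two rising genes, two falling genes, one roughly rising gene, and one noisy gene, `find_kernels` would group the two exact pairs, and `expand_kernels` would adopt the roughly rising gene into the rising kernel while leaving the noisy gene unclustered.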

Availability: http://www.cs.tau.ac.il/~rshamir/expander/expander.html

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Base Sequence
  • Cell Cycle / genetics
  • Cluster Analysis*
  • Fibroblasts
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / genetics*
  • HeLa Cells
  • Humans
  • Molecular Sequence Data
  • Oligonucleotide Array Sequence Analysis / methods
  • Pattern Recognition, Automated
  • Reproducibility of Results
  • Saccharomyces cerevisiae / genetics
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Software
  • User-Computer Interface*