Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133. Epub 2008 Apr 10.

Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.

Author information

Department of Computer Science and Engineering, Netaji Subhash Engineering College, Garia and Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India.



Cluster analysis of gene-expression data is a useful tool for identifying biologically relevant groups of genes that show similar expression patterns under multiple experimental conditions. Various methods have been proposed for clustering gene-expression data, but most of them have several shortcomings when applied to such data. In the present article, we focus on several of these shortcomings of conventional clustering algorithms and propose a new algorithm that produces better clustering solutions than some existing ones.


We present the Divisive Correlation Clustering Algorithm (DCCA), which is suited to finding groups of genes that have a similar pattern of variation in their expression values. To detect clusters with high correlation and biological significance, we use the correlation clustering concept introduced by Bansal et al. DCCA produces a clustering solution without taking the number of clusters to be created as an input. It uses the correlation matrix in such a way that every gene in a cluster has its highest average correlation with the genes of that cluster. To test its performance, we applied DCCA and some well-known conventional methods to an artificial dataset and nine gene-expression datasets and compared the results. The clustering results of DCCA are found to be more significantly relevant to the biological annotations than those of the other methods. These results indicate the superiority of DCCA over the compared methods for clustering gene-expression data.
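
The placement criterion described above (each gene should have its highest average correlation with the genes of its own cluster) can be illustrated with a short Python sketch. This is not the DCCA software or the authors' procedure: it performs no divisive splitting and only checks the criterion for a given labelling. The function name `best_cluster_per_gene`, the use of Pearson correlation via NumPy, and the toy expression values are assumptions made for illustration.

```python
import numpy as np


def best_cluster_per_gene(expr, labels):
    """For each gene, return the label of the cluster with which it has
    the highest average Pearson correlation (the gene itself excluded).

    Hypothetical helper for illustration; not part of the DCCA software.
    expr   : 2-D array, one row per gene, one column per condition
    labels : 1-D sequence of current cluster labels, one per gene
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    corr = np.corrcoef(expr)  # gene-by-gene correlation matrix
    best = labels.copy()
    for g in range(len(labels)):
        best_score = -np.inf
        for c in np.unique(labels):
            members = np.flatnonzero(labels == c)
            members = members[members != g]  # do not compare a gene with itself
            if members.size == 0:
                continue  # cluster holds only this gene, no average defined
            score = corr[g, members].mean()
            if score > best_score:
                best_score = score
                best[g] = c
    return best


# Toy data (invented): three rising and three falling expression profiles.
expr = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 11],
    [0, 1, 2, 4, 5],
    [5, 4, 3, 2, 1],
    [9, 7, 6, 3, 1],
    [4, 4, 3, 1, 0],
]
labels = [0, 0, 1, 1, 1, 1]  # the third gene is deliberately misplaced
suggested = best_cluster_per_gene(expr, labels)
print(suggested)
```

In this toy case the third gene correlates positively with the rising profiles and negatively with the falling ones, so the sketch suggests moving it to cluster 0; a labelling satisfies the criterion when the returned labels equal the input labels.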


The software has been developed in C and Visual Basic and runs on Microsoft Windows platforms. The software may be downloaded as a zip file from Then it needs to be installed. Two Word files (included in the zip file) should be consulted before installing and running the software.

[Indexed for MEDLINE]