19640299[PMID] - PMC

Figure 8. From: CLEAN: CLustering Enrichment ANalysis.

Expression patterns of genes with statistically significant CLEAN scores in four independent breast cancer datasets. The heatmap indicates that all genes belong to clusters with coherent expression patterns in each dataset. Labels on the right-hand side indicate the enriched functional categories for each global cluster of co-expressed genes. This heatmap can be browsed interactively using FTreeView.

Johannes M Freudenberg, et al. BMC Bioinformatics. 2009;10:234-234.

Figure 1. From: CLEAN: CLustering Enrichment ANalysis.

Calculating functional coherence scores. Given a hierarchical clustering of genes based on their expression profiles and a set of functional categories (e.g., Gene Ontology terms), the CLustering Enrichment ANalysis (CLEAN) score for a gene is calculated as the maximum of -log(Fisher's exact test q-value) over all enrichment tests that pair a cluster containing the gene with a functional category containing the gene (see Methods for details). The cluster-wide CLEAN score (cwCLEAN) is calculated in the same way, except that the maximum is taken over all clusters that contain the gene and all functional categories, whether or not they contain the gene.
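A minimal sketch of the scoring described in this caption, assuming each cluster and each functional category is given as a set of gene identifiers. The function names, the toy data, and the base-10 logarithm are illustrative assumptions. Raw one-sided Fisher's exact p-values stand in for the q-values used in the paper, so the numbers are not comparable to published CLEAN scores.

```python
from math import comb, log10

def fisher_enrichment_p(cluster, category, universe_size):
    """One-sided Fisher's exact p-value (hypergeometric upper tail) for
    over-representation of `category` genes inside `cluster`."""
    k = len(cluster & category)   # genes in both the cluster and the category
    n = len(cluster)
    K = len(category)
    total = comb(universe_size, n)
    tail = sum(comb(K, i) * comb(universe_size - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / total

def clean_score(gene, clusters, categories, universe_size, cluster_wide=False):
    """Maximum of -log10(p) over clusters containing `gene`.
    With cluster_wide=False only categories containing the gene are tested
    (CLEAN); with cluster_wide=True all categories are tested (cwCLEAN).
    Assumption: p-values are used here instead of FDR-adjusted q-values."""
    best = 0.0
    for cluster in clusters:
        if gene not in cluster:
            continue
        for category in categories.values():
            if not cluster_wide and gene not in category:
                continue
            p = fisher_enrichment_p(cluster, category, universe_size)
            best = max(best, -log10(p))
    return best

# Hypothetical toy data: 10 genes, three nested clusters from a dendrogram.
genes = [f"g{i}" for i in range(10)]
clusters = [set(genes[:3]), set(genes[:6]), set(genes)]
categories = {"immune": {"g0", "g1", "g2"}, "other": {"g3", "g4", "g8"}}

print(clean_score("g0", clusters, categories, len(genes)))
print(clean_score("g3", clusters, categories, len(genes), cluster_wide=True))
```

In the toy data, gene g3 does not belong to the "immune" category, yet its cluster-wide score is driven by that category's enrichment in a cluster that contains g3. That is the difference between the two scores that the caption describes.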

Figure 4. From: CLEAN: CLustering Enrichment ANalysis.

Reproducibility of CLEAN and cwCLEAN scores. The reproducibility of the functional coherence results for six different clustering algorithms was assessed by calculating all pairwise Pearson's correlation coefficients between scores for all algorithms applied to four independent human breast cancer datasets (GEO expression series GSE1456, GSE3494, GSE7390, and GSE11121). Each row and column in this symmetric heatmap represents one score type for one clustering algorithm in one dataset. The symmetric hierarchical clustering of rows and columns was constructed using the pairwise Pearson's correlations between scores as the similarity measure and applying the complete linkage principle.
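The pairwise correlations behind such a heatmap can be sketched as follows. This is an illustration only: the labels and score values are made up, and each score vector is assumed to list the same genes in the same order.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(score_sets):
    """All pairwise correlations between named score vectors."""
    names = list(score_sets)
    return {(a, b): pearson(score_sets[a], score_sets[b])
            for a in names for b in names}

# Hypothetical per-gene scores; each key is score type / algorithm / dataset.
score_sets = {
    "CLEAN/CSIMM/GSE3494": [0.2, 1.5, 3.1, 0.0, 2.4],
    "CLEAN/CSIMM/GSE7390": [0.1, 1.9, 2.8, 0.3, 2.0],
    "cwCLEAN/CSIMM/GSE3494": [1.5, 1.5, 3.1, 0.4, 3.1],
}
matrix = correlation_matrix(score_sets)
```

The resulting dictionary is symmetric with ones on the diagonal. A hierarchical clustering of rows and columns would then use these correlations as the similarity measure.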

Figure 5. From: CLEAN: CLustering Enrichment ANalysis.

Differences in the reproducibility of CLEAN and cwCLEAN scores. Improvements in the reproducibility of CLEAN over cwCLEAN scores were demonstrated by box plots of differences in correlation coefficients, and of differences in odds ratios and p-values from 2-by-2 contingency tables of statistically significant scores. A) Box plots of differences in correlations between CLEAN and cwCLEAN scores for all 6 pairs of breast cancer datasets and three different clustering algorithms. All differences are positive, indicating that the correlation coefficient was higher for CLEAN scores in each of the 6 pairs. B) Box plots of differences in odds ratios for 2-by-2 contingency tables of statistically significant CLEAN and cwCLEAN scores for all 6 pairs of breast cancer datasets and three different clustering algorithms. All differences are positive, implying higher reproducibility of CLEAN scores. C) Box plots of differences in statistical significance (-log₁₀(p-value)) of Fisher's exact test for the same contingency tables as in B). All differences are again positive, implying higher reproducibility of CLEAN scores.
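As an illustration of the 2-by-2 tables in panels B and C, the sketch below cross-tabulates which genes are statistically significant in each of two datasets and computes the odds ratio. The function names and gene sets are hypothetical, and both significant sets are assumed to be subsets of the common gene universe.

```python
def contingency_table(sig_a, sig_b, all_genes):
    """Counts of genes significant in both datasets, only in A,
    only in B, and in neither."""
    both = len(sig_a & sig_b)
    only_a = len(sig_a - sig_b)
    only_b = len(sig_b - sig_a)
    neither = len(all_genes) - both - only_a - only_b
    return both, only_a, only_b, neither

def odds_ratio(table):
    """Odds ratio (both * neither) / (only_a * only_b).
    Returns infinity when an off-diagonal cell is empty."""
    both, only_a, only_b, neither = table
    if only_a == 0 or only_b == 0:
        return float("inf")
    return (both * neither) / (only_a * only_b)

# Hypothetical example: 20 genes, 5 significant in each dataset, 4 shared.
all_genes = set(range(20))
sig_a = {0, 1, 2, 3, 4}
sig_b = {0, 1, 2, 3, 5}
table = contingency_table(sig_a, sig_b, all_genes)
print(table, odds_ratio(table))
```

A larger odds ratio means that significance in one dataset is more strongly associated with significance in the other. Computing it once for CLEAN and once for cwCLEAN and subtracting gives one of the differences summarized in panel B.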

Figure 3. From: CLEAN: CLustering Enrichment ANalysis.

Integrating cluster analysis and functional knowledge. Genes were clustered using the CSIMM algorithm and variance-scaled data from two independent breast cancer datasets (GSE3494 and GSE7390), and CLEAN scores were computed for both clusterings. The number of genes common to both datasets after filtering was 8,567. A) The gene-specific CLEAN scores for the two datasets were plotted against each other and the Pearson's correlation coefficient was computed. A small amount of random noise was added in the scatter plot to better visualize overlapping data points. B) Pairwise similarity measures between genes computed by CSIMM were also plotted and correlated. C) Expression profiles of the genes with the highest CLEAN scores in both datasets showed strong co-expression in both datasets. All genes in this cluster are immunity-related.

Figure 2. From: CLEAN: CLustering Enrichment ANalysis.

Comparison of clustering methods. We compared the functional coherence of six clustering algorithms: context-specific infinite mixture model (CSIMM), Euclidean distance-based, and Pearson's correlation-based hierarchical clustering, each with and without prior variance rescaling of the data, across four independent human breast cancer datasets (GEO expression series GSE1456, GSE3494, GSE7390, and GSE11121). For all six algorithms, the hierarchical clustering was constructed using the average linkage principle. The number of genes common to all four datasets after filtering was 6,150. CLEAN scores are plotted on the x-axis and the number of genes with a CLEAN score greater than each value is plotted on the y-axis. A larger area under the curve implies higher functional coherence.
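The curve described in this caption can be sketched as below: for each threshold, count the genes whose score exceeds it, then summarize the curve by its area. The trapezoidal rule, the threshold grid, and the scores are assumptions made for illustration; the caption does not state how areas were compared.

```python
def genes_above(scores, thresholds):
    """Number of genes with a score strictly greater than each threshold."""
    return [sum(1 for s in scores if s > t) for t in thresholds]

def area_under_curve(xs, ys):
    """Trapezoidal area under the piecewise-linear curve through (xs, ys)."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))

# Hypothetical CLEAN scores for four genes and an evenly spaced threshold grid.
scores = [0.5, 1.5, 2.5, 3.5]
thresholds = [0, 1, 2, 3, 4]
counts = genes_above(scores, thresholds)
print(counts, area_under_curve(thresholds, counts))
```

An algorithm whose clusters give more genes high scores produces a curve that sits higher at every threshold and therefore has a larger area.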

Figure 6. From: CLEAN: CLustering Enrichment ANalysis.

Unsupervised selection of informative genes. Genes were clustered based on their expression across different tissue samples, and functional coherence scores were calculated for the human and mouse datasets separately. The ability of different groups of genes to facilitate correct grouping of samples from the same tissue type in the combined human-mouse dataset was assessed by constructing ROC curves. The ROC curve for clustering samples based on all 10,287 genes is included in each plot (red line) for reference. A) ROC curves for clustering samples based on genes with statistically significant CLEAN scores in both the mouse and human datasets, and on genes not statistically significant in either dataset. B) Same as A) for cwCLEAN instead of CLEAN scores. C) ROC curves based on genes selected using COPA. The number of selected genes was identical to the number of genes with statistically significant CLEAN scores used in A).

Figure 7. From: CLEAN: CLustering Enrichment ANalysis.

Integrated software package. CLEAN was implemented as an add-on R package. The package integrates routines for calculating gene-specific functional coherence scores and the interactive Java-based viewer Functional TreeView (FTreeView). The figure shows a screenshot of an FTreeView session displaying CLEAN results for one breast cancer dataset, GSE3494. FTreeView was developed from the original Java TreeView by adding panel 3, which displays functional cluster annotations generated by the CLEAN R package. This functionality enables seamless integration and browsing of functional categories associated with each cluster of genes (panel 2), which in turn can be selected based on the functional coherence scores (panel 1). The selected cluster of genes (panel 2), which we identified based on its overall high CLEAN scores (panel 1), is highly enriched for genes associated with immunity-related Gene Ontology terms (FDR < 10^-60), as well as two KEGG pathways and putative targets of the Interferon Consensus Sequence-binding protein (ICSBP) transcription factor. These results can be viewed interactively using the Java web-start version of FTreeView.
