Bioinformatics. 2011 Apr 15;27(8):1094-100. doi: 10.1093/bioinformatics/btr074. Epub 2011 Feb 16.

Defining an informativeness metric for clustering gene expression data.

Author information

  • Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA. jess@jimmy.harvard.edu

Abstract

MOTIVATION:

Unsupervised 'cluster' analysis is an invaluable tool for exploratory microarray data analysis, as it organizes the data into groups of genes or samples whose elements share common patterns. Once the data are clustered, finding the optimal number of informative subgroups within a dataset is important for understanding the underlying phenotypes, yet no robust, widely accepted solution to this problem exists.

RESULTS:

To address this problem, we developed an 'informativeness metric' based on a simple analysis of variance statistic that identifies the number of clusters that best separates phenotypic groups. We tested the performance of the informativeness metric on both experimental and simulated datasets, and we contrast these results with those obtained using alternative methods such as the gap statistic.
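The general idea can be illustrated with a minimal sketch. This is not the implementation in the attract package, and the abstract does not give the exact formula; the sketch assumes that each candidate clustering is scored by the mean one-way ANOVA F statistic of its clusters' average expression profiles across phenotypic groups, and that the candidate with the highest score is chosen. The names anova_f, informativeness and best_k, and the input layout, are hypothetical.

```python
from statistics import mean


def anova_f(values, groups):
    """One-way ANOVA F statistic for `values` split by the labels in `groups`."""
    by_group = {}
    for value, label in zip(values, groups):
        by_group.setdefault(label, []).append(value)
    n, k = len(values), len(by_group)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand = mean(values)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in by_group.values() for v in vs)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        # No within-group variation: perfectly separated, or completely flat.
        return float("inf") if ss_between > 0 else 0.0
    return ms_between / ms_within


def informativeness(expr, clusters, phenotypes):
    """Assumed score: mean F statistic over the clusters' average profiles.

    expr:       dict gene -> list of expression values, one per sample
    clusters:   dict gene -> cluster id
    phenotypes: list of phenotypic group labels, one per sample
    """
    members = {}
    for gene, cluster_id in clusters.items():
        members.setdefault(cluster_id, []).append(expr[gene])
    scores = []
    for profiles in members.values():
        # Average the member genes sample by sample to get one profile per cluster.
        avg_profile = [mean(column) for column in zip(*profiles)]
        scores.append(anova_f(avg_profile, phenotypes))
    return mean(scores)


def best_k(expr, clusterings, phenotypes):
    """Pick the number of clusters whose clustering scores highest.

    clusterings: dict k -> (dict gene -> cluster id), produced by any
    clustering method run once per candidate k.
    """
    return max(clusterings, key=lambda k: informativeness(expr, clusterings[k], phenotypes))
```

In this toy setting, two genes with opposite trends across two phenotypic groups cancel out when merged into a single cluster, so splitting them into two clusters scores higher. The clustering step itself is left to the caller, since the abstract does not tie the metric to a particular clustering algorithm.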

AVAILABILITY:

The method has been implemented in the Bioconductor R package attract; it is also freely available from http://compbio.dfci.harvard.edu/pubs/attract_1.0.1.zip.

CONTACT:

jess@jimmy.harvard.edu; johnq@jimmy.harvard.edu

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID: 21330289 [PubMed - indexed for MEDLINE]
PMCID: PMC3072547 (free PMC article)