Genome Biol. 2002 Jun 25;3(7):RESEARCH0036. Epub 2002 Jun 25.

A prediction-based resampling method for estimating the number of clusters in a dataset.

Author information

  • Division of Biostatistics, School of Public Health, University of California Berkeley, 140 Earl Warren Hall, Berkeley, CA 94720-7360, USA. sandrine@stat.berkeley.edu

Abstract

BACKGROUND:

Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. This clustering problem has two essential aspects: estimating the number of clusters, if any, in a dataset; and allocating tumor samples to these clusters while assessing the confidence of cluster assignments for individual samples. Here we address the first of these problems.

RESULTS:

We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new and existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the six existing methods considered in the study.

CONCLUSIONS:

Focusing on prediction accuracy in conjunction with resampling produces accurate and robust estimates of the number of clusters.
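
The general idea of pairing prediction accuracy with resampling can be illustrated with a small sketch. This is not the authors' Clest procedure, which is described in the full article and is not reproduced here. It is a simplified illustration under assumptions of our own: plain k-means as the clustering method, a nearest-centroid rule as the classifier, the Rand index as the agreement measure, and repeated random half-splits as the resampling scheme. All function names (`kmeans`, `rand_index`, `prediction_agreement`) are hypothetical.

```python
import random
from itertools import combinations
from statistics import median


def _dist2(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: _dist2(point, centroids[j]))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means (an assumed stand-in clusterer); returns centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[_nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        new = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def rand_index(a, b):
    """Fraction of sample pairs on which two labelings agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def prediction_agreement(points, k, n_splits=20, seed=0):
    """Median agreement, over random half-splits, between predicted and clustered test labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = points[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        learn, test = shuffled[:half], shuffled[half:]
        # Cluster the learning set; its centroids act as a nearest-centroid classifier.
        learn_centroids = kmeans(learn, k, rng)
        predicted = [_nearest(p, learn_centroids) for p in test]
        # Cluster the test set on its own and compare the two labelings.
        test_centroids = kmeans(test, k, rng)
        clustered = [_nearest(p, test_centroids) for p in test]
        scores.append(rand_index(predicted, clustered))
    return median(scores)
```

Calling `prediction_agreement(points, k)` for several candidate values of `k` gives one agreement score per candidate. How such scores should be calibrated and turned into an estimate of the number of clusters is exactly what the article addresses, so this sketch deliberately stops short of a selection rule.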

PMID: 12184810 [PubMed - indexed for MEDLINE]
PMCID: PMC126241
Free PMC Article
