Genome Biol. 2002 Jun 25;3(7):RESEARCH0036. Epub 2002 Jun 25.

A prediction-based resampling method for estimating the number of clusters in a dataset.

Author information

  • Division of Biostatistics, School of Public Health, University of California Berkeley, 140 Earl Warren Hall, Berkeley, CA 94720-7360, USA. sandrine@stat.berkeley.edu



Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. This clustering problem has two essential aspects: estimating the number of clusters, if any, in a dataset; and allocating tumor samples to these clusters while assessing the confidence of the cluster assignment for each sample. Here we address the first of these problems.


We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new method and of six existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the existing methods.


Focusing on prediction accuracy in conjunction with resampling produces accurate and robust estimates of the number of clusters.
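
The abstract does not spell out how Clest works, so the following is only a minimal sketch of the general idea of combining prediction accuracy with resampling, not the authors' procedure. It assumes k-means as the clustering method, a nearest-centroid rule as the classifier, and the Rand index as the agreement measure; all three are stand-ins chosen for illustration, and every function name here is hypothetical. For a candidate number of clusters k, the data are repeatedly split into a learning half and a test half, the learning half is clustered and used to predict labels for the test half, and those predictions are compared with a direct clustering of the test half.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (nearest-centroid classifier)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=50):
    """Plain Lloyd k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def prediction_score(points, k, n_splits=10, seed=0):
    """Mean agreement between predicted and clustered test labels over random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = points[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        learn, test = shuffled[:half], shuffled[half:]
        learn_centroids = kmeans(learn, k)
        predicted = [nearest(p, learn_centroids) for p in test]  # classifier built on learning half
        test_centroids = kmeans(test, k)
        clustered = [nearest(p, test_centroids) for p in test]   # direct clustering of test half
        scores.append(rand_index(predicted, clustered))
    return sum(scores) / len(scores)


def toy_data(per_cluster=20, seed=1):
    """Illustrative data only: three well-separated 2-D groups."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in centers for _ in range(per_cluster)]


if __name__ == "__main__":
    data = toy_data()
    for k in range(2, 6):
        print(k, prediction_score(data, k))
```

Comparing the scores across candidate values of k gives only a crude ranking. A usable estimator would also need a rule for deciding whether the agreement at a given k is higher than expected when the data contain no cluster structure at all, which this sketch does not attempt.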

[PubMed - indexed for MEDLINE]