Bioinformatics. 2001 Dec;17(12):1143-51.

Statistical estimation of cluster boundaries in gene expression profile data.

Author information

  • Laboratory of Mathematics, Saga Medical School, 5-1-1 Nabeshima, Saga, Saga 849-8501, Japan. horimoto@post.saga-med.ac.jp



Gene expression profile data are accumulating rapidly owing to advances in microarray techniques. These data are commonly analyzed by clustering procedures to extract useful information about the genes. In such analyses, determining the boundaries of gene clusters systematically, rather than by visual inspection and biological knowledge, remains challenging.


We propose a statistical procedure to estimate the number of clusters in the hierarchical clustering of expression profiles. After hierarchical clustering, the statistical property of the profiles at each node of the dendrogram is evaluated with a statistical measure: the variance inflation factor from multiple regression analysis. This evaluation determines the cluster boundaries automatically, without any additional analyses or any biological knowledge of the measured genes. The performance of the procedure is demonstrated on the profiles of 2467 yeast genes, with very promising results.
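
For readers unfamiliar with the variance inflation factor (VIF), the following is a minimal sketch of its textbook definition only. The abstract does not say how the VIF is evaluated at each dendrogram node or how it is turned into a cluster boundary, so none of that is reproduced here. The function names (`correlation_matrix`, `invert`, `variance_inflation_factors`) and the toy profiles are hypothetical, and the code assumes each gene profile is a list of measurements over the same conditions, with more conditions than genes so that the correlation matrix is invertible. It relies on the standard identity that the VIF of variable j equals the j-th diagonal element of the inverse correlation matrix, which is the same as 1 / (1 - R_j^2) from regressing variable j on the others.

```python
import math


def correlation_matrix(profiles):
    """Pearson correlation matrix; `profiles` holds one list of values per gene."""
    n = len(profiles[0])
    scaled = []
    for profile in profiles:
        mean = sum(profile) / n
        dev = [value - mean for value in profile]
        norm = math.sqrt(sum(d * d for d in dev))
        if norm == 0:
            raise ValueError("a constant profile has no defined correlation")
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(u, v)) for v in scaled] for u in scaled]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity: [M | I] becomes [I | M^-1].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (collinear profiles)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(profiles):
    """VIF of each profile: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(profiles))
    return [inverse[j][j] for j in range(len(profiles))]


if __name__ == "__main__":
    # Toy data (made up for illustration): two profiles with correlation 0.8.
    gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    gene_b = [2.0, 1.0, 4.0, 3.0, 5.0]
    print(variance_inflation_factors([gene_a, gene_b]))
```

With the toy profiles above the correlation is 0.8, so both VIFs are 1 / (1 - 0.64), roughly 2.78. Uncorrelated profiles give a VIF of 1, and the value grows without bound as profiles become collinear, which is why it can serve as a measure of how similar the profiles under a node are.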


A set of programs will be sent electronically upon request.


horimoto@post.saga-med.ac.jp; toh@beri.co.jp

[PubMed - indexed for MEDLINE]
Free full text