Display Settings:

Format

Send to:

Choose Destination
See comment in PubMed Commons below
Electron Commer Res Appl. 2012 Mar;11(2):152-158.

RSQRT: AN HEURISTIC FOR ESTIMATING THE NUMBER OF CLUSTERS TO REPORT.

Author information

  • 1Computer Science and Engineering; University of Minnesota, 4-192 Keller Hall; 200 Union St. SE, Minneapolis, MN USA 55455, Tel: 612-625-6092;

Abstract

Clustering can be a valuable tool for analyzing large datasets, such as in e-commerce applications. Anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter. Elsewhere we introduced a strongly-supported heuristic, RSQRT, which predicts K as a function of the attribute or item count, depending on attribute scales. We conducted a second analysis where we sought confirmation of the heuristic, analyzing data sets from theUCImachine learning benchmark repository. For the 25 studies where sufficient detail was available, we again found strong support. Also, in a side-by-side comparison of 28 studies, RSQRT best-predicted K and the Bayesian information criterion (BIC) predicted K are the same. RSQRT has a lower cost of O(log log n) versus O(n(2)) for BIC, and is more widely applicable. Using RSQRT prospectively could be much better than merely guessing.

PMID:
22773923
[PubMed]
PMCID:
PMC3388514
Free PMC Article

Images from this publication.See all images (7)Free text

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for PubMed Central
    Loading ...
    Write to the Help Desk