A cautionary note on using internal cross validation to select the number of clusters

Authors

Krieger, AM; Green, PE

Citation

A.M. Krieger and P.E. Green, A cautionary note on using internal cross validation to select the number of clusters, Psychometrika, 64(3), 1999, pp. 341-353

Number of citations

Subject categories

Psychology

Journal title

PSYCHOMETRIKA

ISSN journal

0033-3123

Volume

64

Issue

3

Year of publication

1999

Pages

341 - 353

Database

ISI

SICI code

0033-3123(199909)64:3<341:ACNOUI>2.0.ZU;2-A

Abstract

A highly popular method for examining the stability of a data clustering is to split the data into two parts, cluster the observations in Part A, assign the objects in Part B to their nearest centroid in Part A, and then independently cluster the Part B objects. One then examines how close the two partitions are (say, by the Rand measure). Another proposal is to split the data into k parts, and see how their centroids cluster. By means of synthetic data analyses, we demonstrate that these approaches fail to identify the appropriate number of clusters, particularly as sample size becomes large and the variables exhibit higher correlations.
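
The split-sample procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, so plain k-means (Lloyd's algorithm) is assumed here purely for illustration; the function names (`kmeans`, `rand_index`, `split_half_stability`) and the `seed` and `iters` parameters are hypothetical and are not taken from the paper.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

def rand_index(a, b):
    """Rand measure: fraction of object pairs on which two partitions agree."""
    agree = total = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            total += 1
            if (a[i] == a[j]) == (b[i] == b[j]):
                agree += 1
    return agree / total

def split_half_stability(points, k, seed=0):
    """Split the data, cluster Part A, assign Part B to Part A's nearest
    centroids, cluster Part B independently, and compare the two partitions."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    part_a, part_b = shuffled[:half], shuffled[half:]
    centroids_a, _ = kmeans(part_a, k, seed=seed)
    assigned = [nearest(p, centroids_a) for p in part_b]
    _, clustered = kmeans(part_b, k, seed=seed)
    return rand_index(assigned, clustered)
```

One would typically compute `split_half_stability(points, k)` for several candidate values of k and favor the k with the highest agreement. The abstract's caution is that this selection rule can fail to identify the appropriate number of clusters, particularly for large samples and correlated variables.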