A cautionary note on using internal cross validation to select the number of clusters

Citation
A.M. Krieger and P.E. Green, A cautionary note on using internal cross validation to select the number of clusters, PSYCHOMETRIKA, 64(3), 1999, pp. 341-353
Number of citations
18
Subject categories
Psychology
Journal title
PSYCHOMETRIKA
Journal ISSN
0033-3123
Volume
64
Issue
3
Year of publication
1999
Pages
341 - 353
Database
ISI
SICI code
0033-3123(199909)64:3<341:ACNOUI>2.0.ZU;2-A
Abstract
A highly popular method for examining the stability of a data clustering is to split the data into two parts, cluster the observations in Part A, assign the objects in Part B to their nearest centroid in Part A, and then independently cluster the Part B objects. One then examines how close the two partitions are (say, by the Rand measure). Another proposal is to split the data into k parts, and see how their centroids cluster. By means of synthetic data analyses, we demonstrate that these approaches fail to identify the appropriate number of clusters, particularly as sample size becomes large and the variables exhibit higher correlations.
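
The split-half procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the abstract does not name a clustering algorithm, so the sketch assumes k-means with Euclidean distance and a deterministic farthest-point initialisation, and every function name (`kmeans`, `rand_index`, `split_half_rand`) is hypothetical.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means. Assumption: farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree (Rand measure)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)


def split_half_rand(points, k, seed=0):
    """Split-half stability score for a k-cluster solution."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    part_a, part_b = shuffled[:half], shuffled[half:]
    centroids_a, _ = kmeans(part_a, k)                     # cluster Part A
    assigned = [nearest(p, centroids_a) for p in part_b]   # Part B -> nearest Part A centroid
    _, clustered = kmeans(part_b, k)                       # cluster Part B independently
    return rand_index(assigned, clustered)                 # compare the two Part B partitions
```

In use, one would compute `split_half_rand(points, k)` for several candidate values of k and favour the k with the highest agreement. The abstract's caution applies directly to that selection rule: the authors report that it fails to identify the appropriate number of clusters, particularly for large samples and highly correlated variables.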