A comparison of cluster validity criteria for a mixture of normal distributed data

Citation
Ab. Geva et al., A comparison of cluster validity criteria for a mixture of normal distributed data, PATT REC L, 21(6-7), 2000, pp. 511-529
Number of citations
14
Subject categories
AI Robotics and Automatic Control
Journal title
PATTERN RECOGNITION LETTERS
Journal ISSN
0167-8655
Volume
21
Issue
6-7
Year of publication
2000
Pages
511 - 529
Database
ISI
SICI code
0167-8655(200006)21:6-7<511:ACOCVC>2.0.ZU;2-P
Abstract
Many validity criteria have been proposed over the years in order to validate clustering of unlabeled data sets. In this research we compared the performance of several known validity criteria to several new validity criteria for a mixture of normally distributed data. The main group of the new criteria includes modifications of the Gath and Geva partition and average density criteria, while one new criterion is based on the generalized Neyman-Pearson (GNP) test for normality. The comparison was performed using simulated Gaussian data sets, which were built from 1 to 5 clusters in 1-4 dimensions with a variety of cluster means and variances. The clustering process was implemented by the unsupervised optimal fuzzy clustering (UOFC) algorithm, which combines the fuzzy c-means (FCM) algorithm and a fuzzy modification of the maximum likelihood estimation algorithm (FMLE). We conclude that, in general, no single validity criterion consistently performed much better than the others under all conditions; nevertheless, some of the new validity criteria showed clear advantages in validating most of the simulated Gaussian data sets. (C) 2000 Elsevier Science B.V. All rights reserved.
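
To make the workflow in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes plain fuzzy c-means only (the FMLE stage of UOFC is omitted) and uses Bezdek's partition coefficient as a stand-in validity criterion, which is not one of the new criteria named in the abstract. The function names, the fuzzifier `m = 2`, the cluster parameters, and the random seeds are all illustrative assumptions.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means (assumed stand-in for the FCM stage of UOFC).

    Returns (centers, U), where U has shape (n_samples, c) and rows sum to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # random initial fuzzy partition
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers: membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def partition_coefficient(U):
    """Bezdek's partition coefficient; lies in [1/c, 1], higher means crisper."""
    return float((U ** 2).sum(axis=1).mean())


# Simulated Gaussian data: two clusters in two dimensions (illustrative values).
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(loc=0.0, scale=1.0, size=(200, 2)),
    data_rng.normal(loc=10.0, scale=1.0, size=(200, 2)),
])

# Score each candidate number of clusters with the stand-in criterion.
scores = {}
for c in range(2, 6):
    _, U_c = fuzzy_c_means(X, c)
    scores[c] = partition_coefficient(U_c)

centers2, U2 = fuzzy_c_means(X, 2)
for c, s in scores.items():
    print(c, round(s, 3))
```

The candidate number of clusters with the best score would be taken as the estimate. The single-cluster case, which the study also covers, is skipped here because the partition coefficient is trivially 1 when c = 1.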