An empirical comparison of four initialization methods for the K-Means algorithm

Citation
Jm. Pena et al., An empirical comparison of four initialization methods for the K-Means algorithm, PATT REC L, 20(10), 1999, pp. 1027-1040
Citations number
29
Categorie Soggetti
AI Robotics and Automatic Control
Journal title
PATTERN RECOGNITION LETTERS
ISSN journal
01678655 → ACNP
Volume
20
Issue
10
Year of publication
1999
Pages
1027 - 1040
Database
ISI
SICI code
0167-8655(199910)20:10<1027:AECOFI>2.0.ZU;2-N
Abstract
In this paper, we aim to compare empirically four initialization methods fo r the K-Means algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its robustness, it is widely reported in the litera ture that its performance depends upon two key points: initial clustering a nd instance order. We conduct a series of experiments to draw up tin terms of mean, maximum, minimum and standard deviation) the probability distribut ion of the square-error values of the final clusters returned by the K-Mean s algorithm independently on any initial clustering and on any instance ord er when each of the four initialization methods is used. The results of our experiments illustrate that the random and the Kaufman initialization meth ods outperform the rest of the compared methods as they make the K-Means mo re effective and more independent on initial clustering and on instance ord er. In addition, we compare the convergence speed of the K-Means algorithm when using each of the four initialization methods. Our results suggest tha t the Kaufman initialization method induces to the K-Means algorithm a more desirable behaviour with respect to the convergence speed than the random initialization method. (C) 1999 Elsevier Science B.V. All rights reserved.