In this paper, we aim to compare empirically four initialization methods fo
r the K-Means algorithm: random, Forgy, MacQueen and Kaufman. Although this
algorithm is known for its robustness, it is widely reported in the litera
ture that its performance depends upon two key points: initial clustering a
nd instance order. We conduct a series of experiments to draw up tin terms
of mean, maximum, minimum and standard deviation) the probability distribut
ion of the square-error values of the final clusters returned by the K-Mean
s algorithm independently on any initial clustering and on any instance ord
er when each of the four initialization methods is used. The results of our
experiments illustrate that the random and the Kaufman initialization meth
ods outperform the rest of the compared methods as they make the K-Means mo
re effective and more independent on initial clustering and on instance ord
er. In addition, we compare the convergence speed of the K-Means algorithm
when using each of the four initialization methods. Our results suggest tha
t the Kaufman initialization method induces to the K-Means algorithm a more
desirable behaviour with respect to the convergence speed than the random
initialization method. (C) 1999 Elsevier Science B.V. All rights reserved.