An experimental comparison of model-based clustering methods

Citation
M. Meila et D. Heckerman, An experimental comparison of model-based clustering methods, MACH LEARN, 42(1-2), 2001, pp. 9-29
Citations number
18
Categorie Soggetti
AI Robotics and Automatic Control
Journal title
MACHINE LEARNING
ISSN journal
08856125 → ACNP
Volume
42
Issue
1-2
Year of publication
2001
Pages
9 - 29
Database
ISI
SICI code
0885-6125(200101)42:1-2<9:AECOMC>2.0.ZU;2-M
Abstract
We compare the three basic algorithms for model-based clustering on high-di mensional discrete-variable datasets. All three algorithms use the same und erlying model: a naive-Bayes model with a hidden root node, also known as a multinomial-mixture model. In the first part of the paper, we perform an e xperimental comparison between three batch algorithms that learn the parame ters of this model: the Expectation-Maximization (EM) algorithm, a "winner take all" version of the EM algorithm reminiscent of the K-means algorithm, and model-based agglomerative clustering. We find that the EM algorithm si gnificantly outperforms the other methods, and proceed to investigate the e ffect of various initialization methods on the final solution produced by t he EM algorithm. The initializations that we consider are (1) parameters sa mpled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned mode ls that are similar in quality.