We compare the three basic algorithms for model-based clustering on high-dimensional discrete-variable datasets. All three algorithms use the same underlying model: a naive-Bayes model with a hidden root node, also known as a multinomial-mixture model. In the first part of the paper, we perform an experimental comparison of three batch algorithms that learn the parameters of this model: the Expectation-Maximization (EM) algorithm, a "winner-take-all" version of the EM algorithm reminiscent of the K-means algorithm, and model-based agglomerative clustering. We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization methods on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned models that are similar in quality.
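For concreteness, the model and the EM algorithm described above can be sketched for the special case of binary variables, where the naive-Bayes model with a hidden root node becomes a mixture of products of Bernoullis. This is a minimal illustrative sketch, not the paper's implementation: the function name, the perturbation scale used for initialization method (2), and the probability clipping are all assumptions introduced here.

```python
import numpy as np

def em_multinomial_mixture(X, k, n_iter=50, seed=0):
    """EM for a k-component mixture of products of Bernoullis
    (a naive-Bayes model with a hidden root) on binary data X (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization (2) from the abstract: a random perturbation of the
    # data's marginal distribution (perturbation scale 0.1 is an assumption).
    marginal = X.mean(axis=0)
    theta = np.clip(marginal + 0.1 * rng.standard_normal((k, d)), 0.01, 0.99)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component for each row,
        # computed in log space for numerical stability.
        log_p = (X @ np.log(theta).T
                 + (1.0 - X) @ np.log(1.0 - theta).T
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and per-component parameters.
        nk = r.sum(axis=0)
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 0.01, 0.99)
    return pi, theta, r
```

A "winner-take-all" variant in the spirit of the one compared in the paper would replace the soft responsibilities `r` with a hard assignment (`r.argmax(axis=1)` one-hot encoded) before the M-step, which is what makes it reminiscent of K-means.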