We compare the three basic algorithms for model-based clustering on high-dimensional discrete-variable datasets. All three algorithms use the same underlying model: a naive-Bayes model with a hidden root node, also known as a multinomial-mixture model. In the first part of the paper, we perform an experimental comparison of three batch algorithms that learn the parameters of this model: the Expectation-Maximization (EM) algorithm, a "winner-take-all" version of the EM algorithm reminiscent of the K-means algorithm, and model-based agglomerative clustering. We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization methods on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned models that are similar in quality.
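For concreteness, the model and the EM algorithm described above can be sketched for the special case of binary variables, where the naive-Bayes model with a hidden root node becomes a mixture of products of Bernoullis. This is a minimal illustrative sketch, not the paper's implementation: the function name, the perturbation scale used for initialization method (2), and the probability clipping are all assumptions introduced here.

```python
import numpy as np

def em_multinomial_mixture(X, k, n_iter=50, seed=0):
    """EM for a k-component mixture of products of Bernoullis
    (a naive-Bayes model with a hidden root) on binary data X (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization (2) from the abstract: a random perturbation of the
    # data's marginal distribution (perturbation scale 0.1 is an assumption).
    marginal = X.mean(axis=0)
    theta = np.clip(marginal + 0.1 * rng.standard_normal((k, d)), 0.01, 0.99)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component for each row,
        # computed in log space for numerical stability.
        log_p = (X @ np.log(theta).T
                 + (1.0 - X) @ np.log(1.0 - theta).T
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and per-component parameters.
        nk = r.sum(axis=0)
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 0.01, 0.99)
    return pi, theta, r
```

A "winner-take-all" variant in the spirit of the one compared in the paper would replace the soft responsibilities `r` with a hard assignment (`r.argmax(axis=1)` one-hot encoded) before the M-step, which is what makes it reminiscent of K-means.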