K-modes clustering

Citation
A. Chaturvedi et al., K-modes clustering, J CLASSIF, 18(1), 2001, pp. 35-55
Citations number
16
Categorie Soggetti
Library & Information Science
Journal title
JOURNAL OF CLASSIFICATION
ISSN journal
01764268 → ACNP
Volume
18
Issue
1
Year of publication
2001
Pages
35 - 55
Database
ISI
SICI code
0176-4268(2001)18:1<35:KC>2.0.ZU;2-1
Abstract
We present a nonparametric approach to deriving clusters from categorical ( nominal scale) data using a new clustering procedure called K-modes, which is analogous to the traditional K-Means procedure (MacQueen 1967) for clust ering interval scale data. Unlike most existing methods for clustering nomi nal scale data, the K-modes procedure explicitly optimizes a loss function based on the Lo norm (defined as the limit of an L-p norm as p approaches z ero). In Monte Carlo simulations, both K-modes and latent class procedures (e.g., Goodman 1974) performed with equal efficiency in recovering a known underl ying cluster structure. However, K-modes is an order of magnitude faster th an the latent class procedure in speed and suffers from fewer problems of l ocal optima than do latent class procedures. For data sets involving a larg e number of categorical variables, latent class procedures become computati onally extremely slow and hence infeasible. We conjecture that, although in some cases latent class procedures might pe rform better than K-modes, it could out-perform latent class procedures in other cases. Hence, we recommend that these two approaches be used as "comp lementary" procedures in performing cluster analysis. We also present an em pirical comparison of K-modes and latent class, where the former method pre vails.