ITA
ENG

K-modes clustering

Authors

Chaturvedi, A Green, PE Carroll, JD

Citation

A. Chaturvedi et al., K-modes clustering, J CLASSIF, 18(1), 2001, pp. 35-55

Citations number

Categorie Soggetti

Library & Information Science

Journal title

JOURNAL OF CLASSIFICATION

ISSN journal

01764268 → ACNP

Volume

Issue

Year of publication

2001

Pages

35 - 55

Database

ISI

SICI code

0176-4268(2001)18:1<35:KC>2.0.ZU;2-1

Abstract

We present a nonparametric approach to deriving clusters from categorical ( nominal scale) data using a new clustering procedure called K-modes, which is analogous to the traditional K-Means procedure (MacQueen 1967) for clust ering interval scale data. Unlike most existing methods for clustering nomi nal scale data, the K-modes procedure explicitly optimizes a loss function based on the Lo norm (defined as the limit of an L-p norm as p approaches z ero). In Monte Carlo simulations, both K-modes and latent class procedures (e.g., Goodman 1974) performed with equal efficiency in recovering a known underl ying cluster structure. However, K-modes is an order of magnitude faster th an the latent class procedure in speed and suffers from fewer problems of l ocal optima than do latent class procedures. For data sets involving a larg e number of categorical variables, latent class procedures become computati onally extremely slow and hence infeasible. We conjecture that, although in some cases latent class procedures might pe rform better than K-modes, it could out-perform latent class procedures in other cases. Hence, we recommend that these two approaches be used as "comp lementary" procedures in performing cluster analysis. We also present an em pirical comparison of K-modes and latent class, where the former method pre vails.