Data clustering: A review

Citation
A.K. Jain et al., Data clustering: A review, ACM C SURV, 31(3), 1999, pp. 264-323
Number of citations
204
Subject categories
Computer Science & Engineering
Journal title
ACM COMPUTING SURVEYS
Journal ISSN
0360-0300
Volume
31
Issue
3
Year of publication
1999
Pages
264 - 323
Database
ISI
SICI code
0360-0300(199909)31:3<264:DCAR>2.0.ZU;2-1
Abstract
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms, such as image segmentation, object recognition, and information retrieval.