A theory of proximity based clustering: structure detection by optimization

Citation
J. Puzicha et al., A theory of proximity based clustering: structure detection by optimization, PATT RECOG, 33(4), 2000, pp. 617-634
Citations number
36
Categorie Soggetti
AI Robotics and Automatic Control
Journal title
PATTERN RECOGNITION
ISSN journal
00313203 → ACNP
Volume
33
Issue
4
Year of publication
2000
Pages
617 - 634
Database
ISI
SICI code
0031-3203(200004)33:4<617:ATOPBC>2.0.ZU;2-F
Abstract
In this paper, a systematic optimization approach for clustering proximity or similarity data is developed. Starting from Fundamental invariance and r obustness properties, a set of axioms is proposed and discussed to distingu ish different cluster compactness and separation criteria. The approach cov ers the case of sparse proximity matrices, and is extended to nested partit ionings for hierarchical data clustering. To solve the associated optimizat ion problems, a rigorous mathematical framework for deterministic annealing and mean-field approximation is presented. Efficient optimization heuristi cs are derived in a canonical way, which also clarifies the relation to sto chastic optimization by Gibbs sampling. Similarity-based clustering techniq ues have a broad range of possible applications in computer vision, pattern recognition, and data analysis. As a major practical application we presen t a novel approach to the problem of unsupervised texture segmentation, whi ch relies on statistical tests as a measure of homogeneity. The quality of the algorithms is empirically evaluated on a large collection of Brodatz-li ke micro-texture Mondrians and on a set of real-word images. To demonstrate the broad usefulness of the theory of proximity based clustering the perfo rmances of different criteria and algorithms are compared on an information retrieval task for a document database. The superiority of optimization al gorithms for clustering is supported by extensive experiments. (C) 2000 Pat tern Recognition Society. Published by Elsevier Science Ltd. All rights res erved.