Clustering based on conditional distributions in an auxiliary space

Citation
J. Sinkkonen and S. Kaski, Clustering based on conditional distributions in an auxiliary space, NEURAL COMP, 14(1), 2002, pp. 217-239
Citations number
34
Subject Categories
Neurosciences & Behavior; AI, Robotics and Automatic Control
Journal title
NEURAL COMPUTATION
ISSN journal
0899-7667
Volume
14
Issue
1
Year of publication
2002
Pages
217 - 239
Database
ISI
SICI code
0899-7667(200201)14:1<217:CBOCDI>2.0.ZU;2-9
Abstract
We study the problem of learning groups or categories that are local in the continuous primary space but homogeneous by the distributions of an associated auxiliary random variable over a discrete auxiliary space. Assuming that variation in the auxiliary space is meaningful, categories will emphasize similarly meaningful aspects of the primary space. From a data set consisting of pairs of primary and auxiliary items, the categories are learned by minimizing a Kullback-Leibler divergence-based distortion between (implicitly estimated) distributions of the auxiliary data, conditioned on the primary data. Still, the categories are defined in terms of the primary space. An online algorithm resembling the traditional Hebb-type competitive learning is introduced for learning the categories. Minimizing the distortion criterion turns out to be equivalent to maximizing the mutual information between the categories and the auxiliary data. In addition, connections to density estimation and to the distributional clustering paradigm are outlined. The method is demonstrated by clustering yeast gene expression data from DNA chips, with biological knowledge about the functional classes of the genes as the auxiliary data.
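
The abstract's setup can be illustrated with a minimal sketch: soft clusters in the primary space are represented by prototypes, each paired with a distribution over the discrete auxiliary classes, and both are updated online in a competitive-learning fashion. This is an assumed simplification for illustration, not the authors' published algorithm; the function name, parameters (n_clusters, sigma, lr, n_epochs), and the exact update rule are hypothetical.

```python
import numpy as np

def online_aux_clustering(X, C, n_clusters=5, n_classes=None, sigma=1.0,
                          lr=0.05, n_epochs=10, seed=0):
    """Sketch of auxiliary-space clustering (illustrative, not the paper's exact method).

    X: (N, d) primary data; C: (N,) integer auxiliary class labels.
    Returns prototypes m (n_clusters, d) and per-cluster auxiliary
    distributions psi (n_clusters, n_classes).
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    if n_classes is None:
        n_classes = int(C.max()) + 1
    # Prototypes in the primary space and per-cluster auxiliary distributions.
    m = X[rng.choice(N, n_clusters, replace=False)].copy()
    psi = np.full((n_clusters, n_classes), 1.0 / n_classes)
    for _ in range(n_epochs):
        for i in rng.permutation(N):
            x, c = X[i], C[i]
            # Soft cluster memberships from Gaussian kernels around prototypes.
            logits = -np.sum((m - x) ** 2, axis=1) / (2 * sigma ** 2)
            y = np.exp(logits - logits.max())
            y /= y.sum()
            # Hebb-like competitive updates: prototypes track the local mean of
            # their assigned data, and each psi_j tracks the empirical class
            # distribution of points assigned to cluster j (a crude surrogate
            # for the KL-distortion-minimizing updates described in the paper).
            m += lr * y[:, None] * (x - m)
            onehot = np.eye(n_classes)[c]
            psi += lr * y[:, None] * (onehot - psi)
            psi /= psi.sum(axis=1, keepdims=True)
    return m, psi
```

In this sketch the clusters are still defined purely in the primary space (by the prototypes), while the auxiliary labels only shape which groupings are preferred, mirroring the role of the auxiliary data described in the abstract.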