We study the problem of learning groups or categories that are local in the continuous primary space but homogeneous with respect to the distributions of an associated auxiliary random variable over a discrete auxiliary space. Assuming that variation in the auxiliary space is meaningful, the categories will emphasize similarly meaningful aspects of the primary space. From a data set consisting of pairs of primary and auxiliary items, the categories are learned by minimizing a Kullback-Leibler divergence-based distortion between the (implicitly estimated) distributions of the auxiliary data, conditioned on the primary data. Still, the categories are defined in terms of the primary space. An online algorithm resembling traditional Hebb-type competitive learning is introduced for learning the categories. Minimizing the distortion criterion turns out to be equivalent to maximizing the mutual information between the categories and the auxiliary data. In addition, connections to density estimation and to the distributional clustering paradigm are outlined. The method is demonstrated by clustering yeast gene expression data from DNA chips, with biological knowledge about the functional classes of the genes as the auxiliary data.
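The setting above can be illustrated with a minimal sketch: a plain Hebb-type competitive learner whose prototypes live in the primary space, with each cluster additionally tracking an estimate of the auxiliary distribution p(c | cluster). This is only a simplified baseline under assumed choices (winner-take-all assignment, fixed learning rate, exponential averaging); it is not the paper's exact update rule, in which the Kullback-Leibler distortion itself drives the partition.

```python
import numpy as np

def online_auxiliary_clustering(X, Y, n_clusters, n_aux, lr=0.05, epochs=5, seed=0):
    """Hebb-like online competitive-learning sketch.

    Prototypes M live in the continuous primary space; each cluster j also
    tracks psi[j], an estimate of the auxiliary distribution p(c | cluster j).
    Illustrative assumptions only: winner-take-all assignment and these
    update rules are not the exact algorithm of the paper.
    """
    rng = np.random.default_rng(seed)
    M = X[:n_clusters].astype(float).copy()           # prototypes, init from first samples
    psi = np.full((n_clusters, n_aux), 1.0 / n_aux)   # auxiliary distributions, init uniform
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x, c = X[i], Y[i]
            j = int(np.argmin(((M - x) ** 2).sum(axis=1)))  # winner in primary space
            M[j] += lr * (x - M[j])                   # move winning prototype toward x
            onehot = np.eye(n_aux)[c]
            psi[j] += lr * (onehot - psi[j])          # nudge p(c | j) toward observed label
    return M, psi
```

Note that the categories are still defined purely in the primary space (assignment uses only the prototypes), while the per-cluster auxiliary distributions summarize what each category means in the auxiliary space.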