PHASE-TRANSITIONS IN STOCHASTIC SELF-ORGANIZING MAPS

Citation
T. Graepel et al., PHASE-TRANSITIONS IN STOCHASTIC SELF-ORGANIZING MAPS, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics, 56(4), 1997, pp. 3876-3890
Citations number
29
Categorie Soggetti
Physycs, Mathematical","Phsycs, Fluid & Plasmas
ISSN journal
1063651X
Volume
56
Issue
4
Year of publication
1997
Pages
3876 - 3890
Database
ISI
SICI code
1063-651X(1997)56:4<3876:PISSM>2.0.ZU;2-Q
Abstract
We describe the development of neighborhood-preserving stochastic maps in terms of a probabilistic clustering problem. Starting from a cost function for central clustering that incorporates distortions from cha nnel noise, we derive a soft topographic vector quantization algorithm (STVQ) which is based on the maximum entropy principle, and which max imizes the corresponding likelihood in an expectation-maximization fas hion. Among other algorithms, a probabilistic version of Kohonen's sel f-organizing map (SOM) is derived from STVQ as a computationally effic ient approximation of the E step. The foundation of STVQ in statistica l physics motivates a deterministic annealing scheme in the temperatur e parameter beta, and leads to a robust minimization algorithm of the clustering cost function. In particular, this scheme offers an alterna tive to the common stepwise shrinking of the neighborhood width in the SOM, and makes it possible to use its neighborhood function solely to encode the desired neighborhood relations between the clusters. The a nnealing in beta, which corresponds to a stepwise refinement of the re solution of representation in data space, leads to the splitting of an existing cluster representation during the ''cooling'' process. We de scribe this phase transition in terms of the covariance matrix C of th e data and the transition matrix H of the channel noise, and calculate the critical temperatures and modes as functions of the eigenvalues a nd eigenvectors of C and H. The analysis is extended to the phenomenon of the automatic selection of feature dimensions in dimension-reducin g maps, thus leading to a ''batch'' alternative to the Fokker-Planck f ormalism for on-line learning. The results provide insights into the r elation between the width of the neighborhood and the temperature para meter beta: It is shown that the phase transition which leads to the r epresentation of the excess dimensions can be triggered not only by a change in the statistics of the input data but also by an increase of beta, which corresponds to a decrease in noise level. The theoretical results are validated by numerical methods. In particular, a quantity equivalent to the heat capacity in thermodynamics is introduced to vis ualize the properties of the annealing process.