ONLINE GIBBS LEARNING - I - GENERAL THEORY

Citation
H. Sompolinsky and J. W. Kim, ONLINE GIBBS LEARNING - I - GENERAL THEORY, Physical Review E, Statistical physics, plasmas, fluids, and related interdisciplinary topics, 58(2), 1998, pp. 2335-2347
Citations number
47
Subject Categories
Physics, Mathematical; Physics, Fluids & Plasmas
ISSN journal
1063-651X
Volume
58
Issue
2
Year of publication
1998
Part
B
Pages
2335 - 2347
Database
ISI
SICI code
1063-651X(1998)58:2<2335:OGL-I->2.0.ZU;2-4
Abstract
We study a model of on-line learning, the on-line Gibbs algorithm (OLGA), which is particularly suitable for supervised learning of realizable and unrealizable discrete-valued functions. The learning algorithm is based on an on-line energy function E that balances the need to minimize the instantaneous error against the need to minimize the change in the weights. The relative importance of these terms in E is determined by a parameter lambda, the inverse of which plays a role similar to that of the learning-rate parameters in other on-line learning algorithms. In the stochastic version of the algorithm, following each presentation of a labeled example the new weights are chosen from a Gibbs distribution with the on-line energy E and a temperature parameter T. In the zero-temperature version of OLGA, at each step the new weights are those that minimize the on-line energy E. The generalization performance of OLGA is studied analytically in the limit of small learning rate, i.e., lambda --> infinity. It is shown that at finite temperature OLGA converges to an equilibrium Gibbs distribution of weight vectors with an energy function equal to the generalization error function. In its deterministic version OLGA converges to a local minimum of the generalization error. The rate of convergence is studied for the case in which the parameter lambda increases as a power law in time. It is shown that for a generic unrealizable dichotomy, the asymptotic rate of decrease of the generalization error is proportional to the inverse square root of the number of presented examples. This rate is similar to that of batch learning. The special cases of learning realizable dichotomies or dichotomies with output noise are also treated.
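The abstract describes the OLGA update only in words, so the sketch below gives one hedged, illustrative reading of it in Python: a perceptron dichotomy with a 0/1 instantaneous error, a quadratic penalty on the weight change weighted by lambda, and a Metropolis sampler as one simple way to draw the new weights from the Gibbs distribution with on-line energy E and temperature T. The function name, the choice of error, the sampler, and the power-law schedule for lambda in the usage snippet are assumptions made here for illustration, not the paper's prescribed implementation.

```python
import numpy as np

def online_gibbs_step(w, x, y, lam, T, n_sweeps=200, step=0.05, rng=None):
    """One illustrative OLGA update: draw new weights from the Gibbs
    distribution P(w') ~ exp(-E(w') / T), where the on-line energy

        E(w') = (lam / 2) * ||w' - w||^2 + eps(w'; x, y)

    balances the weight change against the instantaneous 0/1 error eps
    on the presented example (x, y). Metropolis sampling is used here
    only as a convenient sampler; the paper does not prescribe one.
    In the zero-temperature version one would instead take the argmin of E.
    """
    rng = np.random.default_rng() if rng is None else rng

    def energy(w_new):
        err = float(y * np.dot(w_new, x) <= 0)  # instantaneous 0/1 error
        return 0.5 * lam * np.sum((w_new - w) ** 2) + err

    w_curr = w.copy()
    e_curr = energy(w_curr)
    for _ in range(n_sweeps):
        w_prop = w_curr + step * rng.standard_normal(w.shape)
        e_prop = energy(w_prop)
        # Metropolis acceptance rule for the Gibbs measure at temperature T
        if e_prop <= e_curr or rng.random() < np.exp(-(e_prop - e_curr) / T):
            w_curr, e_curr = w_prop, e_prop
    return w_curr

# Illustrative usage: learn a random teacher dichotomy y = sign(w_teacher . x),
# with lambda growing as a power law in time, as discussed in the abstract.
rng = np.random.default_rng(0)
n = 20
w_teacher = rng.standard_normal(n)
w = rng.standard_normal(n)
for t in range(1, 501):
    x = rng.standard_normal(n)
    y = np.sign(np.dot(w_teacher, x))
    w = online_gibbs_step(w, x, y, lam=t ** 0.5, T=0.1, rng=rng)
```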