ANALYSIS OF ONLINE TRAINING WITH OPTIMAL LEARNING RATES

Authors
M. Rattray, D. Saad
Citation
M. Rattray and D. Saad, ANALYSIS OF ONLINE TRAINING WITH OPTIMAL LEARNING RATES, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics, 58(5), 1998, pp. 6379-6391
Number of citations
18
Subject categories
Physics, Mathematical; Physics, Fluids & Plasmas
ISSN journal
1063651X
Volume
58
Issue
5
Year of publication
1998
Part
B
Pages
6379 - 6391
Database
ISI
SICI code
1063-651X(1998)58:5<6379:AOOTWO>2.0.ZU;2-Y
Abstract
We describe a theoretical method of determining optimal learning rates for on-line gradient descent training of a multilayer neural network (a soft committee machine). A variational approach is used to determine the time-dependent learning rate which maximizes the total decrease in generalization error over a fixed time window, using a statistical mechanics description of the learning process which is exact in the limit of large input dimension. A linear analysis around transient and asymptotic fixed points of the dynamics provides insight into the optimization process and explains the excellent agreement between our results and independent results for isotropic, realizable tasks. This allows a rather general characterization of the optimal learning rate dynamics within each phase of learning (we discuss scaling laws with respect to task complexity in particular). Our method can also be used to optimize other parameters and learning rules, and we briefly consider a generalized algorithm in which weights associated with different hidden nodes can be assigned different learning rates. The optimal settings in this case suggest that such an algorithm can significantly outperform standard gradient descent. [S1063-651X(98)08511-0].
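The setting in the abstract, on-line gradient descent for a soft committee machine learning a realizable task, can be simulated directly. The following is a minimal Python sketch under stated assumptions, not the authors' method. It assumes the erf(x/√2) hidden activation (a common choice in this literature), N(0, 1) input components, a student and teacher with the same number of hidden nodes, orthonormal teacher vectors, and a 1/N scaling of each weight update. All function names (`train`, `gen_error`, `decaying_rate`, and so on) are hypothetical. The decaying schedule is a placeholder, not the variationally optimal learning rate derived in the paper. Passing an `eta_fn` that depends on the hidden-node index `i` mimics the generalized per-node algorithm the abstract mentions.

```python
import math
import random


def g(x):
    """Hidden-unit activation erf(x / sqrt(2)) (assumed, not taken from the abstract)."""
    return math.erf(x / math.sqrt(2.0))


def g_prime(x):
    """Derivative of erf(x / sqrt(2)), i.e. sqrt(2 / pi) * exp(-x**2 / 2)."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)


def committee(vectors, xi):
    """Soft committee machine output (hidden-to-output weights fixed to 1) and hidden fields."""
    fields = [sum(w * x for w, x in zip(vec, xi)) for vec in vectors]
    return sum(g(h) for h in fields), fields


def gen_error(student, teacher, n, n_test, rng):
    """Monte Carlo estimate of eps_g = 1/2 <(sigma - zeta)^2> on fresh N(0, 1) inputs."""
    total = 0.0
    for _ in range(n_test):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sigma, _ = committee(student, xi)
        zeta, _ = committee(teacher, xi)
        total += 0.5 * (sigma - zeta) ** 2
    return total / n_test


def train(eta_fn, n=20, k=2, m=2, alpha_max=100.0, n_test=2000, seed=0):
    """On-line gradient descent; eta_fn(alpha, i) is the rate for hidden node i at alpha = t / n.

    Returns (eps_g before training, eps_g after training).
    """
    rng = random.Random(seed)
    # Teacher: m orthonormal vectors (first m basis vectors; w.l.o.g. for isotropic inputs).
    teacher = [[1.0 if j == a else 0.0 for j in range(n)] for a in range(m)]
    # Student: k small random vectors, so training starts far from the teacher.
    student = [[rng.gauss(0.0, 0.1 / math.sqrt(n)) for _ in range(n)] for _ in range(k)]
    err_start = gen_error(student, teacher, n, n_test, rng)
    for t in range(int(alpha_max * n)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one fresh example per step
        sigma, fields = committee(student, xi)
        zeta, _ = committee(teacher, xi)
        alpha = t / n
        for i in range(k):
            # Gradient of 1/2 (sigma - zeta)^2 w.r.t. J_i is (sigma - zeta) g'(x_i) xi.
            delta = g_prime(fields[i]) * (zeta - sigma)
            step = eta_fn(alpha, i) * delta / n
            student[i] = [w + step * x for w, x in zip(student[i], xi)]
    err_end = gen_error(student, teacher, n, n_test, rng)
    return err_start, err_end


def constant_rate(alpha, i):
    """Fixed learning rate shared by every hidden node."""
    return 1.0


def decaying_rate(alpha, i, eta0=1.0, tau=20.0):
    """Hypothetical decaying schedule; NOT the variationally optimal rate from the paper."""
    return eta0 / (1.0 + alpha / tau)


if __name__ == "__main__":
    schedules = [("constant", constant_rate), ("decaying", decaying_rate),
                 ("per-node", lambda alpha, i: (1.0, 0.5)[i])]
    for name, fn in schedules:
        start, end = train(fn)
        print(f"{name}: eps_g {start:.4f} -> {end:.4f}")
```

Running the script prints the estimated generalization error before and after training for each schedule. Comparing schedules by finite-N simulation like this is noisy; in the large-N limit, the statistical mechanics description referred to in the abstract replaces such simulations with deterministic dynamics that can be optimized directly.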