M. Rattray and D. Saad, Analysis of on-line training with optimal learning rates, Physical Review E 58(5), 1998, pp. 6379-6391
We describe a theoretical method of determining optimal learning rates for on-line gradient descent training of a multilayer neural network (a soft committee machine). A variational approach is used to determine the time-dependent learning rate which maximizes the total decrease in generalization error over a fixed time window, using a statistical mechanics description of the learning process which is exact in the limit of large input dimension. A linear analysis around transient and asymptotic fixed points of the dynamics provides insight into the optimization process and explains the excellent agreement between our results and independent results for isotropic, realizable tasks. This allows a rather general characterization of the optimal learning rate dynamics within each phase of learning (we discuss scaling laws with respect to task complexity in particular). Our method can also be used to optimize other parameters and learning rules, and we briefly consider a generalized algorithm in which weights associated with different hidden nodes can be assigned different learning rates. The optimal settings in this case suggest that such an algorithm can significantly outperform standard gradient descent. [S1063-651X(98)08511-0].
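For orientation, here is a minimal sketch of the standard on-line soft-committee-machine setup to which such an analysis applies; the notation (K hidden units, student weight vectors J_i, teacher output \zeta) follows the usual statistical-mechanics literature on on-line learning and is not quoted from the paper itself. The student maps an N-dimensional input \xi to

    \sigma(\mathbf{J}, \boldsymbol{\xi}) = \sum_{i=1}^{K} g(\mathbf{J}_i \cdot \boldsymbol{\xi}), \qquad g(x) = \mathrm{erf}\!\left(x / \sqrt{2}\right),

and on-line gradient descent on the instantaneous squared error updates each hidden-node weight vector after presentation of example \boldsymbol{\xi}^{\mu} as

    \mathbf{J}_i^{\mu+1} = \mathbf{J}_i^{\mu} + \frac{\eta(\alpha)}{N}\, g'(\mathbf{J}_i^{\mu} \cdot \boldsymbol{\xi}^{\mu})\, \left[\zeta^{\mu} - \sigma^{\mu}\right] \boldsymbol{\xi}^{\mu}, \qquad \alpha = \mu / N,

where \zeta^{\mu} is the teacher's output and \eta(\alpha) is the learning rate, here allowed to depend on the normalized training time \alpha. In this picture, the variational calculation described above chooses \eta(\alpha) so as to maximize the total decrease in generalization error over a fixed window in \alpha.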