DYNAMIC LEARNING RATE OPTIMIZATION OF THE BACKPROPAGATION ALGORITHM

Citation
X. H. Yu et al., "Dynamic Learning Rate Optimization of the Backpropagation Algorithm", IEEE Transactions on Neural Networks, 6(3), 1995, pp. 669-677
Number of citations
21
Subject categories
Computer Application, Chemistry & Engineering; Engineering, Electrical & Electronic; Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods
Journal ISSN
1045-9227
Volume
6
Issue
3
Year of publication
1995
Pages
669 - 677
Database
ISI
SICI code
1045-9227(1995)6:3<669:DLROOT>2.0.ZU;2-7
Abstract
It has been observed by many authors that backpropagation (BP) error surfaces usually consist of a large number of flat regions as well as extremely steep regions. As such, the BP algorithm with a fixed learning rate is inefficient. This paper considers dynamic learning rate optimization of the BP algorithm using derivative information. An efficient method of deriving the first and second derivatives of the objective function with respect to the learning rate is explored, which does not involve explicit calculation of second-order derivatives in weight space, but rather uses the information gathered from the forward and backward propagation. Several learning rate optimization approaches are subsequently established based on linear expansion of the actual outputs and on line searches with an acceptable descent value and a Newton-like method, respectively. Simultaneous determination of the optimal learning rate and momentum is also introduced by showing the equivalence between the momentum version of BP and the conjugate gradient method. Since these approaches are constructed by simple manipulations of the obtained derivatives, the computational and storage burden scales with the network size exactly like the standard BP algorithm, and the convergence of the BP algorithm is accelerated, with a remarkable reduction (typically by a factor of 10 to 50, depending upon network architectures and applications) in the running time for the overall learning process. Numerous computer simulation results are provided to support the present approaches.
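
Illustrative sketch (not from the paper)
The idea of choosing the learning rate from the derivatives of the objective with respect to the learning rate itself can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the authors' implementation: it uses a single linear layer with a squared-error objective, for which the linear expansion of the outputs is exact, so that E(eta) = E(w - eta*g) has first derivative -r.d and second derivative d.d at eta = 0, where r is the output error and d = X g is the change in outputs per unit learning rate. A single Newton-like step then gives eta = (r.d)/(d.d). All function names and the data are hypothetical.

```python
# Toy illustration: per-step learning rate chosen from dE/d(eta) and
# d2E/d(eta)2 for a linear model with squared error. Hypothetical names;
# this is not the algorithm as specified in the paper.

def outputs(X, w):
    """Forward pass of a single linear layer."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]

def loss(X, y, w):
    """Squared-error objective E(w) = 0.5 * sum((output - target)^2)."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs(X, w), y))

def gradient(X, y, w):
    """Backward pass: gradient of E with respect to the weights."""
    r = [o - t for o, t in zip(outputs(X, w), y)]
    return [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(w))]

def dynamic_learning_rate(X, y, w, g):
    """Newton-like step on eta for E(eta) = E(w - eta * g).

    Uses only forward-pass quantities: the error r and the output change
    d = X g. No second derivatives in weight space are formed.
    """
    r = [o - t for o, t in zip(outputs(X, w), y)]
    d = outputs(X, g)                              # output change per unit eta
    dE = -sum(ri * di for ri, di in zip(r, d))     # dE/d(eta) at eta = 0
    d2E = sum(di * di for di in d)                 # d2E/d(eta)2
    return 0.0 if d2E == 0.0 else -dE / d2E

def train(X, y, w, steps, fixed_eta=None):
    """Gradient descent with either a fixed or a per-step optimized rate."""
    history = [loss(X, y, w)]
    for _ in range(steps):
        g = gradient(X, y, w)
        eta = fixed_eta if fixed_eta is not None else dynamic_learning_rate(X, y, w, g)
        w = [wj - eta * gj for wj, gj in zip(w, g)]
        history.append(loss(X, y, w))
    return w, history

# Hypothetical data generated from the weights [2, -1].
X = [[1.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
y = [2.0, -3.0, 1.0]

w_dyn, hist_dyn = train(X, y, [0.0, 0.0], steps=50)
w_fix, hist_fix = train(X, y, [0.0, 0.0], steps=50, fixed_eta=0.01)

print("final loss, dynamic rate:", hist_dyn[-1])
print("final loss, fixed rate 0.01:", hist_fix[-1])
```

For a multilayer network the outputs are not linear in the weights, so the same formula would only be an approximation based on a first-order expansion of the outputs. The fixed rate of 0.01 is an arbitrary comparison point and says nothing about the speedups reported in the abstract.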