
DYNAMIC LEARNING RATE OPTIMIZATION OF THE BACKPROPAGATION ALGORITHM

Authors

YU XH; CHEN GA; CHENG SX

Citation

Xh. Yu et al., DYNAMIC LEARNING RATE OPTIMIZATION OF THE BACKPROPAGATION ALGORITHM, IEEE transactions on neural networks, 6(3), 1995, pp. 669-677

Citations number

Subject Categories

Computer Application, Chemistry & Engineering; Engineering, Electrical & Electronic; Computer Science Artificial Intelligence; Computer Science Hardware & Architecture; Computer Science Theory & Methods

Journal title

IEEE transactions on neural networks

ISSN journal

1045-9227

Volume

6

Issue

3

Year of publication

1995

Pages

669 - 677

Database

ISI

SICI code

1045-9227(1995)6:3<669:DLROOT>2.0.ZU;2-7

Abstract

Many authors have observed that backpropagation (BP) error surfaces usually consist of a large number of flat regions as well as extremely steep regions. As a result, the BP algorithm with a fixed learning rate is inefficient. This paper considers dynamic learning rate optimization of the BP algorithm using derivative information. An efficient method of deriving the first and second derivatives of the objective function with respect to the learning rate is explored, which does not involve explicit calculation of second-order derivatives in weight space, but rather uses the information gathered from the forward and backward propagation. Several learning rate optimization approaches are subsequently established, based on linear expansion of the actual outputs and on line searches with an acceptable descent value and a Newton-like method, respectively. Simultaneous determination of the optimal learning rate and momentum is also introduced by showing the equivalence between the momentum version of BP and the conjugate gradient method. Since these approaches are constructed by simple manipulations of the obtained derivatives, the computational and storage burden scales with the network size exactly like the standard BP algorithm, and the convergence of the BP algorithm is accelerated, with a remarkable reduction (typically by a factor of 10 to 50, depending upon network architectures and applications) in the running time for the overall learning process. Numerous computer simulation results are provided to support the present approaches.
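
The idea of choosing the learning rate from derivatives of the objective with respect to the learning rate itself can be illustrated with a minimal sketch. The code below is not the paper's algorithm: it assumes a single linear unit with a squared-error objective, and all names (forward, gradient, newton_learning_rate, train) are hypothetical. For this simple case the outputs are exactly linear in the learning rate eta, so E'(0) and E''(0) follow from one extra forward pass of the gradient direction, without forming second derivatives in weight space, and a single Newton-like step gives the best eta along the gradient.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward(X, w):
    # Outputs of a single linear unit for every input pattern (assumed model).
    return [dot(row, w) for row in X]

def loss(X, t, w):
    # Squared-error objective E(w) = 0.5 * sum (y - t)^2.
    return 0.5 * sum((y - ti) ** 2 for y, ti in zip(forward(X, w), t))

def gradient(X, t, w):
    # Backward pass for the linear unit: g = X^T (y - t).
    r = [y - ti for y, ti in zip(forward(X, w), t)]
    return [sum(r[n] * X[n][j] for n in range(len(X))) for j in range(len(w))]

def newton_learning_rate(X, t, w, g):
    # With w(eta) = w - eta * g, the outputs change by -eta * d, where
    # d is a forward pass of the gradient direction g.
    r = [y - ti for y, ti in zip(forward(X, w), t)]
    d = forward(X, g)
    dE = -dot(r, d)   # first derivative of E with respect to eta at eta = 0
    d2E = dot(d, d)   # second derivative of E with respect to eta
    # Newton-like step in eta; fall back to 0 when the curvature vanishes.
    return -dE / d2E if d2E > 0 else 0.0

def train(X, t, w, steps):
    # Gradient descent with a learning rate recomputed at every step.
    for _ in range(steps):
        g = gradient(X, t, w)
        eta = newton_learning_rate(X, t, w, g)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Toy data (made up for illustration), generated by the weights [2, -1].
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
t = [2.0, -2.0, 1.0]
w0 = [0.0, 0.0]
eta0 = newton_learning_rate(X, t, w0, gradient(X, t, w0))
w_final = train(X, t, w0, 50)
```

In a multilayer network the outputs are no longer linear in eta, so the same quantities would only give a local approximation; the abstract's reference to a linear expansion of the actual outputs and to line searches suggests that this case is handled differently, and the sketch makes no attempt to reproduce it.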