Aitken-based acceleration methods for assessing convergence of multilayer neural networks

Citation
Rs. Pilla et al., Aitken-based acceleration methods for assessing convergence of multilayer neural networks, IEEE NEURAL, 12(5), 2001, pp. 998-1012
Number of citations
41
Subject categories
AI Robotics and Automatic Control
Journal title
IEEE TRANSACTIONS ON NEURAL NETWORKS
Journal ISSN
1045-9227
Volume
12
Issue
5
Year of publication
2001
Pages
998 - 1012
Database
ISI
SICI code
1045-9227(200109)12:5<998:AAMFAC>2.0.ZU;2-N
Abstract
Suppose a nonlinear and nonquadratic objective function is being optimized over a high-dimensional parameter space. Often a closed-form solution does not exist and iterative methods are employed to find a local optimum of the function. However, algorithms designed for such high-dimensional optimization problems tend to be very slow in order to ensure reliable convergence behavior. This problem occurs frequently, for example, in training multilayer neural networks (NNs) using a gradient-descent (backpropagation) algorithm. The lack of measures of algorithmic convergence forces one to use ad hoc criteria to stop the training process. This paper first develops the ideas of the Aitken δ² method to accelerate the rate of convergence of an error sequence (the value of the objective function at each step) obtained by training an NN with a sigmoidal activation function via the backpropagation algorithm. The Aitken method is exact when the error sequence is exactly geometric. However, theoretical and empirical evidence suggests that the best possible rate of convergence obtainable for such an error sequence is log-geometric (an inverse power of the epoch n). The current paper develops a new invariant extended-Aitken acceleration method for accelerating log-geometric sequences. The resulting accelerated sequence enables one to predict the final value of the error function. These predictions can in turn be used to assess the distance between the current and final solutions and thereby provide a stopping criterion for a desired accuracy. Each of the techniques described in the paper is applicable to a wide range of problems. The invariant extended-Aitken acceleration approach shows improved acceleration as well as outstanding prediction of the final error in the practical problems considered.
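As an illustration of the abstract's claim that the Aitken method is exact for geometric error sequences, the following is a minimal sketch of the classical Aitken δ² transform in Python. It is not the paper's invariant extended-Aitken method, which the abstract does not specify. The function name `aitken_delta2`, the limit `E_INF`, and both synthetic error sequences are assumptions made for this example only; they are not data from the paper.

```python
def aitken_delta2(seq):
    """Classical Aitken delta-squared transform of a numeric sequence.

    For consecutive terms x0, x1, x2 it returns
        x2 - (x2 - x1)**2 / ((x2 - x1) - (x1 - x0)),
    which estimates the limit of the sequence.
    """
    out = []
    for x0, x1, x2 in zip(seq, seq[1:], seq[2:]):
        d1 = x1 - x0          # first difference at step n
        d2 = x2 - x1          # first difference at step n + 1
        denom = d2 - d1       # second difference
        # Fall back to the raw term if the second difference vanishes.
        out.append(x2 if denom == 0 else x2 - d2 * d2 / denom)
    return out


# Hypothetical final error value, chosen only for this illustration.
E_INF = 0.05

# Synthetic geometric error sequence: e_n = E_INF + c * r**n.
geometric = [E_INF + 2.0 * 0.9 ** n for n in range(1, 30)]

# Synthetic inverse-power ("log-geometric") sequence: e_n = E_INF + c / n**p.
power_law = [E_INF + 2.0 / n ** 0.5 for n in range(1, 30)]

geo_acc = aitken_delta2(geometric)
pow_acc = aitken_delta2(power_law)

print("geometric: raw error  ", abs(geometric[-1] - E_INF))
print("geometric: Aitken error", abs(geo_acc[-1] - E_INF))
print("power law: raw error  ", abs(power_law[-1] - E_INF))
print("power law: Aitken error", abs(pow_acc[-1] - E_INF))
```

On the geometric sequence the transform recovers `E_INF` up to floating-point rounding. On the inverse-power sequence a substantial error remains, which is consistent with the abstract's motivation for an extended method aimed at log-geometric sequences.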