HIGH-PERFORMANCE TRAINING OF FEEDFORWARD AND SIMPLE RECURRENT NETWORKS

Citation
B.L. Kalman and S.C. Kwasny, HIGH-PERFORMANCE TRAINING OF FEEDFORWARD AND SIMPLE RECURRENT NETWORKS, Neurocomputing, 14(1), 1997, pp. 63-83
Citations number
33
Subject Categories
Computer Sciences, Special Topics; Computer Science Artificial Intelligence; Neurosciences
Journal title
Neurocomputing
ISSN journal
09252312
Volume
14
Issue
1
Year of publication
1997
Pages
63 - 83
Database
ISI
SICI code
0925-2312(1997)14:1<63:HTOFAS>2.0.ZU;2-8
Abstract
TRAINREC is a system for training feedforward and recurrent neural networks that incorporates several ideas. It uses the conjugate-gradient method, which is demonstrably more efficient than traditional backward error propagation. We assume epoch-based training and derive a new error function having several desirable properties absent from the traditional sum-of-squares-error function. We argue for skip (shortcut) connections where appropriate and the preference for a bipolar sigmoidal yielding values over the [-1, 1] interval. The input feature space is often over-analyzed, but by using singular value decomposition, input patterns can be conditioned for better learning, often with a reduced number of input units. Recurrent networks, in their most general form, require special handling and cannot be simply a re-wiring of the architecture without a corresponding revision of the derivative calculations. There is a careful balance required among the network architecture (specifically, hidden and feedback units), the amount of training applied, and the ability of the network to generalize. These issues often hinge on selecting the proper stopping criterion. Discovering methods that work in theory as well as in practice is difficult, and we have spent a substantial amount of effort evaluating and testing these ideas on real problems to determine their value. This paper encapsulates a number of such ideas, ranging from those motivated by a desire for efficiency of training to those motivated by correctness and accuracy of the result. While this paper is intended to be self-contained, several references are provided to other work upon which many of our claims are based.
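Two of the ideas named in the abstract, a bipolar sigmoid and skip (shortcut) connections, can be illustrated with a minimal sketch. This is not the TRAINREC implementation. The abstract does not give the activation formula, so tanh is assumed here as one common bipolar sigmoid, and the function names, weight layout, and omission of bias terms are all hypothetical choices made for brevity.

```python
import math


def bipolar_sigmoid(x):
    """Squash x into the interval (-1, 1). tanh is an assumption, not the paper's stated choice."""
    return math.tanh(x)


def forward(inputs, w_in_hidden, w_hidden_out, w_in_out):
    """Forward pass of a one-hidden-layer network with skip connections.

    w_in_hidden[j][i]  : weight from input i to hidden unit j
    w_hidden_out[k][j] : weight from hidden unit j to output unit k
    w_in_out[k][i]     : skip weight from input i directly to output unit k
    Bias terms are omitted to keep the sketch short.
    """
    hidden = [
        bipolar_sigmoid(sum(w * x for w, x in zip(row, inputs)))
        for row in w_in_hidden
    ]
    outputs = []
    for k in range(len(w_hidden_out)):
        # Contribution routed through the hidden layer.
        net = sum(w * h for w, h in zip(w_hidden_out[k], hidden))
        # Skip path: inputs feed the output unit directly.
        net += sum(w * x for w, x in zip(w_in_out[k], inputs))
        outputs.append(bipolar_sigmoid(net))
    return outputs
```

With all skip weights set to zero the network reduces to an ordinary feedforward net, so the skip path only adds a direct input-to-output term on top of what the hidden layer computes.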