MODIFIED QUASI-NEWTON METHODS FOR TRAINING NEURAL NETWORKS

Citation
B. Robitaille et al., MODIFIED QUASI-NEWTON METHODS FOR TRAINING NEURAL NETWORKS, Computers & Chemical Engineering, 20(9), 1996, pp. 1133-1140
Citations number
24
Subject Categories
Computer Applications, Chemistry & Engineering; Engineering, Chemical; Computer Science, Interdisciplinary Applications
ISSN journal
0098-1354
Volume
20
Issue
9
Year of publication
1996
Pages
1133 - 1140
Database
ISI
SICI code
0098-1354(1996)20:9<1133:MQMFTN>2.0.ZU;2-G
Abstract
The backpropagation algorithm is the most popular procedure for training self-learning feedforward neural networks. However, its convergence is slow, since it is essentially a steepest descent method. Several researchers have proposed other approaches to improve convergence: conjugate gradient methods, dynamic modification of learning parameters, quasi-Newton or Newton methods, stochastic methods, etc. Quasi-Newton methods have been criticized because they require significant computation time and memory space to update the Hessian matrix, limiting their use to medium-sized problems. This paper proposes three variations of the classical quasi-Newton approach that take the structure of the network into account. By neglecting some second-order interactions, the sizes of the resulting approximate Hessian matrices are not proportional to the square of the total number of weights in the network but instead depend on the number of neurons of each level. The modified quasi-Newton methods are tested on two examples and compared to classical approaches such as regular quasi-Newton methods, backpropagation and conjugate gradient methods. The numerical results show that one of these approaches, named BFGS-N, offers a clear gain in computational time on large-scale problems over the traditional methods, without requiring large memory space. (C) 1996 Elsevier Science Ltd
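
The abstract does not spell out the BFGS-N construction, so the following is only a minimal sketch of the general idea it describes: keeping a separate, smaller quasi-Newton (BFGS) approximation per layer (a block-diagonal inverse Hessian) instead of one full matrix over all weights. All names, shapes, the damping factor and the placeholder quadratic loss below are illustrative assumptions, not the authors' method.

    # Illustrative sketch only: a block-diagonal ("layer-wise") BFGS update for a
    # tiny feedforward network. The exact BFGS-N construction is not given in the
    # abstract; the test problem and all constants here are assumptions.
    import numpy as np

    def bfgs_update(H, s, y):
        """Standard BFGS update of an inverse-Hessian approximation H
        from a step s and a gradient difference y."""
        sy = float(s @ y)
        if sy <= 1e-12:          # skip the update if the curvature condition fails
            return H
        rho = 1.0 / sy
        I = np.eye(len(s))
        V = I - rho * np.outer(s, y)
        return V @ H @ V.T + rho * np.outer(s, s)

    # One separate inverse-Hessian block per layer, so storage grows with the
    # square of each layer's weight count instead of the square of the total.
    rng = np.random.default_rng(0)
    layer_sizes = [(3, 4), (4, 2)]                     # (inputs, outputs) per layer
    weights = [rng.standard_normal(m * n) * 0.1 for m, n in layer_sizes]
    H_blocks = [np.eye(m * n) for m, n in layer_sizes]

    def loss_and_grads(ws):
        # Placeholder quadratic loss per layer, standing in for the network error;
        # a real implementation would backpropagate through the network instead.
        grads = [w.copy() for w in ws]
        loss = 0.5 * sum(float(w @ w) for w in ws)
        return loss, grads

    loss, grads = loss_and_grads(weights)
    for it in range(20):
        # Damped quasi-Newton step, applied block by block.
        new_weights = [w - 0.5 * (H @ g) for w, g, H in zip(weights, grads, H_blocks)]
        new_loss, new_grads = loss_and_grads(new_weights)
        for i, (w, nw, g, ng) in enumerate(zip(weights, new_weights, grads, new_grads)):
            H_blocks[i] = bfgs_update(H_blocks[i], nw - w, ng - g)
        weights, grads, loss = new_weights, new_grads, new_loss
    print(f"final loss after 20 block-wise BFGS steps: {loss:.2e}")

The point of the sketch is the memory argument made in the abstract: the full inverse Hessian over all 20 weights would need a 20 x 20 matrix, whereas the two per-layer blocks need only 12 x 12 and 8 x 8, a saving that grows quickly with network size.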