It has recently been shown that gradient-descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e., those problems for which the desired output depends on inputs presented at times far in the past. We show that the long-term dependencies problem is lessened for a class of architectures called Nonlinear AutoRegressive models with eXogenous inputs (NARX) recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient descent learning can be more effective in NARX networks than in recurrent neural network architectures that have "hidden states" on problems including grammatical inference and nonlinear system identification. Typically, the network converges much faster and generalizes better than other networks. The results in this paper are consistent with this phenomenon. We present some experimental results which show that NARX networks can often retain information for two to three times as long as conventional recurrent neural networks. We show that although NARX networks do not circumvent the problem of long-term dependencies, they can greatly improve performance on long-term dependency problems. We also describe in detail some of the assumptions regarding what it means to latch information robustly and suggest possible ways to loosen these assumptions.
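For readers unfamiliar with the architecture, a minimal sketch of the NARX recurrence may help fix ideas; the symbols n_u and n_y (the input and output delay orders) and the map Psi are generic placeholders, not notation taken from this abstract. A NARX network computes its output from a tapped delay line of past inputs and past outputs rather than from an internal hidden state:

    y(t) = Psi( u(t), u(t-1), ..., u(t-n_u), y(t-1), y(t-2), ..., y(t-n_y) )

Here u(t) is the exogenous input, y(t) is the network output fed back through the output delays, and Psi is the nonlinear mapping realized by a feedforward network. The output delays provide direct paths to states several steps in the past, which is the structural feature the abstract credits with easing gradient-based learning of long-term dependencies.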