Vv. Phansalkar et Mal. Thathachar, LOCAL AND GLOBAL OPTIMIZATION ALGORITHMS FOR GENERALIZED LEARNING AUTOMATA, Neural computation, 7(5), 1995, pp. 950-973
This paper analyzes the long-term behavior of the REINFORCE and relate
d algorithms (Williams 1986, 1988, 1992) for generalized learning auto
mata (Narendra and Thathachar 1989) for the associative reinforcement
learning problem (Barto and Anandan 1985). The learning system conside
red here is a feedforward connectionist network of generalized learnin
g automata units. We show that REINFORCE is a gradient ascent algorith
m but can exhibit unbounded behavior. A modified version of this algor
ithm, based on constrained optimization techniques, is suggested to ov
ercome this disadvantage. The modified algorithm is shown to exhibit l
ocal optimization properties. A global version of the algorithm, based
on constant temperature heat bath techniques, is also described and s
hown to converge to the global maximum. All algorithms are analyzed us
ing weak convergence techniques.