M. A. L. Thathachar and V. V. Phansalkar, Learning the Global Maximum with Parameterized Learning Automata, IEEE Transactions on Neural Networks, 6(2), 1995, pp. 398-406
A feedforward network composed of units of teams of parameterized
learning automata is considered as a model of a reinforcement learning
system. The internal state vector of each learning automaton is updated
using an algorithm consisting of a gradient-following term and a random
perturbation term. It is shown that the algorithm weakly converges to a
solution of the Langevin equation, implying that the algorithm globally
maximizes an appropriate function. The algorithm is decentralized, and
the units do not exchange any information during updating. Simulation
results on common payoff games and pattern recognition problems show
that reasonable rates of convergence can be obtained.
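
The abstract does not give the update rule itself, so the following is only a minimal sketch of the general idea (a gradient-following term plus a random perturbation term) for a single automaton, not the authors' algorithm. Everything specific here is an assumption made for illustration: the softmax mapping from internal state to action probabilities, the likelihood-ratio gradient estimate, the step size b, the noise scale sigma, the clipping bound, and the function name run_automaton.

```python
import math
import random


def softmax(u):
    """Map an internal state vector to action probabilities."""
    m = max(u)
    exps = [math.exp(x - m) for x in u]
    total = sum(exps)
    return [e / total for e in exps]


def run_automaton(reward_probs, steps=5000, b=0.05, sigma=0.05,
                  bound=5.0, seed=0):
    """Simulate one automaton in a stationary random environment.

    reward_probs[i] is the (assumed) probability that action i is rewarded.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    u = [0.0] * len(reward_probs)  # internal state vector
    for _ in range(steps):
        p = softmax(u)
        action = rng.choices(range(len(u)), weights=p)[0]
        # Binary reinforcement signal from the environment.
        beta = 1.0 if rng.random() < reward_probs[action] else 0.0
        for i in range(len(u)):
            # Gradient-following term: beta * d(ln p[action]) / d(u[i]).
            grad_log = (1.0 if i == action else 0.0) - p[i]
            # Random perturbation term, scaled by sqrt(b) as in a
            # discretized Langevin equation (assumed scaling).
            noise = rng.gauss(0.0, 1.0)
            u[i] += b * beta * grad_log + math.sqrt(b) * sigma * noise
            # Keep the state bounded (illustrative choice).
            u[i] = max(-bound, min(bound, u[i]))
    return softmax(u)


if __name__ == "__main__":
    print(run_automaton([0.2, 0.8]))
```

In this sketch each automaton uses only its own action and the common reinforcement signal, which is the sense in which such an update can be run without information exchange between units.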