S.I. Ahson and R. Srinivas, LEARNING ACTION PROBABILITIES FROM DELAYED REINFORCEMENT, International Journal of Systems Science, 24(12), 1993, pp. 2415-2421
Number of citations
8
Subject Categories
System Science; Computer Applications & Cybernetics; Operations Research & Management Science
A reinforcement scheme for learning automata, applicable to real
situations where the reinforcement received from the environment is
delayed, is presented. The scheme divides the state space into regions
following the boxes approach of Michie and Chambers. Each region
maintains estimates of the reward characteristics of the environment and
contains a local automaton that updates action probabilities whenever
the system state enters it. Estimates of reward characteristics are
obtained using reinforcement received during the period of eligibility.
Results obtained through computer simulation of the inverted pendulum
problem are compared with the adaptive critic learning developed by
Barto et al. (1983).
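
The abstract does not give the update rules, so the following Python sketch is only one plausible reading of the structure it describes, not the authors' scheme. It assumes a pursuit-style probability update (probabilities move toward the action with the best current reward estimate), an exponentially decaying eligibility trace, and hypothetical names throughout (box_index, Box, enter, reinforce, decay, and the parameters rate, step and factor).

```python
import random


def box_index(x, edges):
    """Return the index of the interval (box) containing x for sorted edges."""
    return sum(x >= e for e in edges)


class Box:
    """One region of the state space with its own local automaton (assumed design)."""

    def __init__(self, n_actions=2):
        self.probs = [1.0 / n_actions] * n_actions   # action probabilities
        self.estimates = [0.0] * n_actions           # estimated reward per action
        self.eligibility = [0.0] * n_actions         # credit for delayed reward

    def enter(self, rate=0.1, rng=random):
        """Called when the system state enters this box; returns an action."""
        # Assumption: pursuit-style update toward the best-estimated action.
        # Skipped while all estimates are equal, to avoid an arbitrary bias.
        if max(self.estimates) > min(self.estimates):
            best = max(range(len(self.probs)), key=lambda a: self.estimates[a])
            self.probs = [
                p + rate * ((1.0 if a == best else 0.0) - p)
                for a, p in enumerate(self.probs)
            ]
        action = rng.choices(range(len(self.probs)), weights=self.probs)[0]
        self.eligibility[action] = 1.0  # this action is now fully eligible
        return action

    def reinforce(self, r, step=0.2):
        """Fold a (possibly delayed) reinforcement r into the reward estimates."""
        for a, e in enumerate(self.eligibility):
            self.estimates[a] += step * e * (r - self.estimates[a])

    def decay(self, factor=0.8):
        """Shrink eligibility each time step so old actions receive less credit."""
        self.eligibility = [factor * e for e in self.eligibility]
```

In a pendulum simulation, each state variable would be discretized with box_index, the resulting indices would select a Box, enter would supply the control action, and reinforce would be called on every box when a failure signal arrives. The pursuit rule is used here only because it keeps the probabilities normalized with a one-line update; the paper may use a different automaton.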