LEARNING ACTION PROBABILITIES FROM DELAYED REINFORCEMENT

Citation
Si. Ahson and R. Srinivas, LEARNING ACTION PROBABILITIES FROM DELAYED REINFORCEMENT, International Journal of Systems Science, 24(12), 1993, pp. 2415-2421
Number of citations
8
Subject categories
System Science; Computer Applications & Cybernetics; Operations Research & Management Science
Journal ISSN
0020-7721
Volume
24
Issue
12
Year of publication
1993
Pages
2415 - 2421
Database
ISI
SICI code
0020-7721(1993)24:12<2415:LAPFDR>2.0.ZU;2-Q
Abstract
A reinforcement scheme for learning automata, applicable to real situations where the reinforcement received from the environment is delayed, is presented. The scheme divides the state space into regions following the boxes approach of Michie and Chambers. Each region maintains estimates of the reward characteristics of the environment and contains a local automaton that updates action probabilities whenever the system state enters it. Estimates of reward characteristics are obtained using reinforcement received during the period of eligibility. Results obtained through computer simulation of the inverted pendulum problem are compared with the adaptive critic learning developed by Barto et al. (1983).
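
Illustrative sketch
The abstract gives no update equations, so the Python below is only a rough illustration of the general idea, not the authors' scheme. It assumes a fixed-length eligibility window, an incremental-mean reward estimate, and a pursuit-style probability update. All names (box_index, BoxAutomaton, credit, learning_rate, window) are hypothetical.

```python
import random
from bisect import bisect_right
from collections import deque


def box_index(state, edges):
    """Map a continuous state to a box number (boxes-style discretisation).

    `edges` holds one sorted list of thresholds per state dimension.
    """
    index = 0
    for value, dim_edges in zip(state, edges):
        index = index * (len(dim_edges) + 1) + bisect_right(dim_edges, value)
    return index


class BoxAutomaton:
    """Local automaton for one box: action probabilities plus reward estimates."""

    def __init__(self, n_actions, learning_rate=0.1):
        self.probs = [1.0 / n_actions] * n_actions
        self.reward_est = [0.0] * n_actions  # estimated mean reinforcement per action
        self.counts = [0] * n_actions
        self.learning_rate = learning_rate

    def choose(self, rng):
        # Sample an action according to the current action probabilities.
        return rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def record(self, action, reinforcement):
        # Assumed estimator: incremental mean of the reinforcement credited so far.
        self.counts[action] += 1
        self.reward_est[action] += (
            reinforcement - self.reward_est[action]
        ) / self.counts[action]

    def update_probs(self):
        # Assumed pursuit-style update: move probability mass toward the action
        # with the highest estimated reward. The probabilities still sum to 1.
        best = max(range(len(self.probs)), key=lambda a: self.reward_est[a])
        for a in range(len(self.probs)):
            target = 1.0 if a == best else 0.0
            self.probs[a] += self.learning_rate * (target - self.probs[a])


def credit(eligible, automata, reinforcement):
    """Credit a delayed reinforcement to every (box, action) pair still eligible."""
    for box, action in eligible:
        automata[box].record(action, reinforcement)


# Usage: two state dimensions, one threshold each, giving four boxes.
edges = [[0.0], [0.0]]
automata = [BoxAutomaton(n_actions=2) for _ in range(4)]
eligible = deque(maxlen=5)  # assumed eligibility window of five steps
rng = random.Random(0)

box = box_index((0.5, -1.0), edges)
automata[box].update_probs()  # the automaton updates when the state enters its box
action = automata[box].choose(rng)
eligible.append((box, action))
credit(eligible, automata, reinforcement=1.0)  # reinforcement arrives later
```

A bounded deque is used for the eligibility window so that old (box, action) pairs drop out automatically. A decaying eligibility trace would be an equally plausible reading of "period of eligibility".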