USING EXPECTATION-MAXIMIZATION FOR REINFORCEMENT LEARNING

Authors
Citation
P. Dayan and G.E. Hinton, USING EXPECTATION-MAXIMIZATION FOR REINFORCEMENT LEARNING, Neural Computation, 9(2), 1997, pp. 271-278
Citations number
13
Categorie Soggetti
"Computer Sciences", "Computer Science, Artificial Intelligence", "Neurosciences"
Journal title
Neural Computation
ISSN journal
0899-7667
Volume
9
Issue
2
Year of publication
1997
Pages
271 - 278
Database
ISI
SICI code
0899-7667(1997)9:2<271:UEFRL>2.0.ZU;2-8
Abstract
We discuss Hinton's (1989) relative payoff procedure (RPP), a static reinforcement learning algorithm whose foundation is not stochastic gradient ascent. We show circumstances under which applying the RPP is guaranteed to increase the mean return, even though it can make large changes in the values of the parameters. The proof is based on a mapping between the RPP and a form of the expectation-maximization procedure of Dempster, Laird, and Rubin (1977).
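For readers unfamiliar with the RPP, the following is a minimal sketch of its core update for a single Bernoulli unit, under the common assumption of non-negative returns: the unit's new firing probability is set to the return-weighted average of its sampled activities (a Monte Carlo estimate of E[r·a]/E[r]). The function names and the toy reward function here are illustrative, not from the paper.

```python
import random

def rpp_update(p, n_samples, reward_fn, rng):
    """One Relative Payoff Procedure (RPP) step for a single Bernoulli unit.

    Sets the new success probability to the return-weighted average of the
    unit's sampled binary activities, i.e. an estimate of E[r*a] / E[r].
    Returns are assumed non-negative.
    """
    total_r = 0.0   # accumulated return over all samples
    total_ra = 0.0  # accumulated return * activity
    for _ in range(n_samples):
        a = 1 if rng.random() < p else 0  # sample the unit's binary activity
        r = reward_fn(a)                  # non-negative return for this sample
        total_r += r
        total_ra += r * a
    # If no return was received, leave the parameter unchanged.
    return total_ra / total_r if total_r > 0 else p

# Illustrative use: acting (a=1) pays 1.0, not acting pays 0.2, so repeated
# RPP steps push the firing probability toward 1 without any gradient step.
rng = random.Random(0)
p = 0.5
for _ in range(20):
    p = rpp_update(p, 1000, lambda a: 1.0 if a == 1 else 0.2, rng)
```

Note that each step can move the parameter a long way (the first step above jumps from 0.5 to roughly 0.83 in expectation), which is exactly the setting where the paper's EM-based argument, rather than a small-step gradient argument, guarantees the mean return does not decrease.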