REINFORCEMENT LEARNING APPLIED TO A DIFFERENTIAL GAME

Citation
M.E. Harmon et al., REINFORCEMENT LEARNING APPLIED TO A DIFFERENTIAL GAME, Adaptive Behavior, 4(1), 1995, pp. 3-28
Citations number
18
Categorie Soggetti
"Social Sciences, Interdisciplinary"; "Psychology, Experimental"
Journal title
Adaptive Behavior
ISSN journal
10597123
Volume
4
Issue
1
Year of publication
1995
Pages
3 - 28
Database
ISI
SICI code
1059-7123(1995)4:1<3:RLATAD>2.0.ZU;2-R
Abstract
An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.
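The core idea in the abstract — replacing the max in the backup with a minimax over the two players' actions — can be illustrated with a toy sketch. The following is a hypothetical, tabular simplification loosely patterned after advantage updating; the paper's actual setting is continuous-state, linear-quadratic, with function approximation, so the problem (a 1-D pursuit on a ring), the update constants, and all variable names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy zero-sum Markov game: relative position of a plane w.r.t. a missile
# on a ring of n cells. The missile wants the ring distance small, the
# plane wants it large. All parameters below are hypothetical.
n = 8
actions = [-1, 0, 1]        # each player moves left / stays / moves right
dt = 0.1                    # time step duration
gamma = 0.9                 # continuous-time style discount, applied as gamma**dt
alpha, beta = 0.1, 0.1      # learning rates for A and V

rng = np.random.default_rng(0)
A = np.zeros((n, 3, 3))     # advantage A(x, u_missile, u_plane)
V = np.zeros(n)             # state value V(x)

def reward(x):
    # missile's reward: negative ring distance, so closing in is better
    return -min(x, n - x) * dt

def minimax(Amat):
    # saddle value over pure strategies: missile maximizes, plane minimizes
    return Amat.min(axis=1).max()

for step in range(20000):
    x = rng.integers(n)          # sampled state (exploring starts)
    u = rng.integers(3)          # missile action index (random exploration)
    v = rng.integers(3)          # plane action index
    x2 = (x + actions[v] - actions[u]) % n   # relative dynamics
    r = reward(x2)
    td = r + gamma**dt * V[x2] - V[x]        # one-step TD error
    # Advantage update: relax A toward the scaled TD error, normalized so
    # the minimax of A over actions is driven toward zero (the minimax
    # backup replaces the usual max of optimal-control advantage updating).
    A[x, u, v] += alpha * (td / dt - (A[x, u, v] - minimax(A[x])))
    # Value update toward the one-step return
    V[x] += beta * td
```

The `minimax` helper is the only structural change relative to the single-agent form: with one player it would reduce to a plain `max` over that player's actions.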