An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile-and-plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and of Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.
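The abstract notes that the single-player algorithm is modified to seek a minimax point rather than a maximum. A minimal sketch of that change, under assumed conventions (rows index the minimizing player's actions, columns the maximizing player's; the function names and the example advantage matrix are hypothetical, not from the paper):

```python
import numpy as np

def backup_max(advantages):
    """Single-player backup: the greedy value offset is the maximum
    advantage over the available actions."""
    return float(np.max(advantages))

def backup_minimax(adv_matrix):
    """Two-player zero-sum backup: replace the max with the minimax
    (saddle) point of the advantage matrix. Rows are the minimizer's
    actions, columns the maximizer's (assumed convention)."""
    upper = float(np.min(np.max(adv_matrix, axis=1)))  # minimizer's security level
    lower = float(np.max(np.min(adv_matrix, axis=0)))  # maximizer's security level
    if upper == lower:
        # A pure-strategy saddle point exists; its value is the backup target.
        return upper
    raise ValueError("no pure-strategy saddle point; mixed strategies needed")
```

For the illustrative matrix `[[1, 2], [3, 4]]`, both security levels equal 2, so the saddle value 2.0 replaces what would have been the single-player maximum.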
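The claim that Q-learning degrades as the time step shrinks while advantage updating does not can be illustrated with a first-order numerical sketch (the values of V and the advantages below are made up for illustration; this is not the paper's simulation): over a short step of duration dt, Q-values behave roughly like Q(x,u) ≈ V(x) + dt·A(x,u), so the spread between Q-values of different actions vanishes with dt, while advantages stay O(1).

```python
import numpy as np

def q_spread(v, advantages, dt):
    """Spread between Q-values of different actions under the first-order
    approximation Q(x,u) ~ V(x) + dt * A(x,u); shrinks linearly with dt,
    so function-approximation error eventually swamps it."""
    q = v + dt * advantages
    return float(np.max(q) - np.min(q))

def advantage_spread(advantages):
    """Spread between advantages of different actions; independent of dt,
    so the preferred action remains distinguishable as dt -> 0."""
    return float(np.max(advantages) - np.min(advantages))
```

With illustrative advantages `[-1, 0, 1]` and V = 10, the Q-value spread falls from 0.2 at dt = 0.1 to 0.002 at dt = 0.001, while the advantage spread stays at 2.0.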