B. J. Oommen and M. Agache, Continuous and discretized pursuit learning schemes: Various algorithms and their comparison, IEEE SYST B, 31(3), 2001, pp. 277-287
Citations number
26
Subject Categories
AI Robotics and Automatic Control
Journal title
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS
A learning automaton (LA) is an automaton that interacts with a random
environment, having as its goal the task of learning the optimal action based
on its acquired experience. Many learning automata (LAs) have been proposed,
with the class of estimator algorithms being among the fastest. Thathachar and
Sastry, through the pursuit algorithm, introduced the concept of learning
algorithms that pursue the current optimal action, following a reward-penalty
learning philosophy. Later, Oommen and Lanctot extended the pursuit algorithm
into the discretized world by presenting the discretized pursuit algorithm,
based on a reward-inaction learning philosophy. In this paper we argue that
the reward-penalty and reward-inaction learning paradigms, in conjunction with
the continuous and discrete models of computation, lead to four versions of
pursuit learning automata. We contend that a scheme that merges the pursuit
concept with the most recent response of the environment permits the algorithm
to utilize the LA's long-term and short-term perspectives of the environment.
In this paper, we present all four resultant pursuit algorithms, prove the
ε-optimality of the newly introduced algorithms, and present a quantitative
comparison between them.
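
The continuous pursuit idea summarized above can be illustrated with a minimal sketch. It is not the authors' code: the function name, the learning rate lam, the ten initial samples per action, and the choice to update the probabilities before folding in the latest response are all assumptions made for illustration. The reward_inaction flag switches between the reward-penalty variant (update on every response) and the reward-inaction variant (update only on reward). The discretized variants, which move probabilities in fixed steps, are not shown.

```python
import random


def run_continuous_pursuit(reward_probs, lam=0.01, steps=5000,
                           reward_inaction=False, seed=0):
    """Hypothetical sketch of a continuous pursuit learning automaton.

    reward_probs[i] is the (unknown to the learner) probability that the
    simulated environment rewards action i.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r      # action probability vector
    rewards = [0] * r      # number of rewards received per action
    pulls = [0] * r        # number of times each action was chosen

    # Assumption: seed the reward estimates by trying each action 10 times.
    for i in range(r):
        for _ in range(10):
            pulls[i] += 1
            rewards[i] += rng.random() < reward_probs[i]

    for _ in range(steps):
        # Choose an action according to the current probability vector.
        a = rng.choices(range(r), weights=p)[0]
        rewarded = rng.random() < reward_probs[a]

        # Pursue the action whose current reward estimate is largest.
        estimates = [rewards[i] / pulls[i] for i in range(r)]
        best = max(range(r), key=estimates.__getitem__)
        if rewarded or not reward_inaction:
            # Move p a fraction lam of the way toward the unit vector e_best.
            p = [(1.0 - lam) * pi for pi in p]
            p[best] += lam

        # Fold the latest response into the running estimates.
        pulls[a] += 1
        rewards[a] += rewarded

    return p
```

Calling run_continuous_pursuit([0.2, 0.8]) returns a probability vector that should concentrate on the second action; passing reward_inaction=True runs the reward-inaction variant on the same simulated environment.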