Continuous and discretized pursuit learning schemes: Various algorithms and their comparison

Citation
B.J. Oommen and M. Agache, Continuous and discretized pursuit learning schemes: Various algorithms and their comparison, IEEE SYST B, 31(3), 2001, pp. 277-287
Citations number
26
Subject categories
AI Robotics and Automatic Control
Journal title
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS
ISSN journal
1083-4419
Volume
31
Issue
3
Year of publication
2001
Pages
277 - 287
Database
ISI
SICI code
1083-4419(200106)31:3<277:CADPLS>2.0.ZU;2-E
Abstract
A learning automaton (LA) is an automaton that interacts with a random environment, having as its goal the task of learning the optimal action based on its acquired experience. Many learning automata (LAs) have been proposed, with the class of estimator algorithms being among the fastest ones. Thathachar and Sastry, through the pursuit algorithm, introduced the concept of learning algorithms that pursue the current optimal action, following a reward-penalty learning philosophy. Later, Oommen and Lanctot extended the pursuit algorithm into the discretized world by presenting the discretized pursuit algorithm, based on a reward-inaction learning philosophy. In this paper, we argue that the reward-penalty and reward-inaction learning paradigms, in conjunction with the continuous and discrete models of computation, lead to four versions of pursuit learning automata. We contend that a scheme that merges the pursuit concept with the most recent response of the environment permits the algorithm to utilize the LA's long-term and short-term perspectives of the environment. In this paper, we present all four resultant pursuit algorithms, prove the epsilon-optimality of the newly introduced algorithms, and present a quantitative comparison between them.
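The pursuit concept described in the abstract can be sketched as follows. This is a minimal illustration of a continuous, reward-penalty-style pursuit automaton, not the paper's exact algorithms: the Bernoulli environment model, the learning rate `lam`, the step count, and the optimistic initialization of the reward estimates are all assumptions made for the example.

```python
import random

def continuous_pursuit(reward_probs, steps=20000, lam=0.01, seed=0):
    """Sketch of a continuous pursuit learning automaton: after each
    environment response, the action-probability vector is moved a
    fraction `lam` toward the action with the highest estimated
    reward probability (the action being "pursued")."""
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r          # action-selection probabilities
    rewards = [1] * r          # reward counts (optimistic init, assumed)
    pulls = [1] * r            # selection counts
    for _ in range(steps):
        # sample an action according to the current probability vector
        u, acc, a = rng.random(), 0.0, r - 1
        for i in range(r):
            acc += p[i]
            if u < acc:
                a = i
                break
        # environment response: 1 = reward, 0 = penalty
        beta = 1 if rng.random() < reward_probs[a] else 0
        pulls[a] += 1
        rewards[a] += beta
        # current best action according to the running estimates
        m = max(range(r), key=lambda i: rewards[i] / pulls[i])
        # pursue: move p toward the unit vector of action m
        p = [(1 - lam) * pi + (lam if i == m else 0.0)
             for i, pi in enumerate(p)]
    return p

# environment with three actions; reward probabilities are illustrative
probs = continuous_pursuit([0.2, 0.8, 0.5])
```

A discretized variant would instead change the probabilities by fixed multiples of a resolution step 1/(rN), and a reward-inaction variant would update `p` only when `beta == 1`; those one-line changes are exactly the axes along which the paper's four algorithms differ.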