ADAPTIVE LEARNING-MECHANISMS FOR ORDERING ACTIONS USING RANDOM RACES

Citation
Dth. Ng et al., ADAPTIVE LEARNING-MECHANISMS FOR ORDERING ACTIONS USING RANDOM RACES, IEEE transactions on systems, man, and cybernetics, 23(5), 1993, pp. 1450-1465
Citations number
23
Categorie Soggetti
Controlo Theory & Cybernetics","Computer Science Cybernetics","Engineering, Eletrical & Electronic
ISSN journal
00189472
Volume
23
Issue
5
Year of publication
1993
Pages
1450 - 1465
Database
ISI
SICI code
0018-9472(1993)23:5<1450:ALFOAU>2.0.ZU;2-1
Abstract
Consider a learning machine (LM) interacting with an environment epsil on. The environment offers the machine M actions. Traditionally, learn ing systems endeavor to compute the best action that the environment o ffers, and this is done without any estimation procedure. In this pape r, we consider the problem of the LM computing not only the optimal ac tion offered but also the ordering of the actions in terms of their op timality. The problem is posed in its generality and various norms of learning in this setting are formalized. Also various learning strateg ies are presented that use a new mathematical model called the random race. In this model the learning is modeled using M racers that are ru nning toward a goal. At each instant, racer R(i) moves toward the goal with a probability of s(i) and stays where he is with a probability o f (1 - s(i)). In the simplest learning model, the learning multiple ra ce track (LMRT) model, the racers run on multiple tracks, and in this scenario, each racer has his own track, thus disallowing interference between the racers. However, in a more general setting, the learning s ingle race track (LSRT) model, the racers run on a single track, and i n this case, interferences between racers are specified in terms of ov ertaking rules. In this paper, we first examine the learning multiple race track (LMRT) model, and we have shown that in the absence of a pr iori information the LMRT is permutationally epsilon-optimal in all su ggestive random environments. Subsequently, a variety of results are p roven for the cases when the LMRT uses the a priori information by giv ing the racers handicaps that are either uniformly or geometrically di stributed. Analogous results for the learning single race track (LSRT) model are also conjectured and these conjectures are strengthened by numerous simulations.