ITA
ENG

ADAPTIVE LEARNING-MECHANISMS FOR ORDERING ACTIONS USING RANDOM RACES

Authors

NG DTH OOMMEN BJ HANSEN ER

Citation

Dth. Ng et al., ADAPTIVE LEARNING-MECHANISMS FOR ORDERING ACTIONS USING RANDOM RACES, IEEE transactions on systems, man, and cybernetics, 23(5), 1993, pp. 1450-1465

Citations number

Categorie Soggetti

Controlo Theory & Cybernetics","Computer Science Cybernetics","Engineering, Eletrical & Electronic

Journal title

IEEE transactions on systems, man, and cybernetics → ACNP

ISSN journal

00189472

Volume

Issue

Year of publication

1993

Pages

1450 - 1465

Database

ISI

SICI code

0018-9472(1993)23:5<1450:ALFOAU>2.0.ZU;2-1

Abstract

Consider a learning machine (LM) interacting with an environment epsil on. The environment offers the machine M actions. Traditionally, learn ing systems endeavor to compute the best action that the environment o ffers, and this is done without any estimation procedure. In this pape r, we consider the problem of the LM computing not only the optimal ac tion offered but also the ordering of the actions in terms of their op timality. The problem is posed in its generality and various norms of learning in this setting are formalized. Also various learning strateg ies are presented that use a new mathematical model called the random race. In this model the learning is modeled using M racers that are ru nning toward a goal. At each instant, racer R(i) moves toward the goal with a probability of s(i) and stays where he is with a probability o f (1 - s(i)). In the simplest learning model, the learning multiple ra ce track (LMRT) model, the racers run on multiple tracks, and in this scenario, each racer has his own track, thus disallowing interference between the racers. However, in a more general setting, the learning s ingle race track (LSRT) model, the racers run on a single track, and i n this case, interferences between racers are specified in terms of ov ertaking rules. In this paper, we first examine the learning multiple race track (LMRT) model, and we have shown that in the absence of a pr iori information the LMRT is permutationally epsilon-optimal in all su ggestive random environments. Subsequently, a variety of results are p roven for the cases when the LMRT uses the a priori information by giv ing the racers handicaps that are either uniformly or geometrically di stributed. Analogous results for the learning single race track (LSRT) model are also conjectured and these conjectures are strengthened by numerous simulations.