In its most basic form, bandit theory is concerned with the design problem of sequentially choosing members from a given collection of random variables so that the regret, i.e., $R(n) = \sum_j (\mu - \mu_j)\, E\, T_n(j)$, grows as slowly as possible with increasing $n$. Here $\mu_j$ is the expected value of the bandit arm (i.e., random variable) indexed by $j$, $T_n(j)$ is the number of times arm $j$ has been selected in the first $n$ decision stages, and $\mu = \sup_j \mu_j$. The present paper contributes to the theory by considering the situation in which observations are dependent. To begin with, the dependency is presumed to involve only past observations of the same arm; later, we allow it to be with respect to the entire past and permit the set of arms to be infinite. This brings queues and, more generally, controlled Markov processes into our purview. Our ``black-box'' methodology is thus suitable for the case in which the only observables are cost values and, in particular, the probability structure and loss function are unknown to the designer. The conclusion of the analysis is that, under lenient conditions and using the algorithms prescribed herein, risk growth is commensurate with that in the simplest i.i.d. cases. Our methods offer an alternative to recent stochastic-approximation/perturbation-analysis ideas for tuning queues.
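
As an illustrative point of reference for the regret quantity defined above, the following minimal Python sketch simulates the simplest i.i.d. baseline: a two-armed Bernoulli bandit run under a standard UCB1 index rule (not the algorithms of this paper), reporting the realized regret $\sum_j (\mu - \mu_j)\, T_n(j)$. The arm means, horizon, and function name are assumptions chosen for illustration only.

    import math
    import random

    def ucb1_regret(means, horizon, seed=0):
        """Illustrative sketch (assumed i.i.d. baseline, not the paper's method):
        run UCB1 on Bernoulli arms and return sum_j (mu - mu_j) * T_n(j)."""
        rng = random.Random(seed)
        k = len(means)
        mu_star = max(means)          # mu = sup_j mu_j
        counts = [0] * k              # T_n(j): pulls of arm j so far
        totals = [0.0] * k            # cumulative reward of arm j

        for n in range(1, horizon + 1):
            if n <= k:
                j = n - 1             # pull each arm once to initialize
            else:
                # UCB1 index: sample mean plus exploration bonus
                j = max(range(k),
                        key=lambda a: totals[a] / counts[a]
                        + math.sqrt(2.0 * math.log(n) / counts[a]))
            reward = 1.0 if rng.random() < means[j] else 0.0
            counts[j] += 1
            totals[j] += reward

        return sum((mu_star - means[j]) * counts[j] for j in range(k))

    if __name__ == "__main__":
        # Regret of the i.i.d. baseline grows roughly logarithmically in n.
        for n in (1_000, 10_000, 100_000):
            print(n, round(ucb1_regret([0.5, 0.3], n), 1))

Running the sketch for increasing horizons exhibits regret growing roughly logarithmically in $n$, the classical rate for independent arms that serves as the benchmark against which the dependent-observation setting is compared.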