ITA
ENG

MULTIARMED BANDITS WITH SWITCHING PENALTIES

Authors

ASAWA M TENEKETZIS D

Citation

M. Asawa et D. Teneketzis, MULTIARMED BANDITS WITH SWITCHING PENALTIES, IEEE transactions on automatic control, 41(3), 1996, pp. 328-348

Citations number

Categorie Soggetti

Controlo Theory & Cybernetics","Robotics & Automatic Control","Engineering, Eletrical & Electronic

Journal title

IEEE transactions on automatic control → ACNP

ISSN journal

00189286

Volume

Issue

Year of publication

1996

Pages

328 - 348

Database

ISI

SICI code

0018-9286(1996)41:3<328:MBWSP>2.0.ZU;2-7

Abstract

The multi-armed bandit problem with switching penalties (switching cos t and switching delays) is investigated. It is shown that under an opt imal policy, decisions about the processor allocation need to be made only at stopping times that achieve an appropriate index, the well-kno wn ''Gittins index'' or a ''switching index'' that is defined for swit ching cost and switching delays, An algorithm for the computation of t he ''switching index'' is presented. Furthermore, sufficient condition s for optimality of allocation strategies, based on limited look-ahead techniques, are established, These conditions together with the above -mentioned feature of optimal scheduling policies simplify the search for an optimal allocation policy. For a special class of multi-armed b andits (scheduling of parallel queues with switching penalties and no arrivals), it is shown that the aforementioned property of optimal pol icies is sufficient to determine an optimal allocation strategy, In ge neral, the determination of optimal allocation policies remains a diff icult and challenging task.