Index-based policies for discounted multi-armed bandits on parallel machines

Citation
Kd. Glazebrook et Dj. Wilkinson, Index-based policies for discounted multi-armed bandits on parallel machines, ANN APPL PR, 10(3), 2000, pp. 877-896
Citations number
14
Categorie Soggetti
Mathematics
Journal title
ANNALS OF APPLIED PROBABILITY
ISSN journal
10505164 → ACNP
Volume
10
Issue
3
Year of publication
2000
Pages
877 - 896
Database
ISI
SICI code
1050-5164(200008)10:3<877:IPFDMB>2.0.ZU;2-A
Abstract
We utilize and develop elements of the recent achievable region account of Gittins indexation by Bertsimas and Nino-Mora to design index-based policie s for discounted multi-armed bandits on parallel machines. The policies ana lyzed have expected rewards which come within an O(alpha) quantity of optim ality, where alpha > 0 is a discount rate. In the main, the policies make a n initial once for all allocation of bandits to machines, with each machine then handling its own workload optimally. This allocation must take carefu l account of the index structure of the bandits. The corresponding limit po licies are average-overtaking optimal.