The multi-armed bandit problem with switching penalties (switching cos
t and switching delays) is investigated. It is shown that under an opt
imal policy, decisions about the processor allocation need to be made
only at stopping times that achieve an appropriate index, the well-kno
wn ''Gittins index'' or a ''switching index'' that is defined for swit
ching cost and switching delays, An algorithm for the computation of t
he ''switching index'' is presented. Furthermore, sufficient condition
s for optimality of allocation strategies, based on limited look-ahead
techniques, are established, These conditions together with the above
-mentioned feature of optimal scheduling policies simplify the search
for an optimal allocation policy. For a special class of multi-armed b
andits (scheduling of parallel queues with switching penalties and no
arrivals), it is shown that the aforementioned property of optimal pol
icies is sufficient to determine an optimal allocation strategy, In ge
neral, the determination of optimal allocation policies remains a diff
icult and challenging task.