E. Altman et S. Stidham, OPTIMALITY OF MONOTONIC POLICIES FOR 2-ACTION MARKOVIAN DECISION-PROCESSES, WITH APPLICATIONS TO CONTROL OF QUEUES WITH DELAYED INFORMATION, Queuing systems, 21(3-4), 1995, pp. 267-291
Citations number
21
Categorie Soggetti
Operatione Research & Management Science","Computer Science Interdisciplinary Applications
We consider a discrete-time Markov decision process with a partially o
rdered state space and two feasible control actions in each state. Our
goal is to find general conditions, which are satisfied in a broad cl
ass of applications to control of queues, under which an optimal contr
ol policy is monotonic. An advantage of our approach is that it easily
extends to problems with both information and action delays, which ar
e common in applications to high-speed communication networks, among o
thers. The transition probabilities are stochastically monotone and th
e one-stage reward submodular. We further assume that transitions from
different states are coupled, in the sense that the state after a tra
nsition is distributed as a deterministic function of the current stat
e and two random variables, one of which is controllable and the other
uncontrollable. Finally, we make a monotonicity assumption about the
sample-path effect of a pairwise switch of the actions in consecutive
stages. Using induction on the horizon length, we demonstrate that opt
imal policies for the finite- and infinite-horizon discounted problems
are monotonic. We apply these results to a single queueing facility w
ith control of arrivals and/or services, under very general conditions
. In this case, our results imply that an optimal control policy has t
hreshold form. Finally, we show how monotonicity of an optimal policy
extends in a natural way to problems with information and/or action de
lay, including delays of more than one time unit. Specifically, we sho
w that, if a problem without delay satisfies our sufficient conditions
for monotonicity of an optimal policy, then the same problem with inf
ormation and/or action delay also has monotonic (e.g., threshold) opti
mal policies.