OPTIMALITY OF MONOTONIC POLICIES FOR 2-ACTION MARKOVIAN DECISION-PROCESSES, WITH APPLICATIONS TO CONTROL OF QUEUES WITH DELAYED INFORMATION

Citation
E. Altman et S. Stidham, OPTIMALITY OF MONOTONIC POLICIES FOR 2-ACTION MARKOVIAN DECISION-PROCESSES, WITH APPLICATIONS TO CONTROL OF QUEUES WITH DELAYED INFORMATION, Queuing systems, 21(3-4), 1995, pp. 267-291
Citations number
21
Categorie Soggetti
Operatione Research & Management Science","Computer Science Interdisciplinary Applications
Journal title
ISSN journal
02570130
Volume
21
Issue
3-4
Year of publication
1995
Pages
267 - 291
Database
ISI
SICI code
0257-0130(1995)21:3-4<267:OOMPF2>2.0.ZU;2-F
Abstract
We consider a discrete-time Markov decision process with a partially o rdered state space and two feasible control actions in each state. Our goal is to find general conditions, which are satisfied in a broad cl ass of applications to control of queues, under which an optimal contr ol policy is monotonic. An advantage of our approach is that it easily extends to problems with both information and action delays, which ar e common in applications to high-speed communication networks, among o thers. The transition probabilities are stochastically monotone and th e one-stage reward submodular. We further assume that transitions from different states are coupled, in the sense that the state after a tra nsition is distributed as a deterministic function of the current stat e and two random variables, one of which is controllable and the other uncontrollable. Finally, we make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, we demonstrate that opt imal policies for the finite- and infinite-horizon discounted problems are monotonic. We apply these results to a single queueing facility w ith control of arrivals and/or services, under very general conditions . In this case, our results imply that an optimal control policy has t hreshold form. Finally, we show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action de lay, including delays of more than one time unit. Specifically, we sho w that, if a problem without delay satisfies our sufficient conditions for monotonicity of an optimal policy, then the same problem with inf ormation and/or action delay also has monotonic (e.g., threshold) opti mal policies.