MEAN-VARIANCE TRADEOFFS IN AN UNDISCOUNTED MDP

Authors
M.J. Sobel
Citation
M.J. Sobel, MEAN-VARIANCE TRADEOFFS IN AN UNDISCOUNTED MDP, Operations Research, 42(1), 1994, pp. 175-183
Number of citations
27
Subject Categories
Management, "Operations Research & Management Science"
Journal title
Operations Research
Journal ISSN
0030-364X
Volume
42
Issue
1
Year of publication
1994
Pages
175-183
Database
ISI
SICI code
0030-364X(1994)42:1<175:MTIAUM>2.0.ZU;2-S
Abstract
A stationary policy and an initial state in an MDP (Markov decision process) induce a stationary probability distribution of the reward. The problem analyzed here is generating the Pareto optima in the sense of high mean and low variance of the stationary distribution. In the unichain case, Pareto optima can be computed either with policy improvement or with a linear program having the same number of variables and one more constraint than the formulation for gain-rate optimization. The same linear program suffices in the multichain case if the ergodic class is an element of choice.
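
To make the mean-variance criterion concrete, the following is a minimal sketch, not the linear program or the policy-improvement method the abstract refers to. It assumes a small finite unichain MDP with a deterministic reward for each state-action pair, enumerates the deterministic stationary policies by brute force, and keeps those that are not dominated in (high mean, low variance) of the stationary reward distribution. Randomized policies are not considered. All names (`stationary_distribution`, `mean_variance`, `pareto_policies`, `trans`, `reward`) and the example data are hypothetical and chosen only for illustration.

```python
from itertools import product


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of a unichain matrix P.

    Iterates on the lazy chain (P + I) / 2, which has the same stationary
    distribution as P but is aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [
            0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def mean_variance(P, r):
    """Mean and variance of the reward r[s] under the stationary distribution."""
    pi = stationary_distribution(P)
    mean = sum(p * x for p, x in zip(pi, r))
    var = sum(p * (x - mean) ** 2 for p, x in zip(pi, r))
    return mean, var


def pareto_policies(trans, reward, eps=1e-9):
    """Return (policy, mean, variance) triples that are not dominated.

    trans[s][a] is the transition row for action a in state s;
    reward[s][a] is the (assumed deterministic) one-step reward.
    """
    n = len(trans)
    results = []
    for policy in product(*(range(len(trans[s])) for s in range(n))):
        P = [trans[s][policy[s]] for s in range(n)]
        r = [reward[s][policy[s]] for s in range(n)]
        results.append((policy, *mean_variance(P, r)))

    def dominated(x):
        # y dominates x: mean no lower, variance no higher, one strictly better.
        return any(
            y[1] >= x[1] - eps
            and y[2] <= x[2] + eps
            and (y[1] > x[1] + eps or y[2] < x[2] - eps)
            for y in results
        )

    return [x for x in results if not dominated(x)]


if __name__ == "__main__":
    # Hypothetical two-state example: state 0 has two actions, state 1 has one.
    trans = [[[0.5, 0.5], [1.0, 0.0]], [[0.5, 0.5]]]
    reward = [[1.0, 1.5], [3.0]]
    for policy, mean, var in pareto_policies(trans, reward):
        print(policy, round(mean, 6), round(var, 6))
```

Enumeration grows exponentially with the number of states, so this only illustrates the definition on toy instances; the abstract's point is that the unichain case admits a linear program of essentially the same size as the one for gain-rate optimization.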