MEAN-VARIANCE TRADEOFFS IN AN UNDISCOUNTED MDP

Authors

SOBEL MJ

Citation

M.J. Sobel, MEAN-VARIANCE TRADEOFFS IN AN UNDISCOUNTED MDP, Operations Research, 42(1), 1994, pp. 175-183

Subject categories

Management; Operations Research & Management Science

Journal title

Operations Research

ISSN journal

0030-364X

Volume

42

Issue

1

Year of publication

1994

Pages

175 - 183

Database

ISI

SICI code

0030-364X(1994)42:1<175:MTIAUM>2.0.ZU;2-S

Abstract

A stationary policy and an initial state in an MDP (Markov decision process) induce a stationary probability distribution of the reward. The problem analyzed here is generating the Pareto optima in the sense of high mean and low variance of the stationary distribution. In the unichain case, Pareto optima can be computed either with policy improvement or with a linear program having the same number of variables and one more constraint than the formulation for gain-rate optimization. The same linear program suffices in the multichain case if the ergodic class is an element of choice.
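
As a rough illustration of the quantities named in the abstract, the sketch below computes the mean and variance of the stationary reward distribution for a fixed stationary policy. It is not the paper's policy-improvement or linear-programming method. The transition matrices, the reward vectors, and the function names are hypothetical, and the sketch assumes a unichain, aperiodic chain so that simple power iteration converges.

```python
def stationary_distribution(P, iters=10_000, tol=1e-13):
    """Approximate the stationary distribution of transition matrix P
    by power iteration (assumes a unichain, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def mean_variance(P, r):
    """Mean and variance of the one-step reward r under the stationary
    distribution induced by P (P and r already reflect a fixed policy)."""
    pi = stationary_distribution(P)
    mean = sum(p * x for p, x in zip(pi, r))
    var = sum(p * (x - mean) ** 2 for p, x in zip(pi, r))
    return mean, var


# Two hypothetical stationary policies on a two-state MDP.
P_a = [[0.9, 0.1], [0.5, 0.5]]
r_a = [1.0, 3.0]
P_b = [[0.5, 0.5], [0.5, 0.5]]
r_b = [0.0, 5.0]

mean_a, var_a = mean_variance(P_a, r_a)
mean_b, var_b = mean_variance(P_b, r_b)
print(f"policy a: mean={mean_a:.4f}, variance={var_a:.4f}")
print(f"policy b: mean={mean_b:.4f}, variance={var_b:.4f}")
```

In this made-up example, policy b has the higher mean but also the higher variance, so neither policy dominates the other and both are Pareto optimal within this two-policy set.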