A stationary policy and an initial state in an MDP (Markov decision
process) induce a stationary probability distribution of the reward.
The problem analyzed here is generating the Pareto optima in the sense
of high mean and low variance of the stationary distribution. In the
unichain case, Pareto optima can be computed either with policy
improvement or with a linear program having the same number of
variables and one more constraint than the formulation for gain-rate
optimization. The same linear program suffices in the multichain case
if the ergodic class is an element of choice.
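
To make the mean-variance criterion concrete, the following is a minimal sketch and not the method of the paper: it enumerates the deterministic stationary policies of a tiny MDP by brute force, where the paper uses policy improvement or a linear program. The two-state, two-action MDP, its transition matrices P, its rewards R, and all function names are hypothetical and chosen only for illustration. Each policy is assumed to induce an aperiodic unichain Markov chain, so that power iteration converges to the stationary distribution. A search restricted to deterministic policies may also miss Pareto optima that only randomized policies attain.

```python
from itertools import product

# Hypothetical MDP: P[a][s] is the next-state distribution after taking
# action a in state s; R[a][s] is the reward for that state-action pair.
P = {
    0: [[0.9, 0.1], [0.5, 0.5]],
    1: [[0.2, 0.8], [0.1, 0.9]],
}
R = {
    0: [1.0, 1.0],  # "safe" action: constant reward
    1: [4.0, 0.0],  # "risky" action: reward depends on the state
}
N_STATES = 2


def stationary_distribution(rows, iters=1000):
    """Power iteration pi <- pi * P (assumes an aperiodic unichain)."""
    n = len(rows)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * rows[s][j] for s in range(n)) for j in range(n)]
    return pi


def mean_variance(policy):
    """Mean and variance of the reward under the stationary distribution."""
    rows = [P[policy[s]][s] for s in range(N_STATES)]
    rewards = [R[policy[s]][s] for s in range(N_STATES)]
    pi = stationary_distribution(rows)
    mean = sum(p * r for p, r in zip(pi, rewards))
    second_moment = sum(p * r * r for p, r in zip(pi, rewards))
    return mean, second_moment - mean ** 2


def pareto_optimal(points):
    """Keep policies not dominated in (higher mean, lower variance)."""
    front = {}
    for key, (m, v) in points.items():
        dominated = any(
            m2 >= m and v2 <= v and (m2 > m or v2 < v)
            for other, (m2, v2) in points.items()
            if other != key
        )
        if not dominated:
            front[key] = (m, v)
    return front


# A policy is a tuple giving the action chosen in each state.
points = {pol: mean_variance(pol) for pol in product((0, 1), repeat=N_STATES)}
front = pareto_optimal(points)

for pol, (m, v) in sorted(front.items()):
    print(pol, round(m, 4), round(v, 4))
```

In this toy instance two policies survive the filter: always playing the safe action gives the lowest variance, while playing the risky action only in state 0 gives the highest mean at the cost of a larger variance. This is the kind of trade-off that the Pareto optima describe.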