K. Wakuta, VECTOR-VALUED MARKOV DECISION-PROCESSES AND THE SYSTEMS OF LINEAR INEQUALITIES, Stochastic processes and their applications, 56(1), 1995, pp. 159-169
For a vector-valued Markov decision process, we characterize optimal (deterministic) stationary policies by systems of linear inequalities and present an algorithm for finding all optimal stationary policies from among all randomized, history-remembering ones. The algorithm consists of improving the policies and of checking the optimality of a policy by solving the associated system of linear inequalities via Fourier elimination.
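
The abstract does not state the inequality systems themselves, so the following is only a minimal sketch of the generic tool it names: Fourier elimination (commonly called Fourier-Motzkin elimination) used to decide whether a system A x <= b has a solution. The function name `fm_feasible` and the restriction to non-strict inequalities are assumptions made for illustration; the systems in the paper may take a different form, for example with strict inequalities, which this sketch does not handle.

```python
from fractions import Fraction


def fm_feasible(A, b):
    """Return True if {x : A x <= b} is nonempty (hypothetical helper).

    Each variable is eliminated in turn by pairing every row with a
    positive coefficient against every row with a negative coefficient.
    Exact rational arithmetic avoids rounding issues.
    """
    # Store each inequality as [a_1, ..., a_n, rhs].
    rows = [[Fraction(v) for v in row] + [Fraction(rhs)]
            for row, rhs in zip(A, b)]
    n = len(A[0]) if A else 0

    for j in range(n):
        pos = [r for r in rows if r[j] > 0]
        neg = [r for r in rows if r[j] < 0]
        rows = [r for r in rows if r[j] == 0]  # rows not involving x_j
        for p in pos:
            for q in neg:
                # Scale p by 1/p[j] and q by 1/(-q[j]) (both positive),
                # then add: the coefficient of x_j cancels.
                rows.append([pv / p[j] - qv / q[j]
                             for pv, qv in zip(p, q)])

    # Only constraints of the form 0 <= rhs remain.
    return all(r[-1] >= 0 for r in rows)


if __name__ == "__main__":
    # x + y <= 4, x >= 0, y >= 0
    print(fm_feasible([[1, 1], [-1, 0], [0, -1]], [4, 0, 0]))
    # x <= 1 and x >= 2
    print(fm_feasible([[1], [-1]], [1, -2]))
```

The number of inequalities can grow quickly with each eliminated variable, so a sketch like this is practical only for small systems.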