A multichain Markov decision process with constraints on the expected
state-action frequencies may lead to a unique optimal policy which does
not satisfy Bellman's principle of optimality. The model with sample-path
constraints does not suffer from this drawback.