In this paper we consider the adaptive control of constrained finite ergodic
controlled Markov chains whose transition probabilities are unknown. The
control policy is designed to minimize a loss function under a set of
inequality constraints. The average values of the conditional mathematical
expectations of the loss function and of the constraints are also assumed to
be unknown. A regularized penalty function is introduced to derive an
adaptive control algorithm. In this algorithm the transition probabilities
of the Markov chain and the average values of the constraints are estimated
at each time $n$. The control policy is adjusted using the Bush-Mosteller
reinforcement scheme as a stochastic approximation procedure. Its asymptotic
properties are stated. We establish that the optimal convergence rate is
equal to $n^{-1/3+\delta}$, where $\delta$ is any small positive parameter.
© 1998 John Wiley & Sons, Ltd.
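
As a rough illustration of the Bush-Mosteller reinforcement scheme mentioned above, the sketch below shows one update of an action-probability vector in the generic textbook (linear reward-penalty) form. It is not the algorithm of the paper: the regularized penalty function, the estimation of transition probabilities and any projection step are omitted, and the names `bush_mosteller_step`, `p`, `action`, `loss` and `gamma` are hypothetical. The loss is assumed to be normalized to [0, 1] and the step size to lie in [0, 1].

```python
def bush_mosteller_step(p, action, loss, gamma):
    """One Bush-Mosteller (linear reward-penalty) update, textbook form.

    p      -- current action probabilities (non-negative, summing to 1)
    action -- index of the action that was just applied
    loss   -- normalized loss in [0, 1] observed for that action (assumption)
    gamma  -- step size in [0, 1] (assumption)
    """
    n = len(p)
    if n < 2:
        raise ValueError("at least two actions are required")
    new_p = []
    for j, pj in enumerate(p):
        # Target distribution: mass (1 - loss) on the applied action,
        # the remaining mass `loss` spread evenly over the other actions.
        target = (1.0 - loss) if j == action else loss / (n - 1)
        # Move a fraction gamma of the way from p toward the target.
        new_p.append(pj + gamma * (target - pj))
    return new_p


# Usage: a zero loss reinforces the applied action, a unit loss penalizes it.
rewarded = bush_mosteller_step([0.25, 0.25, 0.25, 0.25], 0, 0.0, 0.5)
penalized = bush_mosteller_step([0.25, 0.25, 0.25, 0.25], 0, 1.0, 0.5)
```

Because each update is a convex combination of the current vector and a target distribution, the result stays on the probability simplex under the stated assumptions. In a stochastic approximation setting the step size would typically decrease with $n$.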