Ei. Gordienko and Ja. Minjarezsosa, Adaptive control for discrete-time Markov processes with unbounded costs: discounted criterion, Kybernetika, 34(2), 1998, pp. 217-234
We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrent equations x(t+1) = F(x(t), a(t), xi(t)), t = 0, 1, ..., with i.i.d. R^k-valued random vectors xi(t) whose density rho is unknown. Assuming that xi(t) is observable, we propose a procedure for the statistical estimation of rho that allows us to prove the discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.
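The setting can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' procedure. The abstract does not specify the estimator, so the sketch swaps in a standard Gaussian kernel density estimator. The dynamics `F`, the policy, the choice k = 1, and the standard normal "true" density are all hypothetical. The controller observes each xi(t) and builds an estimate of rho from the disturbances seen so far.

```python
import math
import random
import statistics
from typing import Callable, List, Tuple


def simulate(F: Callable[[float, float, float], float],
             policy: Callable[[float], float],
             x0: float, horizon: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Run x(t+1) = F(x(t), a(t), xi(t)) and record the observed disturbances xi(t)."""
    rng = random.Random(seed)
    xs, xis = [x0], []
    for _ in range(horizon):
        a = policy(xs[-1])
        # Hypothetical "true" density rho: standard normal, unknown to the controller.
        xi = rng.gauss(0.0, 1.0)
        xis.append(xi)
        xs.append(F(xs[-1], a, xi))
    return xs, xis


def kde(samples: List[float], bandwidth: float) -> Callable[[float], float]:
    """Gaussian kernel density estimate of rho built from observed xi values."""
    n = len(samples)
    c = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def rho_hat(y: float) -> float:
        return c * sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in samples)

    return rho_hat


# Hypothetical scalar dynamics and stationary policy (illustration only).
F = lambda x, a, xi: 0.5 * x + a + xi
policy = lambda x: -0.25 * x

xs, xis = simulate(F, policy, x0=0.0, horizon=500)
n = len(xis)
# Silverman-style rule-of-thumb bandwidth (an assumed choice, not from the paper).
h = 1.06 * statistics.stdev(xis) * n ** (-1 / 5)
rho_hat = kde(xis, bandwidth=h)

# Riemann-sum check that the estimate is (approximately) a probability density.
step = 0.01
total = sum(rho_hat(-8.0 + i * step) for i in range(1600)) * step
```

In an adaptive policy of the kind the abstract mentions, `rho_hat` would be recomputed at each step and plugged into the optimality equations in place of the unknown rho. The paper's specific estimator and its convergence conditions under unbounded costs are not reproduced here.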