Single sample path-based optimization of Markov chains

Authors
Citation
Xr. Cao, Single sample path-based optimization of Markov chains, J OPTIM TH, 100(3), 1999, pp. 527-548
Citations number
18
Categorie Soggetti
Engineering Mathematics
Journal title
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS
ISSN journal
00223239 → ACNP
Volume
100
Issue
3
Year of publication
1999
Pages
527 - 548
Database
ISI
SICI code
0022-3239(199903)100:3<527:SSPOOM>2.0.ZU;2-I
Abstract
Motivated by the needs of on-line optimization of real-world engineering sy stems, we studied single sample path-based algorithms for Markov decision p roblems (MDP). The sample path used in the algorithms can be obtained by ob serving the operation of a real system. We give a simple example to explain the advantages of the sample path-based approach over the traditional comp utation-based approach: matrix inversion is not required; some transition p robabilities do not have to be known; it may save storage space; and it giv es the flexibility of iterating the actions for a subset of the state space in each iteration. The effect of the estimation errors and the convergence property of the sample path-based approach are studied. Finally, we propos e a fast algorithm, which updates the policy whenever the system reaches a particular set of states and prove that the algorithm converges to the true optimal policy with probability one under some conditions. The sample path -based approach may have important applications to the design and managemen t of engineering systems, such as high speed communication networks.