Motivated by the needs of on-line optimization of real-world engineering systems, we study single sample path-based algorithms for Markov decision problems (MDPs). The sample path used in the algorithms can be obtained by observing the operation of a real system. We give a simple example to illustrate the advantages of the sample path-based approach over the traditional computation-based approach: matrix inversion is not required; some transition probabilities do not have to be known; storage space may be saved; and the actions can be updated for only a subset of the state space in each iteration. We study the effect of estimation errors and the convergence properties of the sample path-based approach. Finally, we propose a fast algorithm that updates the policy whenever the system reaches a particular set of states, and we prove that, under some conditions, the algorithm converges to the true optimal policy with probability one. The sample path-based approach may have important applications to the design and management of engineering systems, such as high-speed communication networks.
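To make the idea concrete, the following is a minimal illustrative sketch, in Python, of one sample path-based policy iteration step; it is not the algorithm proposed in the paper. It shows how performance potentials can be estimated from a single observed trajectory without matrix inversion, and how the policy can then be improved greedily for only a subset of states, so that only the transition rows for those states need to be known. The toy MDP, the truncation horizon, and the function names (estimate_potentials, improve_policy, simulate) are assumptions made purely for illustration.

```python
import random
from collections import defaultdict

def estimate_potentials(path, horizon=30):
    """Estimate relative potentials g(i) from one observed path of
    (state, cost) pairs by averaging truncated cost sums after each
    visit to state i.  No matrix inversion is required."""
    sums, visits = defaultdict(float), defaultdict(int)
    for t in range(len(path) - horizon):
        i = path[t][0]
        sums[i] += sum(c for _, c in path[t:t + horizon])
        visits[i] += 1
    return {i: sums[i] / visits[i] for i in sums}

def improve_policy(policy, potentials, P, cost, subset):
    """Greedy improvement restricted to `subset`; only the transition
    rows P[i][a] for states in the subset need to be known.
    The truncated sums differ from the true potentials by a common
    constant, which cancels in the comparison below because the
    transition probabilities of each action sum to one."""
    new_policy = dict(policy)
    for i in subset:
        def value(a):
            return cost[i][a] + sum(p * potentials.get(j, 0.0)
                                    for j, p in P[i][a].items())
        new_policy[i] = min(P[i], key=value)
    return new_policy

def simulate(policy, P, cost, steps=20000, seed=0):
    """Observe a sample path (state, incurred cost) under the current policy."""
    rng = random.Random(seed)
    path, s = [], 0
    for _ in range(steps):
        a = policy[s]
        path.append((s, cost[s][a]))
        s = rng.choices(list(P[s][a]), weights=P[s][a].values())[0]
    return path

# Toy 3-state example (assumed data, purely for illustration).
P = {  # P[state][action] = {next_state: probability}
    0: {'a': {0: 0.5, 1: 0.5}, 'b': {1: 0.9, 2: 0.1}},
    1: {'a': {0: 0.3, 2: 0.7}, 'b': {1: 0.5, 2: 0.5}},
    2: {'a': {0: 1.0},          'b': {0: 0.2, 2: 0.8}},
}
cost = {0: {'a': 1.0, 'b': 2.0},
        1: {'a': 3.0, 'b': 1.5},
        2: {'a': 0.5, 'b': 2.5}}

policy = {0: 'a', 1: 'a', 2: 'a'}
path = simulate(policy, P, cost)              # observe the system under the current policy
g = estimate_potentials(path)                 # potentials from the single sample path
policy = improve_policy(policy, g, P, cost, subset={0, 1})  # update part of the state space
print(policy)
```

In this sketch, repeating the simulate/estimate/improve cycle plays the role of a policy iteration; restricting `subset` to the states actually visited, or updating whenever the system reaches a particular set of states, corresponds to the flexibility discussed above.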