DYNAMIC NON-BAYESIAN DECISION-MAKING

Citation
D. Monderer and M. Tennenholtz, DYNAMIC NON-BAYESIAN DECISION-MAKING, Journal of Artificial Intelligence Research, 7, 1997, pp. 231-248
Citations number
32
ISSN journal
10769757
Volume
7
Year of publication
1997
Pages
231 - 248
Database
ISI
SICI code
1076-9757(1997)7:<231:DND>2.0.ZU;2-X
Abstract
The model of a non-Bayesian agent who faces a repeated game with incomplete information against Nature is an appropriate tool for modeling general agent-environment interactions. In such a model the environment state (controlled by Nature) may change arbitrarily, and the feedback/reward function is initially unknown. The agent is not Bayesian; that is, he forms a prior probability neither on the state selection strategy of Nature nor on his reward function. A policy for the agent is a function which assigns an action to every history of observations and actions. Two basic feedback structures are considered. In one of them - the perfect monitoring case - the agent is able to observe the previous environment state as part of his feedback, while in the other - the imperfect monitoring case - all that is available to the agent is the reward obtained. Both of these settings refer to partially observable processes, where the current environment state is unknown. Our main result refers to the competitive ratio criterion in the perfect monitoring case. We prove the existence of an efficient stochastic policy that ensures that the competitive ratio is obtained at almost all stages with an arbitrarily high probability, where efficiency is measured in terms of rate of convergence. It is further shown that such an optimal policy does not exist in the imperfect monitoring case. Moreover, it is proved that in the perfect monitoring case there does not exist a deterministic policy that satisfies our long-run optimality criterion. In addition, we discuss the maxmin criterion and prove that a deterministic efficient optimal strategy does exist in the imperfect monitoring case under this criterion. Finally we show that our approach to long-run optimality can be viewed as qualitative, which distinguishes it from previous work in this area.
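The perfect monitoring setting described above can be sketched as a simple interaction loop. This is a minimal illustration, not the paper's algorithm: the reward table, state sequence, and the uniform-random stochastic policy are all hypothetical stand-ins, and the per-stage ratio shown here is only a toy version of the competitive ratio benchmark.

```python
import random

def simulate(T=1000, n_actions=3, n_states=4, seed=0):
    """Toy interaction loop for the perfect monitoring case.

    The agent knows neither Nature's state-selection strategy nor the
    reward function, and only observes the previous state (perfect
    monitoring) plus the reward obtained. Returns the average per-stage
    ratio of obtained reward to the best reward available in that state.
    """
    rng = random.Random(seed)
    # Reward function, unknown to the agent: reward[state][action]
    reward = [[rng.random() for _ in range(n_actions)]
              for _ in range(n_states)]
    history = []     # (observed previous state, action, reward) triples
    ratios = []
    prev_state = None
    for _ in range(T):
        state = rng.randrange(n_states)  # Nature may change state arbitrarily
        # A stochastic policy maps the history to a (random) action;
        # here it is simply uniform, ignoring the history.
        action = rng.randrange(n_actions)
        r = reward[state][action]
        best = max(reward[state])        # per-stage benchmark
        ratios.append(r / best if best > 0 else 1.0)
        history.append((prev_state, action, r))
        prev_state = state               # perfect monitoring: state revealed next stage
    return sum(ratios) / T
```

A policy in the paper's sense would replace the uniform draw with a function of `history`; the existence result concerns policies that drive the per-stage ratio toward its optimal value at almost all stages with high probability.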