S. Argamon-engelson et al., Interleaved versus a priori exploration for repeated navigation in a partially-known graph, INT J PATT, 13(7), 1999, pp. 963-986
Citations number
31
Categorie Soggetti
AI Robotics and Automatic Control
Journal title
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE
In this paper, we address the tradeoff between exploration and exploitation
for agents which need to learn more about the structure of their environme
nt in order to perform more effectively. For example, a software agent oper
ating on the World Wide Web may need to learn which sites on the net are mo
st useful, and the most efficient routes to those sites. We compare explora
tion strategies for a repeated task, where the agent is given some particul
ar task to perform some number of times. Tasks are modeled as navigation on
a partially known (deterministic) graph. This paper describes a new utilit
y-based exploration algorithm for repeated tasks which interleaves explorat
ion with task performance. The method takes into account both the costs and
the potential benefits (for future task repetitions) of different explorat
ory actions. Exploration is performed in a greedy fashion, with the locally
optimal exploratory action performed during repetition of each task. We ex
perimentally evaluated our utility-based interleaved exploration algorithm
against a heuristic search algorithm for exploration before task performanc
e (a priori exploration) as well as a randomized interleaved exploration al
gorithm. We found that for a single repeated task, utility-based interleave
d exploration consistently outperforms the alternatives, unless the number
of task repetitions is very high. In addition, we extended the algorithms f
or the case of multiple repeated tasks, where the agent has a different, ra
ndomly-chosen task (from a known subset of possible tasks) to perform each
time. Here too, we found that utility-based interleaved exploration is clea
r in most cases.