Interleaved versus a priori exploration for repeated navigation in a partially-known graph

Citation
S. Argamon-engelson et al., Interleaved versus a priori exploration for repeated navigation in a partially-known graph, INT J PATT, 13(7), 1999, pp. 963-986
Citations number
31
Categorie Soggetti
AI Robotics and Automatic Control
Journal title
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE
ISSN journal
02180014 → ACNP
Volume
13
Issue
7
Year of publication
1999
Pages
963 - 986
Database
ISI
SICI code
0218-0014(199911)13:7<963:IVAPEF>2.0.ZU;2-V
Abstract
In this paper, we address the tradeoff between exploration and exploitation for agents which need to learn more about the structure of their environme nt in order to perform more effectively. For example, a software agent oper ating on the World Wide Web may need to learn which sites on the net are mo st useful, and the most efficient routes to those sites. We compare explora tion strategies for a repeated task, where the agent is given some particul ar task to perform some number of times. Tasks are modeled as navigation on a partially known (deterministic) graph. This paper describes a new utilit y-based exploration algorithm for repeated tasks which interleaves explorat ion with task performance. The method takes into account both the costs and the potential benefits (for future task repetitions) of different explorat ory actions. Exploration is performed in a greedy fashion, with the locally optimal exploratory action performed during repetition of each task. We ex perimentally evaluated our utility-based interleaved exploration algorithm against a heuristic search algorithm for exploration before task performanc e (a priori exploration) as well as a randomized interleaved exploration al gorithm. We found that for a single repeated task, utility-based interleave d exploration consistently outperforms the alternatives, unless the number of task repetitions is very high. In addition, we extended the algorithms f or the case of multiple repeated tasks, where the agent has a different, ra ndomly-chosen task (from a known subset of possible tasks) to perform each time. Here too, we found that utility-based interleaved exploration is clea r in most cases.