Dynamic update of the reinforcement function during learning

Citation
J.M. Santos and C. Touzet, Dynamic update of the reinforcement function during learning, Connection Science, 11(3-4), 1999, pp. 267-289
Number of citations
35
Subject Categories
AI Robotics and Automatic Control
Journal title
CONNECTION SCIENCE
ISSN journal
0954-0091
Volume
11
Issue
3-4
Year of publication
1999
Pages
267 - 289
Database
ISI
SICI code
0954-0091(199912)11:3-4<267:DUOTRF>2.0.ZU;2-8
Abstract
During the last decade, numerous contributions have been made to the use of reinforcement learning in the field of robot learning. They have focused mainly on the issues of generalization, memorization and exploration, which are mandatory for dealing with real robots. However, in our opinion the most difficult task today is the definition of the reinforcement function (RF). A first attempt in this direction was made by introducing a method, the update parameters algorithm (UPA), for tuning an RF in such a way that it would be optimal during the exploration phase. The only requirement is to conform to a particular expression of the RF. In this article, we propose Dynamic-UPA, an algorithm able to tune the RF parameters during the whole learning phase (exploration and exploitation). It allows one to address the so-called exploration-versus-exploitation dilemma through careful computation of the RF parameter values, by controlling the ratio between positive and negative reinforcements during learning. Experiments with the mobile robot Khepera on tasks of synthesizing obstacle-avoidance and wall-following behaviors validate our proposals.
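To make the abstract's core idea concrete, the following is a minimal Python sketch of ratio-controlled RF tuning, not the authors' published algorithm. It assumes a simple threshold-based RF (a scalar parameter deciding whether a reinforcement is positive or negative) and illustrative names and values (DynamicUPA, threshold, target_ratio, step), none of which come from the paper; only the idea of nudging RF parameters so the positive-to-negative reinforcement ratio tracks a target is taken from the abstract.

    # Hypothetical sketch of Dynamic-UPA-style tuning: keep the observed
    # ratio of positive to negative reinforcements near a target value
    # by adjusting a scalar RF parameter online during learning.

    class DynamicUPA:
        """Adjusts an RF threshold from the stream of reinforcements."""

        def __init__(self, threshold=0.5, target_ratio=1.0, step=0.01):
            self.threshold = threshold        # tunable RF parameter (assumed scalar)
            self.target_ratio = target_ratio  # desired positive:negative ratio (assumed)
            self.step = step                  # adjustment rate (assumed)
            self.pos = 0                      # running count of positive reinforcements
            self.neg = 0                      # running count of negative reinforcements

        def reinforcement(self, sensor_value):
            """RF of the assumed form: sign determined by the tunable threshold."""
            r = 1.0 if sensor_value > self.threshold else -1.0
            if r > 0:
                self.pos += 1
            else:
                self.neg += 1
            self._update_threshold()
            return r

        def _update_threshold(self):
            """Nudge the threshold so the running ratio approaches the target."""
            if self.neg == 0:
                return
            ratio = self.pos / self.neg
            if ratio > self.target_ratio:
                # Too many positive reinforcements: make the RF stricter.
                self.threshold += self.step
            elif ratio < self.target_ratio:
                # Too many negative reinforcements: make the RF more permissive.
                self.threshold -= self.step

    # Example use with a hypothetical normalized sensor reading in [0, 1]:
    upa = DynamicUPA()
    r = upa.reinforcement(0.8)  # returns +1.0 and nudges the threshold

Because the update runs on every reinforcement, the same mechanism applies during both exploration and exploitation, which is the property the abstract attributes to Dynamic-UPA over the earlier exploration-only UPA.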