MODULE-BASED REINFORCEMENT LEARNING - EXPERIMENTS WITH A REAL ROBOT

Citation
Z. Kalmar et al., "Module-Based Reinforcement Learning: Experiments with a Real Robot," Autonomous Robots, 5(3-4), 1998, pp. 273-295
Number of citations
72
Subject Categories
Robotics & Automatic Control; Computer Science, Artificial Intelligence
Journal title
Autonomous Robots
ISSN journal
0929-5593
Volume
5
Issue
3-4
Year of publication
1998
Pages
273 - 295
Database
ISI
SICI code
0929-5593(1998)5:3-4<273:MRL-EW>2.0.ZU;2-P
Abstract
The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solve such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: (i) decompose the task into subtasks using the qualitative knowledge at hand; (ii) design local controllers to solve the subtasks using the available quantitative knowledge; and (iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA, and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over nonadaptive ones in complex environments.
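
To make the coordination step (iii) concrete, below is a minimal Python sketch, not taken from the paper, of the general pattern it describes: hand-designed local controllers form the action set of a tabular Q-learner, which learns when to switch between them. The controller names, the feature discretization, the observation keys, and all constants are illustrative assumptions.

    import random
    from collections import defaultdict

    # Hypothetical local controllers: each maps a raw sensor reading to a
    # motor command. In the paper these encode the designer's quantitative
    # knowledge; the names and bodies here are illustrative placeholders.
    def go_to_target(obs):
        return "forward"

    def avoid_obstacle(obs):
        return "turn_left"

    def recharge(obs):
        return "dock"

    MODULES = [go_to_target, avoid_obstacle, recharge]

    def features(obs):
        # Coarse, designer-chosen discretization of the partially observable
        # sensor reading so that tabular RL stays tractable; the observation
        # keys are assumed for illustration.
        return (obs["target_visible"], obs["obstacle_near"], obs["battery_low"])

    Q = defaultdict(float)                    # Q[(state, module_index)]
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1    # illustrative constants

    def select_module(state):
        # Epsilon-greedy choice among the local controllers: this is the
        # switching policy that the robot learns.
        if random.random() < EPSILON:
            return random.randrange(len(MODULES))
        return max(range(len(MODULES)), key=lambda m: Q[(state, m)])

    def q_update(state, module, reward, next_state):
        # One model-free (Q-learning) backup. The paper reports that a
        # model-based variant worked significantly better; the coordination
        # structure is the same either way.
        best_next = max(Q[(next_state, m)] for m in range(len(MODULES)))
        Q[(state, module)] += ALPHA * (reward + GAMMA * best_next - Q[(state, module)])

    # One interaction step (environment details omitted):
    #   state  = features(obs)
    #   m      = select_module(state)
    #   action = MODULES[m](obs)      # run the chosen local controller
    #   ... apply action, observe reward and next_obs ...
    #   q_update(state, m, reward, features(next_obs))

Because the learner only chooses among a handful of modules over a coarse feature space, rather than over raw continuous sensor and action spaces, the RL problem stays small, which is the complexity reduction the abstract argues for.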