J. Morimoto and K. Doya, Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, Robotics and Autonomous Systems, 36(1), 2001, pp. 37-51
In this paper, we propose a hierarchical reinforcement learning architecture that realizes practical learning speed in real hardware control tasks. In order to enable learning in a practical number of trials, we introduce a low-dimensional representation of the state of the robot for higher-level planning. The upper level learns a discrete sequence of sub-goals in a low-dimensional state space for achieving the main goal of the task. The lower-level modules learn local trajectories in the original high-dimensional state space to achieve the sub-goal specified by the upper level.
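The two-level decomposition described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `project` function (which simply keeps the first two state components), the toy dynamics, and the proportional controller standing in for a learned lower-level module. The upper level is reduced here to a fixed list of sub-goals.

```python
from typing import List, Sequence, Tuple


def project(state: Sequence[float]) -> Tuple[float, ...]:
    """Hypothetical low-dimensional representation: the first two components."""
    return tuple(state[:2])


def lower_level(state, subgoal, max_steps=50, gain=0.5, tol=1e-2):
    """Stand-in for a learned lower-level module (assumption: a simple
    proportional controller acting on toy dynamics).

    Works in the full state space and stops once the projected state is
    within `tol` of the sub-goal, or after `max_steps` steps.
    """
    s = list(state)
    for _ in range(max_steps):
        err = [g - x for g, x in zip(subgoal, project(s))]
        if max(abs(e) for e in err) < tol:
            return tuple(s), True
        # Toy dynamics: each controlled component moves by gain * error.
        for i, e in enumerate(err):
            s[i] += gain * e
    return tuple(s), False


def run_episode(state, subgoals):
    """Upper level (here a fixed sequence) hands sub-goals to the lower level."""
    reached: List[bool] = []
    for g in subgoals:
        state, ok = lower_level(state, g)
        reached.append(ok)
        if not ok:
            break  # abandon the sequence when a sub-goal is missed
    return tuple(state), reached


final_state, reached = run_episode((0.0, 0.0, 0.0, 0.0), [(0.5, 0.2), (1.0, 0.0)])
```

The point of the split is that the sub-goal sequence is chosen in the small projected space, while each sub-goal is pursued in the full state space.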
We applied the hierarchical architecture to a three-link, two-joint robot for the task of learning to stand up by trial and error. The upper-level learning was implemented by Q-learning, while the lower-level learning was implemented by a continuous actor-critic method. The robot successfully learned to stand up within 750 trials in simulation and then in an additional 170 trials using real hardware. The effects of the setting of the search steps in the upper level and the use of a supplementary reward for achieving sub-goals are also tested in simulation.
(C) 2001 Elsevier Science B.V. All rights reserved.
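As a rough illustration of upper-level Q-learning over discrete sub-goals, the sketch below runs tabular Q-learning on a toy one-dimensional chain. The chain, its five states, the reward of 1 at the goal, and all hyperparameters are hypothetical and are not the robot task or settings from the paper. The lower-level continuous actor-critic is not shown; the `step` function simply assumes each chosen sub-goal is reached.

```python
import random

N_STATES = 5          # hypothetical discretized posture levels
ACTIONS = (-1, +1)    # hypothetical search step in the low-dimensional space
GOAL = N_STATES - 1   # hypothetical "standing" state


def step(state, action):
    """Toy transition: assumes the lower level always reaches the sub-goal."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def greedy(q, state, rng=None):
    """Index of the best action; ties broken randomly when rng is given."""
    best = max(q[state])
    ties = [i for i, v in enumerate(q[state]) if v == best]
    return rng.choice(ties) if rng else ties[0]


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # cap on upper-level steps per trial
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q, s, rng)
            s2, r, done = step(s, ACTIONS[a])
            # Q-learning target: bootstrap from the best next action.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


q_table = train()
```

After training, following `greedy(q_table, s)` from state 0 gives the learned sub-goal sequence for this toy chain. A supplementary reward for reaching intermediate sub-goals, as mentioned in the abstract, would be added inside `step` under the same assumptions.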