The learning architecture described in this article autonomously acquires a
topographical (metric) map that encodes a measure of "value" for xy-Cartesian
locations in an environment. There are two reasons for the creation of
low-value areas. Direct negative reinforcement from the environment results
when the robot discovers obstacles or has other "unpleasant" experiences.
The other source of negative reinforcement is internally generated
by the learning algorithm, as it identifies regions that are a long distance
away from the "pleasant" places in the environment. Conversely, examples of
"pleasant" places, where positive environmental reward is received, might be
energy-charging sites or simply locations that the robot should visit in
executing its daily tasks. In general, what the robot learns is a map of
"motivational" tendencies, or "expectancies". In such a map, the value attached
to a place comes to reflect a balance between the good and bad rewards
attainable from that position. When the Temporal Difference learning part of
the architecture is turned on, that measure of value comes to include an
estimate of how far away positive reinforcement is, in travel time. The
architecture is loosely based on an Adaptive Heuristic Critic structure.
Exploration of a continuous-valued search space is conducted by an Evolution
Strategy, tuned for fast and approximate optimization. Knowledge acquired
autonomously from this exploration is stored in a Radial Basis Function (RBF)
neural network. Inherent features of this neural network type lead to the
creation of a "potential field" structure that exerts appetitive and aversive
"forces" on the robot as it moves around in the environment. The results of
simulation experiments are presented, with a view to illustrating the
strengths and weaknesses of the architecture. The map-building architecture
proposed here is intended to form part of an overall navigational system. In
future work it will be integrated with a self-localization algorithm,
landmark-based topological mapping, and a reactive system for dealing with local
dynamics in the environment.
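
As a rough illustration of the value-map idea summarized above, the following Python sketch stores a value estimate for xy locations as a weighted sum of Gaussian radial basis functions and adjusts the weights with a one-step Temporal Difference update. It is a minimal sketch under stated assumptions, not the architecture's actual implementation: the names `RBFValueMap`, `td_update`, and `train_corridor`, the fixed RBF centers and width, the learning rate, the discount factor, and the four-cell corridor with a single reward at its end are all hypothetical choices made for the example. The Evolution Strategy exploration and the negative reinforcement from obstacles are omitted.

```python
import math


class RBFValueMap:
    """Value estimate over xy locations as a weighted sum of Gaussian bumps."""

    def __init__(self, centers, width=0.5):
        self.centers = list(centers)              # assumed fixed RBF centers
        self.width = width                        # assumed shared Gaussian width
        self.weights = [0.0] * len(self.centers)  # one learned weight per RBF

    def features(self, x, y):
        """Activation of every RBF unit at location (x, y)."""
        two_var = 2.0 * self.width ** 2
        return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / two_var)
                for cx, cy in self.centers]

    def value(self, x, y):
        """Current value estimate at (x, y)."""
        return sum(w * f for w, f in zip(self.weights, self.features(x, y)))

    def td_update(self, pos, reward, next_pos,
                  alpha=0.1, gamma=0.9, terminal=False):
        """One-step TD update for a move from pos to next_pos."""
        # No bootstrapping from the next location once the episode has ended.
        bootstrap = 0.0 if terminal else gamma * self.value(*next_pos)
        delta = reward + bootstrap - self.value(*pos)
        # Gradient step on the linear weights: each moves in proportion
        # to how strongly its RBF unit was active at pos.
        for i, f in enumerate(self.features(*pos)):
            self.weights[i] += alpha * delta * f
        return delta


def train_corridor(episodes=2000):
    """Walk a hypothetical 4-cell corridor repeatedly; reward +1 on arrival."""
    path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    vmap = RBFValueMap(centers=path)
    for _ in range(episodes):
        for pos, nxt in zip(path, path[1:]):
            at_goal = nxt == path[-1]
            vmap.td_update(pos, 1.0 if at_goal else 0.0, nxt, terminal=at_goal)
    return vmap


vmap = train_corridor()
```

Under these assumptions, the learned value is highest at the location just before the rewarded cell and decreases with each additional step away from it, which is one simple way in which value can come to reflect travel time to positive reinforcement. Because neighbouring Gaussians overlap, the estimate also varies smoothly between the visited locations.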