HQ-learning is a hierarchical extension of Q(lambda)-learning designed to solve certain types of partially observable Markov decision problems (POMDPs). HQ automatically decomposes POMDPs into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive subagents. HQ can solve partially observable mazes with more states than those used in most previous POMDP work.
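The idea of solving a POMDP with a sequence of reactive subagents can be illustrated with a minimal sketch. The toy "key-then-door" corridor below is a hypothetical environment, not one from the paper: all interior cells emit the same observation, so no single memoryless policy solves the task, but a sequence of two reactive Q-policies does. The sketch also simplifies HQ-learning itself: the first subagent's subgoal is fixed in advance and rewarded intrinsically, whereas HQ learns subgoals via an HQ-table.

```python
import random

# Toy corridor: positions 0..4, start at 2. The agent must first walk
# LEFT to position 0 (key), then RIGHT to position 4 (door). Interior
# cells are perceptually aliased (same observation), so a single
# memoryless policy cannot both go left and go right from "corridor".
KEY, CORRIDOR, DOOR = 0, 1, 2      # observations
LEFT, RIGHT = -1, +1               # actions

def observe(pos):
    return KEY if pos == 0 else DOOR if pos == 4 else CORRIDOR

def run_episode(qtables, subgoal, epsilon, alpha=0.5, gamma=0.9):
    """One HQ-style episode: subagent 0 acts until its subgoal
    observation is reached, then control transfers to subagent 1."""
    pos, active, has_key = 2, 0, False
    for _ in range(50):
        obs = observe(pos)
        q = qtables[active]
        if random.random() < epsilon:
            a = random.choice((LEFT, RIGHT))
        else:
            a = max((LEFT, RIGHT), key=lambda x: q[(obs, x)])
        pos = min(4, max(0, pos + a))
        next_obs = observe(pos)
        if active == 0:
            # Simplification: intrinsic reward for reaching the subgoal
            # observation (HQ instead propagates the external return).
            r = 1.0 if next_obs == subgoal else 0.0
        else:
            r = 1.0 if (next_obs == DOOR and has_key) else 0.0
        # Standard tabular Q-learning update on observations, not states.
        best_next = max(q[(next_obs, LEFT)], q[(next_obs, RIGHT)])
        q[(obs, a)] += alpha * (r + gamma * best_next - q[(obs, a)])
        if active == 0 and next_obs == subgoal:
            active, has_key = 1, True   # control passes to subagent 1
        elif active == 1 and r > 0:
            return True                 # door reached with the key
    return False

def train(episodes=2000, seed=0):
    random.seed(seed)
    qtables = [{(o, a): 0.0 for o in (KEY, CORRIDOR, DOOR)
                for a in (LEFT, RIGHT)} for _ in range(2)]
    for _ in range(episodes):
        run_episode(qtables, subgoal=KEY, epsilon=0.2)
    return qtables
```

After training, the first subagent's greedy policy maps the aliased corridor observation to LEFT and the second maps it to RIGHT, which is exactly the kind of decomposition into memoryless subpolicies the abstract describes.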