Quad-Q-learning

Citation
C. Clausen and H. Wechsler, Quad-Q-learning, IEEE TRANSACTIONS ON NEURAL NETWORKS, 11(2), 2000, pp. 279-294
Number of citations
12
Subject categories
AI Robotics and Automatic Control
Journal title
IEEE TRANSACTIONS ON NEURAL NETWORKS
ISSN journal
1045-9227
Volume
11
Issue
2
Year of publication
2000
Pages
279 - 294
Database
ISI
SICI code
1045-9227(200003)11:2<279:Q>2.0.ZU;2-E
Abstract
This paper develops the theory of quad-Q-learning, a new learning algorithm that evolved from Q-learning. Quad-Q-learning is applicable to problems that can be solved by "divide and conquer" techniques. Quad-Q-learning concerns an autonomous agent that learns without supervision to act optimally to achieve specified goals. The learning agent acts in an environment that can be characterized by a state. Although Q-learning and quad-Q-learning are in many respects similar, they differ in their notion of state transitions. In the Q-learning environment, when an action is taken, a reward is received and a single new state results. The objective of Q-learning is to learn a policy function that maps states to actions so as to maximize a function of the rewards, such as the sum of rewards. With respect to quad-Q-learning, however, when an action is taken from a state, either an immediate reward is received and no new state results, or no reward is received and four new states result from taking that action. If four new states result, then each new state is treated independently, as if four new environments resulted. If no new state results, no further action is taken in the associated environment. The environment in which quad-Q-learning operates can thus be viewed as a hierarchy of states where lower-level states are the children of higher-level states. The hierarchical aspect of quad-Q-learning leads to a bottom-up view of learning that improves the efficiency of learning at higher levels in the hierarchy. The objective of quad-Q-learning is to maximize the sum of rewards obtained from each of the environments that result as actions are taken. Quad-Q-learning can be readily generalized to a family of n-Q-learning algorithms, which are applicable to learning how to partition large intractable problem domains into smaller problems that can be solved independently. Two versions of quad-Q-learning are discussed: discrete-state quad-Q-learning and mixed discrete and continuous state quad-Q-learning. The discrete-state version is only applicable to problems with small numbers of states. The problem with the discrete case is that learning for one state does not improve learning with respect to other nearby states, nor does it generalize to previously unseen states. Scaling up to problems with practical numbers of states requires a continuous-state learning method. Continuous-state learning can be accomplished using function approximation methods. Application of quad-Q-learning to image compression is briefly described.
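
Illustrative sketch (not from the paper): the abstract describes an update in which an action either yields an immediate reward and terminates, or yields four child states that are pursued independently, with the agent maximizing the sum of rewards across the resulting environments. The Python below is a minimal discrete-state sketch of such a backup under those assumptions; the environment interface (a step function returning either a reward or four child states), the epsilon-greedy action choice, and the learning-rate schedule are hypothetical, not the authors' exact formulation.

    import random
    from collections import defaultdict

    def quad_q_update(Q, env, state, actions, alpha=0.1, epsilon=0.1):
        # One hedged quad-Q-learning backup for a discrete-state problem.
        # Q is a table, e.g. Q = defaultdict(float), keyed by (state, action).
        # env.step(state, action) is an assumed interface returning
        # (reward, None) on termination or (0.0, [c1, c2, c3, c4]) otherwise.

        # Epsilon-greedy action selection (illustrative exploration scheme).
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])

        reward, children = env.step(state, action)

        if children is None:
            # Terminal branch: an immediate reward and no new state.
            target = reward
        else:
            # Non-terminal branch: four child environments, each treated
            # independently; the target is the sum of their best values.
            target = sum(max(Q[(c, a)] for a in actions) for c in children)

        # Standard stochastic-approximation style update toward the target.
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        return action, children

In this sketch the summation over the four children (rather than a single successor's maximum, as in ordinary Q-learning) reflects the abstract's objective of maximizing the total reward collected across all environments spawned by a split.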