Learning to play chess using temporal differences

Citation
J. Baxter et al., Learning to play chess using temporal differences, MACH LEARN, 40(3), 2000, pp. 243-263
Number of citations
16
Subject categories
AI Robotics and Automatic Control
Journal title
MACHINE LEARNING
ISSN journal
08856125
Volume
40
Issue
3
Year of publication
2000
Pages
243 - 263
Database
ISI
SICI code
0885-6125(200009)40:3<243:LTPCUT>2.0.ZU;2-B
Abstract
In this paper we present TDLEAF(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program "KnightCap" used TDLEAF(lambda) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(lambda) can yield better results in the domain of backgammon, where TD(lambda) has previously yielded striking success.
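The abstract only names TDLEAF(lambda) as a TD(lambda) variant adapted to game-tree search. As a rough illustration, and not KnightCap's actual code, the sketch below assumes a linear evaluation function and assumes the game-tree search has already returned, for each position of a finished game, the feature vector of the leaf of its principal variation. TD(lambda) is then applied to those leaf evaluations rather than to the positions themselves. The function name `tdleaf_update` and the parameters `alpha` and `lam` are hypothetical, and the sketch omits any squashing of the evaluation and any special handling of the final game result.

```python
import numpy as np


def tdleaf_update(leaf_features, weights, alpha=1.0, lam=0.7):
    """One offline TDLeaf(lambda)-style weight update after a game (sketch).

    leaf_features: shape (N, k); row t is the feature vector of the
        principal-variation leaf found by searching from position t.
        (Assumed input: the game-tree search itself is not shown.)
    weights: shape (k,), for an assumed linear evaluation J(x, w) = w . phi(x).
    alpha, lam: hypothetical step size and trace-decay parameter.
    """
    phi = np.asarray(leaf_features, dtype=float)
    w = np.asarray(weights, dtype=float)
    values = phi @ w                  # evaluation of each principal-variation leaf
    d = values[1:] - values[:-1]      # temporal differences d_t between leaves
    n = len(d)
    delta = np.zeros_like(w)
    for t in range(n):
        # lambda-discounted sum of the temporal differences from step t onward
        discount = lam ** np.arange(n - t)
        td_sum = np.dot(discount, d[t:])
        # for a linear evaluation, the gradient w.r.t. w is the feature vector
        delta += phi[t] * td_sum
    return w + alpha * delta
```

For example, `tdleaf_update([[1, 0], [0, 1], [1, 1]], [0.5, 0.5], alpha=0.1, lam=0.5)` returns weights of approximately `[0.525, 0.55]`: the leaf values are 0.5, 0.5 and 1.0, so only the second temporal difference is nonzero, and it is credited to the first leaf with weight lambda and to the second leaf in full.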