Following Tesauro's work on TD-Gammon, we used a 4,000-parameter
feedforward neural network to develop a competitive backgammon
evaluation function. Play proceeds by a roll of the dice, application
of the network to all legal moves, and selection of the position with
the highest evaluation. However, no backpropagation, reinforcement, or
temporal difference learning methods were employed. Instead, we apply
simple hillclimbing in a relative fitness environment. We start with
an initial champion of all zero weights and proceed simply by playing
the current champion network against a slightly mutated challenger,
changing weights if the challenger wins. Surprisingly, this worked
rather well. We investigate how the peculiar dynamics of this domain
enabled a previously discarded weak method to succeed, by preventing
suboptimal equilibria in a "meta-game" of self-learning.
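The champion-versus-challenger loop described above can be sketched as
follows. This is a minimal illustration, not the authors' code: the
backgammon game itself is stubbed out with a coin flip so the loop is
runnable, and the mutation scale is an assumed value, not the paper's
actual noise setting.

```python
import random

N_WEIGHTS = 4000        # network size stated in the abstract
MUTATION_SCALE = 0.05   # assumed magnitude of the mutation noise

def challenger_wins(champion, challenger):
    # Placeholder for a full backgammon game in which each player
    # picks the legal move whose resulting position its network
    # evaluates highest. A coin flip stands in for the real game.
    return random.random() < 0.5

def hillclimb(generations):
    # Start from an all-zero champion, as in the paper.
    champion = [0.0] * N_WEIGHTS
    for _ in range(generations):
        # Slightly mutated copy of the current champion.
        challenger = [w + random.gauss(0.0, MUTATION_SCALE)
                      for w in champion]
        if challenger_wins(champion, challenger):
            # Relative fitness: the winner becomes the new champion.
            champion = challenger
    return champion
```

Note that fitness here is purely relative: a challenger is never
scored against a fixed benchmark, only against the current champion,
which is what makes this hillclimbing a form of self-play.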