This paper shows how the choice of representation substantially affects the generalization performance of connectionist networks. The starting point is Chalmers' simulations involving structure-sensitive processing. Chalmers argued that a connectionist network could handle structure-sensitive processing without using syntactically structured representations. He trained a connectionist architecture to encode/decode distributed representations for simple sentences; these distributed representations were then holistically transformed such that active sentences were mapped to their passive counterparts. However, he noted that the recursive auto-associative memory (RAAM), which was used to encode and decode the distributed representations for the structures, exhibited only a limited ability to generalize when trained to encode/decode a randomly selected sample of the total corpus. When the RAAM was trained to encode/decode all sentences, and a separate transformation network was trained to perform some of the active-passive transformations on the RAAM-encoded sentences, the transformation network generalized perfectly to the remaining test sentences. It is argued here that the main reason for the limited generalization lies not in the RAAM architecture per se, but in the choice of representation for the tokens. This paper shows that 100% generalization can be achieved for Chalmers' original setup (i.e., using only 30% of the total corpus for training). The key to this success is to use distributed representations for the tokens, capturing different characteristics for different classes of tokens (e.g., verbs or nouns).
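The idea of class-sensitive distributed token representations can be illustrated with a minimal sketch. The vector layout, class codes, and helper names below are purely illustrative assumptions, not the encoding actually used by Chalmers or in this paper: each token vector concatenates a class part (shared by all tokens of the same grammatical class) with an identity part (unique per token), so tokens of the same class lie closer together in representation space than tokens of different classes.

```python
# Hypothetical distributed token encoding (illustrative only):
# the first two features code the grammatical class, the remaining
# features code the token's identity within that class.

CLASS_CODES = {
    "noun": (1.0, 0.0),
    "verb": (0.0, 1.0),
}

def encode_token(word_class, identity_bits):
    """Return a distributed representation: class features + identity features."""
    return CLASS_CODES[word_class] + tuple(float(b) for b in identity_bits)

def class_part(vec):
    """The shared class features (first two components in this layout)."""
    return vec[:2]

def overlap(a, b):
    """Count features that are active (1.0) in both token vectors."""
    return sum(1 for x, y in zip(a, b) if x == y == 1.0)

# Three example tokens with non-overlapping identity bits.
john  = encode_token("noun", (1, 0, 0))
mary  = encode_token("noun", (0, 1, 0))
loves = encode_token("verb", (0, 0, 1))
```

Under this scheme, `john` and `mary` share the noun class feature while `john` and `loves` share nothing, so a network processing these vectors can exploit class regularities; local (one-hot) token codes provide no such shared structure.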