We investigate layered neural networks with a differentiable activation function and student vectors without a normalization constraint by means of equilibrium statistical physics. We consider the learning of perfectly realizable rules and find that the length of the student vectors becomes infinite, unless a proper weight decay term is added to the energy. With weight decay, the system undergoes a first-order phase transition between states with very long student vectors and states where the lengths are comparable to those of the teacher vectors. Additionally, in both configurations there is a phase transition between a specialized and an unspecialized phase. An anti-specialized phase with long student vectors exists in networks with a small number of hidden units.
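For concreteness, the energy with a weight decay term would typically take the standard form sketched below for a soft-committee student with K hidden units; the notation (student vectors J_i, activation g, weight-decay strength gamma, training examples xi^mu with teacher outputs tau^mu) is an assumption, since the abstract does not fix it:

% Assumed standard form, not taken from the abstract:
% quadratic training error over P examples plus a weight decay term
% gamma/2 * sum_i |J_i|^2 penalizing long student vectors.
E\bigl(\{\mathbf{J}_i\}\bigr)
  = \sum_{\mu=1}^{P} \frac{1}{2}
    \Bigl[\sigma\bigl(\boldsymbol{\xi}^{\mu}\bigr) - \tau^{\mu}\Bigr]^2
  + \frac{\gamma}{2} \sum_{i=1}^{K} \lvert \mathbf{J}_i \rvert^{2},
\qquad
\sigma(\boldsymbol{\xi}) = \sum_{i=1}^{K} g\bigl(\mathbf{J}_i \cdot \boldsymbol{\xi}\bigr).

Without the gamma term, minimizing E allows |J_i| to grow without bound for a differentiable g, which is consistent with the divergence of the student-vector lengths described above.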