A teacher perceptron T with N binary components provides the
classification of a set of p randomly chosen training examples. Several
algorithms are available that use this information to select a student
perceptron J with continuous components $J_i$. The purpose is to maximize the
overlap $R = T \cdot J/(|T|\,|J|)$, or to minimize the corresponding
generalization error $\epsilon = (1/\pi)\arccos R$. In view of the binary nature of
the components of the teacher, one might expect that a lower error can
be achieved by working with the clipped version of the student vector,
namely the vector with components $\mathrm{sign}(J_i)$. It turns out that
this is not always the case. In this letter we calculate the overlap
$\tilde{R}$ for a vector with components $f(J_i)$, where f can be any
odd function of its argument, as a function of the overlap R. We show
that the optimal choice of f is a hyperbolic tangent,
$f(x) = \tanh\bigl(R x/(1 - R^2)\bigr)$. The corresponding generalization
error can go to zero exponentially fast in $\alpha^2$ for large $\alpha$
($\alpha = p/N$).
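
The comparison between the raw, clipped, and tanh-transformed student can be illustrated numerically. The following is a minimal sketch, not the calculation of the letter: it assumes that each student component equals R times the teacher component plus independent Gaussian noise, with the student normalized so that |J|^2 is approximately N. The function names, the default values n, r, and seed, and the noise model itself are all assumptions made for illustration.

```python
import math
import random


def overlap(u, v):
    """Normalized overlap u.v / (|u| |v|) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def generalization_error(r):
    """epsilon = (1/pi) arccos R, with r clamped to [-1, 1] against rounding."""
    return math.acos(max(-1.0, min(1.0, r))) / math.pi


def compare_transforms(n=50000, r=0.6, seed=0):
    """Overlap with the teacher for the raw, clipped, and tanh students."""
    rng = random.Random(seed)
    # Binary teacher: every component is +1 or -1.
    teacher = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    # ASSUMPTION (not from the source): student = r * teacher + Gaussian
    # noise, scaled so the student has unit variance per component and an
    # overlap of about r with the teacher.
    noise = math.sqrt(1.0 - r * r)
    student = [r * t + noise * rng.gauss(0.0, 1.0) for t in teacher]
    # Gain of the tanh transform, R / (1 - R^2), as quoted in the text.
    gain = r / (1.0 - r * r)
    return {
        "student": overlap(teacher, student),
        "clipped": overlap(teacher, [math.copysign(1.0, j) for j in student]),
        "tanh": overlap(teacher, [math.tanh(gain * j) for j in student]),
    }


if __name__ == "__main__":
    for name, value in compare_transforms().items():
        print(f"{name:8s} R = {value:.3f}  eps = {generalization_error(value):.3f}")
```

Under this assumed noise model and at R = 0.6, clipping should lower the overlap relative to the raw student, while the tanh transform should raise it. Other values of r can be passed to `compare_transforms` to explore where clipping starts to help.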