A statistical theory for overtraining is proposed. The analysis treats general realizable stochastic neural networks, trained with Kullback-Leibler divergence in the asymptotic case of a large number of training examples. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided into training and cross-validation sets in order to obtain optimum performance? Although cross-validated early stopping is useless in the asymptotic region, it does decrease the generalization error in the nonasymptotic region. Our large-scale simulations done on a CM5 are in good agreement with our analytical findings.
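The following is a minimal sketch (not from the paper) of the cross-validated early-stopping procedure discussed above: the examples are divided into a training set and a cross-validation set in some ratio r, training minimizes an empirical KL-divergence (cross-entropy) loss, and optimization halts once the cross-validation loss stops improving. The logistic model, the split ratio r, the learning rate, and the patience value are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic realizable task: labels are generated by a "teacher" logistic unit.
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_teacher = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_teacher))).astype(float)

# Divide the examples into training and cross-validation sets in ratio r.
r = 0.7  # fraction of examples used for training (an assumed value)
n_train = int(r * n)
X_tr, y_tr = X[:n_train], y[:n_train]
X_cv, y_cv = X[n_train:], y[n_train:]

def loss(w, X, y):
    """Mean cross-entropy (empirical KL divergence to the labels) of a logistic unit."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def grad(w, X, y):
    """Gradient of the cross-entropy loss with respect to the weights."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

w = np.zeros(d)
lr, patience = 0.5, 20
best_cv, best_w, since_best = np.inf, w.copy(), 0

for step in range(5000):
    w -= lr * grad(w, X_tr, y_tr)    # gradient step on the training set
    cv = loss(w, X_cv, y_cv)         # monitor the cross-validation loss
    if cv < best_cv:
        best_cv, best_w, since_best = cv, w.copy(), 0
    else:
        since_best += 1
    if since_best >= patience:       # early stopping: validation loss stopped improving
        break

print(f"stopped at step {step}, best cross-validation loss {best_cv:.4f}")
```

The stopping rule (a fixed patience on the validation loss) and the split ratio are design choices made for the sketch; the paper's question is precisely how such a ratio should be chosen for optimum performance.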