In this article we prove sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate. The name "sanity check" refers to the fact that although we often expect the leave-one-out estimate to perform considerably better than the training error estimate, we are here only seeking assurance that its performance will not be considerably worse. Perhaps surprisingly, such assurance has been given only for limited cases in the prior literature on cross-validation.
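
To make the two estimates concrete, the following is a minimal Python sketch of both; the learning procedure learn is a hypothetical placeholder for any algorithm mapping a labeled sample to a hypothesis, and is not part of the paper.

# Hedged sketch: the training error estimate and the leave-one-out
# cross-validation estimate for a generic learning algorithm.
# `learn` is a hypothetical placeholder: any function taking a list of
# (x, y) pairs and returning a hypothesis h, where h(x) predicts a label.

def training_error(learn, sample):
    # Train on the full sample, then report the fraction of its own
    # examples that the resulting hypothesis misclassifies.
    h = learn(sample)
    return sum(h(x) != y for x, y in sample) / len(sample)

def leave_one_out_error(learn, sample):
    # For each index i, train on the sample with example i removed and
    # test the resulting hypothesis on the single held-out example.
    # The estimate is the fraction of held-out examples misclassified.
    m = len(sample)
    mistakes = 0
    for i in range(m):
        x, y = sample[i]
        h_i = learn(sample[:i] + sample[i + 1:])
        mistakes += (h_i(x) != y)
    return mistakes / m

The leave-one-out estimate thus requires m training runs rather than one; in exchange, each hypothesis is tested on an example it was not trained on, which is why it is often expected to track the generalization error more faithfully than the training error estimate.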
Any nontrivial bound on the error of leave-one-out must rely on some notion of algorithmic stability. Previous bounds relied on the rather strong notion of hypothesis stability, whose application was primarily limited to nearest-neighbor and other local algorithms. Here we introduce the new and weaker notion of error stability and apply it to obtain sanity-check bounds for leave-one-out for other classes of learning algorithms, including training error minimization procedures and Bayesian algorithms. We also provide lower bounds demonstrating the necessity of some form of error stability for proving bounds on the error of the leave-one-out estimate, and the fact that for training error minimization algorithms, in the worst case such bounds must still depend on the Vapnik-Chervonenkis dimension of the hypothesis class.
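
For concreteness, the contrast between the two stability notions can be sketched in LaTeX as follows. The exact parameterization (here a pair (beta_1, beta_2) of tolerance and confidence parameters) and the quantifiers are illustrative assumptions, not quoted from this abstract. Let S be a sample of m examples drawn from a distribution D, let S^i denote S with its ith example removed, and let h_S be the hypothesis output by the algorithm on S.

% Hedged sketch of the two stability notions; the (beta_1, beta_2)
% parameterization is an illustrative assumption, not quoted text.

% Hypothesis stability: removing one example barely changes the
% hypothesis itself, as measured by disagreement on a fresh point x.
\[
\Pr_{S \sim D^m}\left[\, \Pr_{x \sim D}\left[ h_S(x) \neq h_{S^i}(x) \right] \le \beta_1 \,\right] \ge 1 - \beta_2
\]

% Error stability (weaker): removing one example barely changes the
% generalization error er_D, even if the hypothesis itself changes.
\[
\Pr_{S \sim D^m}\left[\, \left| \mathrm{er}_D(h_S) - \mathrm{er}_D(h_{S^i}) \right| \le \beta_1 \,\right] \ge 1 - \beta_2
\]

Under these definitions, hypothesis stability implies error stability (if two hypotheses rarely disagree, their generalization errors must be close), but not conversely: an algorithm may switch between quite different hypotheses of nearly equal error, which is the kind of behavior that error stability tolerates and that arises for procedures such as training error minimization.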