I. Guyon et al., What Size Test Set Gives Good Error Rate Estimates, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), 1998, pp. 52-64
We address the problem of determining what size test set guarantees
statistically significant results in a character recognition task, as a
function of the expected error rate. We provide a statistical analysis
showing that if, for example, the expected character error rate is
around 1 percent, then, with a test set of at least 10,000 statistically
independent handwritten characters (which could be obtained by taking
100 characters from each of 100 different writers), we guarantee, with
95 percent confidence, that: (1) the expected value of the character
error rate is not worse than 1.25 E, where E is the empirical character
error rate of the best recognizer, calculated on the test set; and
(2) a difference of 0.3 E between the error rates of two recognizers is
significant. We developed this framework with character recognition
applications in mind, but it applies as well to speech recognition and
to other pattern recognition problems.
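
The two guarantees in the abstract can be sanity-checked numerically. The sketch below is not the paper's derivation: it assumes a one-sided normal approximation to the binomial, and for the comparison it treats the two recognizers' error counts as independent (a simplification chosen here for illustration). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def upper_bound_ratio(E: float, n: int, confidence: float = 0.95) -> float:
    """Ratio (upper confidence bound on the true error rate) / E.

    Assumes a one-sided normal approximation to the binomial,
    with the variance estimated from the empirical error rate E.
    """
    z = NormalDist().inv_cdf(confidence)   # one-sided quantile, about 1.645
    std_err = sqrt(E * (1.0 - E) / n)      # standard error of E
    return (E + z * std_err) / E


def difference_z_score(E1: float, E2: float, n: int) -> float:
    """z-score for the gap between two empirical error rates.

    Assumes each rate is measured on n independent samples and that the
    two recognizers' errors are independent of each other.
    """
    std_err = sqrt((E1 * (1.0 - E1) + E2 * (1.0 - E2)) / n)
    return abs(E2 - E1) / std_err


if __name__ == "__main__":
    E, n = 0.01, 10_000                    # the abstract's example
    z_crit = NormalDist().inv_cdf(0.95)

    ratio = upper_bound_ratio(E, n)
    print(f"upper bound / E = {ratio:.3f} (abstract's guarantee: 1.25)")

    z = difference_z_score(E, 1.3 * E, n)  # recognizers differing by 0.3 E
    print(f"z = {z:.2f}, one-sided critical value = {z_crit:.3f}")
```

Under these assumptions the bound comes out below 1.25 E and the 0.3 E difference exceeds the one-sided 95 percent critical value, which is consistent with the figures quoted in the abstract.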