WHAT SIZE TEST SET GIVES GOOD ERROR RATE ESTIMATES

Authors

GUYON I; MAKHOUL J; SCHWARTZ R; VAPNIK V

Citation

I. Guyon et al., WHAT SIZE TEST SET GIVES GOOD ERROR RATE ESTIMATES, IEEE transactions on pattern analysis and machine intelligence, 20(1), 1998, pp. 52-64

Citations number

Subject Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Journal title

IEEE transactions on pattern analysis and machine intelligence

ISSN journal

0162-8828

Volume

20

Issue

1

Year of publication

1998

Pages

52 - 64

Database

ISI

SICI code

0162-8828(1998)20:1<52:WSTSGG>2.0.ZU;2-T

Abstract

We address the problem of determining what size test set guarantees statistically significant results in a character recognition task, as a function of the expected error rate. We provide a statistical analysis showing that if, for example, the expected character error rate is around 1 percent, then, with a test set of at least 10,000 statistically independent handwritten characters (which could be obtained by taking 100 characters from each of 100 different writers), we guarantee, with 95 percent confidence, that: (1) the expected value of the character error rate is not worse than 1.25 E, where E is the empirical character error rate of the best recognizer, calculated on the test set; and (2) a difference of 0.3 E between the error rates of two recognizers is significant. We developed this framework with character recognition applications in mind, but it applies as well to speech recognition and to other pattern recognition problems.
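The numbers in the abstract (n = 10,000, E = 1 percent, 95 percent confidence, 1.25 E and 0.3 E) can be sanity-checked with a short sketch. This is not the authors' derivation: it assumes independent errors and swaps in two standard textbook tools, an exact one-sided binomial (Clopper-Pearson) upper bound for claim (1) and an unpaired two-proportion z statistic for claim (2). The function names and the hypothetical second recognizer with error rate 1.3 E are illustrative only. If the paper's guarantee is conservative, the exact bound should come out somewhat below 1.25 E, and the z statistic for a 0.3 E gap should exceed the one-sided 95 percent critical value of about 1.645.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def upper_error_bound(errors: int, n: int, alpha: float = 0.05) -> float:
    """One-sided exact (Clopper-Pearson) upper confidence bound on the true error rate.

    Bisects for p such that P(X <= errors | p) = alpha; the CDF decreases in p.
    Assumes the n test characters are statistically independent.
    """
    lo, hi = errors / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(errors, n, mid) > alpha:
            lo = mid  # observing this few errors is still too likely: raise p
        else:
            hi = mid
    return hi


def difference_z(e1: float, e2: float, n: int) -> float:
    """z statistic for e2 - e1, treating both error rates as independent binomial
    samples of size n (an assumption; a paired test such as McNemar's would use
    per-character agreement between the two recognizers)."""
    se = math.sqrt((e1 * (1 - e1) + e2 * (1 - e2)) / n)
    return (e2 - e1) / se


if __name__ == "__main__":
    n = 10_000
    errors = 100          # hypothetical count giving E = 1 percent
    E = errors / n
    bound = upper_error_bound(errors, n)
    print(f"95% upper bound on the true error rate: {bound:.4f} ({bound / E:.2f} E)")
    z = difference_z(E, 1.3 * E, n)  # hypothetical recognizer at 1.3 E
    print(f"z for a 0.3 E difference: {z:.2f} (one-sided 95% critical value ~1.645)")
```

Under these assumptions the exact bound lands a little under 1.25 E, consistent with the abstract's figure being a conservative guarantee. The unpaired z-test is the more pessimistic choice when both recognizers are scored on the same test set, since correlated errors would make a paired test more sensitive.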