D. Cohn et al., "Theory and Practice of Vector Quantizers Trained on Small Training Sets," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(1), 1994, pp. 54-65
We examine how the performance of a memoryless vector quantizer changes as a function of its training set size. Specifically, we study how well the training set distortion predicts test distortion when the training set is a randomly drawn subset of blocks from the test or training image(s). Using the Vapnik-Chervonenkis (VC) dimension, we derive formal bounds on the difference between the test and training distortion of vector quantizer codebooks. We then describe extensive empirical simulations that test these bounds for a variety of codebook sizes and vector dimensions, and give practical suggestions for determining the training set size necessary to achieve good generalization from a codebook. We conclude that, by using training sets comprising only a small fraction of the available data, one can produce results close to those obtainable when all available data are used.
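The train-versus-test distortion comparison described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' experimental setup: it trains an LBG/k-means-style codebook on a small random subset of synthetic "blocks" (Gaussian vectors stand in for image blocks here), then measures mean squared distortion on both the subset and the full set. The function names, the 5% subset fraction, and the synthetic data are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebook(blocks, k, iters=50):
    # LBG/k-means-style codebook training: alternate nearest-codeword
    # assignment and centroid update until iters is exhausted.
    codebook = blocks[rng.choice(len(blocks), size=k, replace=False)]
    for _ in range(iters):
        d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            members = blocks[assign == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook

def distortion(blocks, codebook):
    # Mean squared error when each block is mapped to its nearest codeword.
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.min(1).mean()

# Synthetic stand-in for image blocks: 5000 four-dimensional vectors.
data = rng.normal(size=(5000, 4))
# Train on a randomly drawn 5% subset, as in the small-training-set regime.
subset = data[rng.choice(len(data), size=250, replace=False)]

cb = train_codebook(subset, k=16)
train_d = distortion(subset, cb)
test_d = distortion(data, cb)
print(train_d, test_d)
```

Because the codebook is optimized on the subset, the training distortion is typically an optimistic estimate of the test distortion; the gap between the two printed values is the quantity the paper's VC-dimension bounds control.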