A common approach to evaluating competing models in a classification context is to measure accuracy on a test set or via cross-validation. However, this can be computationally costly when genetic algorithms are used with large datasets, and the benefits of performing a wide search are compromised by the fact that estimates of the generalization abilities of competing models are subject to noise. This paper shows that clear advantages can be gained by evaluating competing models on samples of the test set. Further, it shows that applying statistical tests in combination with Occam's razor produces parsimonious models, matches the level of evaluation to the state of the search, and retains the speed advantages of test set sampling.
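The abstract does not specify the paper's exact sampling scheme or choice of statistical test, but the idea can be sketched as follows: estimate each candidate's accuracy on a random subsample of the test set, and apply Occam's razor by keeping the simpler of two models unless a statistical test shows the more complex one to be significantly more accurate. The function names (`sample_accuracy`, `prefer_simpler`) and the use of an exact sign test on disagreements are illustrative assumptions, not the paper's method.

```python
import random
import math

def sample_accuracy(model, test_set, n_sample, rng):
    """Estimate accuracy on a random subsample of the test set
    (cheaper than scoring the full set inside a GA fitness loop)."""
    sample = rng.sample(test_set, n_sample)
    return sum(model(x) == y for x, y in sample) / n_sample

def prefer_simpler(correct_simple, correct_complex, alpha=0.05):
    """Occam's razor via an exact sign test on the examples where the
    two models disagree: keep the simpler model unless the more
    complex one is significantly more accurate on the sample.

    correct_simple / correct_complex are per-example booleans marking
    whether each model classified that example correctly."""
    # b: simple right, complex wrong; c: complex right, simple wrong
    b = sum(s and not c for s, c in zip(correct_simple, correct_complex))
    c = sum(c and not s for s, c in zip(correct_simple, correct_complex))
    n = b + c
    if n == 0:
        return True  # no disagreements: keep the simpler model
    # one-sided exact binomial test: P(X >= c) under p = 0.5
    p_value = sum(math.comb(n, k) for k in range(c, n + 1)) / 2 ** n
    return p_value >= alpha  # not significantly better -> keep simpler
```

Because only a subsample is scored, fitness evaluation stays fast, and the sign test guards against promoting a more complex model on the strength of sampling noise alone.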