In this paper, a formal test on prediction errors is developed for the
cross-validation of regression models under the simple random splitting
framework. Analytic as well as simulation results relate the statistical
power of the test to the allocation of sample observations to estimation
and validation subsets. The results indicate that splitting the data
into halves is suboptimal. More observations should be used for
estimation than validation. Furthermore, the proportion of the sample
optimally devoted to validation is small for very limited samples
(N < 20), increases to about 40% for medium-sized samples and decreases
again for large samples (N > 60). However, although the 50/50 split is
suboptimal, it is not tremendously so in a wide variety of
circumstances.
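
As a rough illustration of the simple random splitting framework, the sketch below allocates a chosen fraction of the sample to validation, fits a one-predictor least-squares line on the estimation subset, and returns the prediction errors on the validation subset. It is not the formal test developed in the paper: the function names (split_indices, fit_line, validation_errors), the single-predictor model, and the simulated data are all assumptions made for illustration.

```python
import random


def split_indices(n, validation_fraction, rng):
    """Randomly allocate n observations to estimation and validation subsets."""
    n_val = max(1, round(validation_fraction * n))
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[n_val:], idx[:n_val]  # (estimation, validation)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def validation_errors(xs, ys, validation_fraction, rng):
    """Fit on the estimation subset; return prediction errors on the rest."""
    est, val = split_indices(len(xs), validation_fraction, rng)
    a, b = fit_line([xs[i] for i in est], [ys[i] for i in est])
    return [ys[i] - (a + b * xs[i]) for i in val]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: N = 40 observations from a noisy linear model.
    xs = [rng.uniform(0.0, 10.0) for _ in range(40)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    # Devote 40% of the sample to validation, 60% to estimation.
    errors = validation_errors(xs, ys, 0.4, rng)
    mspe = sum(e ** 2 for e in errors) / len(errors)
    print(len(errors), mspe)
```

Varying validation_fraction over repeated random splits gives a simple way to see how the allocation between estimation and validation changes the behavior of the prediction errors, which is the quantity the test described above operates on.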