We study the relationship between data compression and prediction in single-layer neural networks of limited complexity. Quantifying the intuitive notion of Occam's razor using Rissanen's minimum complexity framework, we investigate the model-selection criterion advocated by this principle. While we find that the criterion works well for large sample sizes (as it must for consistency), the behavior for finite sample sizes is rather complex, depending intricately on the relationship between the complexity of the hypothesis space and that of the target space. We also show that the limited networks studied perform efficient data compression, even in the error-full regime.