Let the kth-order empirical distribution of a code be defined as the proportion of k-strings anywhere in the codebook equal to every given k-string. We show that for any fixed k, the kth-order empirical distribution of any good code (i.e., a code approaching capacity with vanishing probability of error) converges, in the sense of divergence, to the set of input distributions that maximize the input/output mutual information of k channel uses. This statement is proved for discrete memoryless channels as well as for a large class of channels with memory. If k grows logarithmically (or faster) with blocklength, the result no longer holds for certain good codes, whereas for other good codes the result can be shown for k growing as fast as a certain fraction of blocklength.
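The kth-order empirical distribution and its divergence from a target input distribution can be made concrete with a short sketch. The snippet below is illustrative only: the codebook is a toy example, and the target distribution is the k-fold product of the uniform distribution, which is the capacity-achieving input for a binary symmetric channel (an assumption chosen for simplicity, not taken from the abstract).

```python
from collections import Counter
from itertools import product
from math import log2

def kth_order_empirical(codebook, k):
    """Empirical distribution of length-k substrings taken anywhere
    in the codebook, normalized over all substring positions."""
    counts = Counter()
    for word in codebook:
        for i in range(len(word) - k + 1):
            counts[word[i:i + k]] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def divergence(p, q):
    """KL divergence D(p || q) in bits; assumes supp(p) is a subset
    of supp(q)."""
    return sum(px * log2(px / q[s]) for s, px in p.items() if px > 0)

# Toy binary codebook (hypothetical, for illustration only).
codebook = ["0101100101", "1100101011", "0011010110", "1010011001"]
k = 2
p_hat = kth_order_empirical(codebook, k)

# k-fold product of the uniform input distribution: mass 2^-k on
# every binary k-string (capacity-achieving for a BSC).
q = {"".join(bits): 2 ** -k for bits in product("01", repeat=k)}

print(f"D(p_hat || uniform^{k}) = {divergence(p_hat, q):.4f} bits")
```

For a good code, the theorem says this divergence vanishes with blocklength for any fixed k; a small toy codebook like the one above will generally show a strictly positive gap.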