Consider the following situation. A device called a classifier observes a probability law P on l-vectors from an alphabet of size A. Its task is to observe a second probability law Q and decide whether P = Q or whether P and Q are sufficiently different according to some appropriate criterion. If the classifier has unlimited memory available (so that it can remember P(z) exactly for all z), this is a simple matter. In fact, for most criteria of difference, a finite memory of 2^{(log A)l + o(l)} bits suffices (for large l), i.e., store a finite approximation of P(z) for each of the A^l vectors z. In a sense made precise in this paper, it is shown that a memory of only about 2^{Rl} bits is required, where the quantity R < log A and is closely related to the entropy of P. Further, it is shown that if, instead of being given P(z) for all z, the classifier is given a training sequence drawn according to the probability law P that can be stored using about 2^{Rl} bits, then correct classification is also possible.
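The gap between the naive 2^{(log A)l}-bit table and the roughly 2^{Rl}-bit requirement can be illustrated with a small entropy calculation. This is only a sketch: the function `entropy_bits` and the example distribution P below are illustrative assumptions, not taken from the paper; it compares the exponent l·log A (one table cell per l-vector) with the entropy-based exponent l·H(P), which governs the size of the typical set of sequences.

```python
import math

def entropy_bits(p):
    """Shannon entropy, in bits, of a probability vector p."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Hypothetical example: alphabet size A = 4 with a biased law P.
P = [0.7, 0.1, 0.1, 0.1]
A = len(P)
l = 20  # block length

# Naive approach: one stored value per l-vector, i.e. A^l = 2^{l log2 A} cells.
naive_exponent = l * math.log2(A)

# Entropy-based approach: only about 2^{l H(P)} sequences are typical,
# so an exponent proportional to H(P) < log2 A can suffice.
H = entropy_bits(P)
entropy_exponent = l * H

print(f"H(P) = {H:.3f} bits < log2 A = {math.log2(A):.3f}")
print(f"naive exponent: {naive_exponent:.1f}, entropy exponent: {entropy_exponent:.1f}")
```

For the biased P above, H(P) is about 1.36 bits versus log2 A = 2 bits, so the entropy-based exponent is markedly smaller; for the uniform law the two exponents coincide, matching the claim that R < log A whenever P is not uniform.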