In this paper, the role of pattern matching in information theory is motivated and discussed. We describe the relationship between a pattern's recurrence time and its probability under the data-generating stochastic source, and we show how this relationship has led to great advances in universal data compression. We then describe nonasymptotic uniform bounds on the performance of data-compression algorithms in cases where the size of the training data available to the encoder is not large enough to yield the asymptotic compression, namely the Shannon entropy. Finally, we discuss applications of pattern matching and universal compression to universal prediction, classification, and entropy estimation.
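To make the link between recurrence times and entropy concrete, the following is a minimal, illustrative sketch (not the paper's method) of a match-length entropy estimator in the spirit of the recurrence-time results: for a stationary ergodic source, the length L_i of the longest prefix of the sequence starting at position i that also appears earlier grows like (log i)/H, so averaging L_i/log2(i) and inverting gives a rough entropy-rate estimate in bits per symbol. The brute-force match search and the exact normalization here are simplifications chosen for clarity.

```python
import math
import random


def longest_match(s, i):
    """L_i: length of the longest prefix of s[i:] that also starts
    at some position j < i (the match may overlap past position i)."""
    l = 0
    while i + l < len(s) and any(
        s[j:j + l + 1] == s[i:i + l + 1] for j in range(i)
    ):
        l += 1
    return l


def entropy_estimate(s):
    """Rough entropy-rate estimate (bits/symbol) from match lengths:
    since L_i / log2(i) -> 1/H, invert the empirical average."""
    ratios = [longest_match(s, i) / math.log2(i) for i in range(2, len(s))]
    return 1.0 / (sum(ratios) / len(ratios))


rng = random.Random(0)
coin_flips = "".join(rng.choice("01") for _ in range(200))  # ~1 bit/symbol
constant = "0" * 200                                        # 0 bits/symbol

print(entropy_estimate(coin_flips))  # near 1
print(entropy_estimate(constant))    # near 0
```

On the constant sequence the matches grow as fast as possible and the estimate collapses toward zero, while for fair coin flips the typical match length tracks log2(i), pushing the estimate toward one bit per symbol.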