We study estimation in the class of stationary variable length Markov chains
(VLMC) on a finite space. The processes in this class are still Markovian of
high order, but with memory of variable length, yielding a much bigger and
structurally richer class of models than ordinary high-order Markov chains.
From an algorithmic view, the VLMC model class has attracted interest in
information theory and machine learning, but statistical properties have not
yet been explored. Provided that good estimation is available, the additional
structural richness of the model class enhances predictive power by finding a
better trade-off between model bias and variance and allowing better
structural description, which can be of specific interest. The latter is
exemplified with some DNA data. A version of the tree-structured context
algorithm, proposed by Rissanen in an information theoretical set-up, is shown
to have new good asymptotic properties for estimation in the class of VLMCs.
This remains true even when the underlying model increases in dimensionality.
Furthermore, consistent estimation of minimal state spaces and mixing
properties of fitted models are given.
We also propose a new bootstrap scheme based on fitted VLMCs. We show its
validity for quite general stationary categorical time series and for a broad
range of statistical procedures.
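
To make the idea of variable-length memory concrete, the following is a minimal sketch of a binary VLMC in Python. It is illustrative only: the context set, the transition probabilities, and the names `CONTEXTS`, `context_of`, and `simulate` are assumptions made for this example and are not taken from the paper. The next symbol depends on one past symbol when the most recent symbol is "0" and on two past symbols when it is "1", so three contexts suffice where a full second-order Markov chain would need four states.

```python
import random

# Hypothetical context tree (invented for illustration). Each key is a suffix
# of the past, written oldest-to-newest; each value is the distribution of the
# next symbol given that context.
CONTEXTS = {
    "0": {"0": 0.7, "1": 0.3},   # last symbol 0: memory of length 1
    "01": {"0": 0.4, "1": 0.6},  # last symbols 0,1: memory of length 2
    "11": {"0": 0.9, "1": 0.1},  # last symbols 1,1: memory of length 2
}


def context_of(past: str) -> str:
    """Return the suffix of `past` that is a context in CONTEXTS.

    The contexts form a suffix-free set, so at most one suffix matches.
    """
    for k in range(1, len(past) + 1):
        suffix = past[-k:]
        if suffix in CONTEXTS:
            return suffix
    raise ValueError("past is too short to determine a context")


def simulate(n: int, start: str = "00", seed: int = 0) -> str:
    """Simulate n symbols from the chain, after the initial segment `start`."""
    rng = random.Random(seed)
    x = start
    while len(x) < n + len(start):
        probs = CONTEXTS[context_of(x)]
        symbols = list(probs)
        weights = [probs[s] for s in symbols]
        x += rng.choices(symbols, weights=weights)[0]
    return x[len(start):]


if __name__ == "__main__":
    print(context_of("1100"))  # context after a trailing 0
    print(simulate(40))
```

Simulating a path from such a model, once its contexts and probabilities have been estimated from data, is the kind of step a model-based bootstrap would rely on; the estimation step itself is not sketched here.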