G. Louchard et W. Szpankowski, AVERAGE PROFILE AND LIMITING DISTRIBUTION FOR A PHRASE SIZE IN THE LEMPEL-ZIV PARSING ALGORITHM, IEEE transactions on information theory, 41(2), 1995, pp. 478-488
Citations number
33
Categorie Soggetti
Information Science & Library Science","Engineering, Eletrical & Electronic
Consider the parsing algorithm developed by Lempel and Ziv that partit
ions a sequence of length n into variable phrases (blocks) such that a
new block is the shortest substring not seen in the past as a phrase.
In practice, the following parameters are of interest: number of phra
ses, the size of a phrase, the number of phrases of given size, and so
forth. In this paper, we focus on the size of a randomly selected phr
ase, and the average number of phrases of a given size (the so-called
average profile of phrase sizes). These parameters can be efficiently
analyzed through a digital search tree representation. For a memoryles
s source with unequal probabilities of symbols generation (the so-call
ed asymmetric Bernoulli model), we prove that the size of a typical ph
rase is asymptotically normally distributed with mean and variance exp
licitly computed. In terms of digital search trees, we prove the norma
l limiting distribution of the typical depth (i.e., the length of a pa
th from the root to a randomly selected node). The latter finding is p
roved by a technique that belongs to the toolkit of the ''analytical a
nalysis of algorithms,'' and it seems to be novel in the context of da
ta compression.