AVERAGE PROFILE AND LIMITING DISTRIBUTION FOR A PHRASE SIZE IN THE LEMPEL-ZIV PARSING ALGORITHM

Citation
G. Louchard et W. Szpankowski, AVERAGE PROFILE AND LIMITING DISTRIBUTION FOR A PHRASE SIZE IN THE LEMPEL-ZIV PARSING ALGORITHM, IEEE transactions on information theory, 41(2), 1995, pp. 478-488
Citations number
33
Categorie Soggetti
Information Science & Library Science","Engineering, Eletrical & Electronic
ISSN journal
00189448
Volume
41
Issue
2
Year of publication
1995
Pages
478 - 488
Database
ISI
SICI code
0018-9448(1995)41:2<478:APALDF>2.0.ZU;2-6
Abstract
Consider the parsing algorithm developed by Lempel and Ziv that partit ions a sequence of length n into variable phrases (blocks) such that a new block is the shortest substring not seen in the past as a phrase. In practice, the following parameters are of interest: number of phra ses, the size of a phrase, the number of phrases of given size, and so forth. In this paper, we focus on the size of a randomly selected phr ase, and the average number of phrases of a given size (the so-called average profile of phrase sizes). These parameters can be efficiently analyzed through a digital search tree representation. For a memoryles s source with unequal probabilities of symbols generation (the so-call ed asymmetric Bernoulli model), we prove that the size of a typical ph rase is asymptotically normally distributed with mean and variance exp licitly computed. In terms of digital search trees, we prove the norma l limiting distribution of the typical depth (i.e., the length of a pa th from the root to a randomly selected node). The latter finding is p roved by a technique that belongs to the toolkit of the ''analytical a nalysis of algorithms,'' and it seems to be novel in the context of da ta compression.