ITA
ENG

Optimal encoding of non-stationary sources

Authors

Reif, JH Storer, JA

Citation

Jh. Reif et Ja. Storer, Optimal encoding of non-stationary sources, INF SCI, 135(1-2), 2001, pp. 87-105

Citations number

Categorie Soggetti

Information Tecnology & Communication Systems

Journal title

INFORMATION SCIENCES

ISSN journal

00200255 → ACNP

Volume

135

Issue

1-2

Year of publication

2001

Pages

87 - 105

Database

ISI

SICI code

0020-0255(200106)135:1-2<87:OEONS>2.0.ZU;2-7

Abstract

The usual assumption for proofs of the optimality of lossless encoding is a stationary ergodic source. Dynamic sources with non-stationary probability distributions occur in many practical situations where the data source is formed from a composition of distinct sources, for example, a document with multiple authors, a multimedia document, or the composition of distinct pa ckets sent over a communication channel. There is a vast literature of adap tive methods used to tailor the compression to dynamic sources. However, li ttle is known about optimal or near optimal methods for lossless compressio n of strings generated by sources that are not stationary ergodic. Here, we do not assume the source is stationary. Instead, we assume that the source produces an infinite sequence of concatenated finite strings s(1)...s(n), where: (i) Each finite string s(i) is generated by a sampling of a (possibly disti nct) stationary ergodic source S-i, and (ii) the length of each of the s(i) is lower bounded by a function L(n) suc h that L(n)/log(n) grows unboundedly with the length n of all the text with in s(1)...s(i). Thus each input string is a sequence of substrings generated by possibly di stinct and unknown stationary ergodic sources. The optimal expected length of a compressed coding of a finite prefix s(1)...s(k) is [GRAPHICS] where n(i) is the length of s(i) and H-i is the entropy of S-i. We give a w indow-based LZ77-type method for compression that we prove gives an encodin g with asymptotically optimal expected length. We give another LZ77-type me thod for compression where the expected time for encoding and decoding is n early linear (approaching arbitrarily close to linear O(12) for large n). W e also prove that this later method gives an encoding with asymptotically o ptimal expected length. In addition, give a dictionary-based LZ78-type meth od for compression, which takes linear time with small constant factors. Th is final algorithm also gives an encoding with asymptotically optimal expec ted length, assuming the S-i are stationary ergodic sources that satisfy ce rtain wiring conditions and L(n) greater than or equal to n(epsilon) for so me epsilon > 0. (C) 2001 Elsevier Science Inc. All rights reserved.