Efficient universal lossless data compression algorithms based on a greedysequential grammar transform - Part one: Without context models

Citation
Eh. Yang et Jc. Kieffer, Efficient universal lossless data compression algorithms based on a greedysequential grammar transform - Part one: Without context models, IEEE INFO T, 46(3), 2000, pp. 755-777
Citations number
35
Categorie Soggetti
Information Tecnology & Communication Systems
Journal title
IEEE TRANSACTIONS ON INFORMATION THEORY
ISSN journal
00189448 → ACNP
Volume
46
Issue
3
Year of publication
2000
Pages
755 - 777
Database
ISI
SICI code
0018-9448(200005)46:3<755:EULDCA>2.0.ZU;2-E
Abstract
A grammar transform is a transformation that converts any data sequence to be compressed into a grammar from which the original data sequence can be f ully reconstructed. In a grammar-based code, a data sequence is first conve rted into a grammar by a grammar transform and then losslessly encoded. In this paper, a greedy grammar transform is first presented; this grammar tra nsform constructs sequentially a sequence of irreducible grammars from whic h the original data sequence can be recovered incrementally. Based on this grammar transform, three universal lossless data compression algorithms, a sequential algorithm, an improved sequential algorithm, and a hierarchical algorithm, are then developed. These algorithms combine the power of arithm etic coding with that of string matching. It is shown that these algorithms are all universal in the sense that they can achieve asymptotically the en tropy rate of any stationary, ergodic source. Moreover, it is proved that t heir worst case redundancies among all individual sequences of length n are upper-bounded by c log log n / log n, where c is a constant. Simulation re sults show that the proposed algorithms outperform the Unix Compress and Gz ip algorithms, which are based on LZ78 and LZ77, respectively.