Eh. Yang et Jc. Kieffer, Efficient universal lossless data compression algorithms based on a greedysequential grammar transform - Part one: Without context models, IEEE INFO T, 46(3), 2000, pp. 755-777
A grammar transform is a transformation that converts any data sequence to
be compressed into a grammar from which the original data sequence can be f
ully reconstructed. In a grammar-based code, a data sequence is first conve
rted into a grammar by a grammar transform and then losslessly encoded. In
this paper, a greedy grammar transform is first presented; this grammar tra
nsform constructs sequentially a sequence of irreducible grammars from whic
h the original data sequence can be recovered incrementally. Based on this
grammar transform, three universal lossless data compression algorithms, a
sequential algorithm, an improved sequential algorithm, and a hierarchical
algorithm, are then developed. These algorithms combine the power of arithm
etic coding with that of string matching. It is shown that these algorithms
are all universal in the sense that they can achieve asymptotically the en
tropy rate of any stationary, ergodic source. Moreover, it is proved that t
heir worst case redundancies among all individual sequences of length n are
upper-bounded by c log log n / log n, where c is a constant. Simulation re
sults show that the proposed algorithms outperform the Unix Compress and Gz
ip algorithms, which are based on LZ78 and LZ77, respectively.