PPM performance with BWT complexity: A fast and effective data compressionalgorithm

Authors
Citation
M. Effros, PPM performance with BWT complexity: A fast and effective data compressionalgorithm, P IEEE, 88(11), 2000, pp. 1703-1712
Citations number
22
Categorie Soggetti
Eletrical & Eletronics Engineeing
Journal title
PROCEEDINGS OF THE IEEE
ISSN journal
00189219 → ACNP
Volume
88
Issue
11
Year of publication
2000
Pages
1703 - 1712
Database
ISI
SICI code
0018-9219(200011)88:11<1703:PPWBCA>2.0.ZU;2-V
Abstract
This paper introduces a new data compression algorithm. The goal underlying this new code design is to achieve a single lossless lossless compression algorithm with the excellent compensation ratios of the Prediction by Parti al Mapping (PPM) algorithms and the low complexity of codes based on the Bu rrows Wheeler Transform (BWT). Like the BWT-based codes, the proposed algor ithm requires worst case O(n) computational complexity and memory; in contr ast, the unbounded-context PPM algorithm, called PPM*, requires worst case O(n(2)) computational complexity. Like PPM*, the proposed algorthm allows t he use of unbounded contexts. Using standard data sets for comparison, the proposed algorithm achieves compression performance better than proposed al gorithm achieves compression performance better than that of the proposed a lgorithm yields an average rate of 2.29 bits per character (bpc) on the Cal gary corpus; this results compares favorably with the 2.33 and 2.34 bpc of PPM5 and PPM* (PPM algorithms), the 2.43 bpc of BW94 (the original BWT-base d code), and the 3.64 and 2.69 bpc of compress and gzip (popular Unix compr ession algorthms based on Lempel-Ziv (LZ) coding techniques) on the same da ta set. The given code does not, however match the best reported compressio n performance-2.12 bpc with PPMZ9-listed on the Calgary corpus results web page at the time of this publication. Results on the Canterbury corpus give a similar relative standing. The proposed algorithm gives an average rate of 2.15 bpc on the Canterbury corpus, while the Canterbury corpus web page gives an average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc f or PPM7, 2.23 bpc for BZIP2 (a popular BWT-based code), and 3.31 and 2.53 b pc for compress and gzip repectively.