ITA
ENG

PPM performance with BWT complexity: A fast and effective data compressionalgorithm

Authors

Effros, M

Citation

M. Effros, PPM performance with BWT complexity: A fast and effective data compressionalgorithm, P IEEE, 88(11), 2000, pp. 1703-1712

Citations number

Categorie Soggetti

Eletrical & Eletronics Engineeing

Journal title

PROCEEDINGS OF THE IEEE

ISSN journal

00189219 → ACNP

Volume

Issue

Year of publication

2000

Pages

1703 - 1712

Database

ISI

SICI code

0018-9219(200011)88:11<1703:PPWBCA>2.0.ZU;2-V

Abstract

This paper introduces a new data compression algorithm. The goal underlying this new code design is to achieve a single lossless lossless compression algorithm with the excellent compensation ratios of the Prediction by Parti al Mapping (PPM) algorithms and the low complexity of codes based on the Bu rrows Wheeler Transform (BWT). Like the BWT-based codes, the proposed algor ithm requires worst case O(n) computational complexity and memory; in contr ast, the unbounded-context PPM algorithm, called PPM*, requires worst case O(n(2)) computational complexity. Like PPM*, the proposed algorthm allows t he use of unbounded contexts. Using standard data sets for comparison, the proposed algorithm achieves compression performance better than proposed al gorithm achieves compression performance better than that of the proposed a lgorithm yields an average rate of 2.29 bits per character (bpc) on the Cal gary corpus; this results compares favorably with the 2.33 and 2.34 bpc of PPM5 and PPM* (PPM algorithms), the 2.43 bpc of BW94 (the original BWT-base d code), and the 3.64 and 2.69 bpc of compress and gzip (popular Unix compr ession algorthms based on Lempel-Ziv (LZ) coding techniques) on the same da ta set. The given code does not, however match the best reported compressio n performance-2.12 bpc with PPMZ9-listed on the Calgary corpus results web page at the time of this publication. Results on the Canterbury corpus give a similar relative standing. The proposed algorithm gives an average rate of 2.15 bpc on the Canterbury corpus, while the Canterbury corpus web page gives an average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc f or PPM7, 2.23 bpc for BZIP2 (a popular BWT-based code), and 3.31 and 2.53 b pc for compress and gzip repectively.