THE RELATIONSHIP BETWEEN GREEDY PARSING AND SYMBOLWISE TEXT COMPRESSION

Authors
Citation
Tc. Bell et Ih. Witten, THE RELATIONSHIP BETWEEN GREEDY PARSING AND SYMBOLWISE TEXT COMPRESSION, Journal of the Association for Computing Machinery, 41(4), 1994, pp. 708-724
Citations number
43
Categorie Soggetti
Computer Sciences","Computer Science Hardware & Architecture
Journal title
Journal of the Association for Computing Machinery
ISSN journal
00045411 → ACNP
Volume
41
Issue
4
Year of publication
1994
Pages
708 - 724
Database
ISI
SICI code
Abstract
Text compression method can be divided into two classes: symbolwise an d parsing. Symbolwise method assign codes to individual symbols, while parsing methods assign codes to groups of consecutive symbols (phrase s). The set of phrases available to a parsing method is referred to as a dictionary. The vast majority of parsing methods in the literature use greedy parsing (including nearly all variations of the popular Ziv -Lempel methods). When greedy parsing is used, the coder processes a s tring from left to right, at each step encoding as many symbols as pos sible with a phrase from the dictionary. This parsing strategy is not optimal, but an optimal method cannot guarantee a bounded coding delay . An important problem in compression research has been to establish t he relationship between symbolwise methods and parsing methods. This p aper extends prior work that shows that there are symbolwise methods t hat simulate a subset of greedy parsing methods. We provide a more gen eral algorithm that takes any nonadaptive greedy parsing method and co nstructs a symbolwise method that achieves exactly the same compressio n. Combined with the existence of symbolwise equivalents for two of th e most significant adaptive parsing methods, this result gives added w eight to the idea that research aimed at increasing compression should concentrate on symbolwise methods, while parsing methods should be ch osen for speed or temporary storage considerations.