ITA
ENG

THE RELATIONSHIP BETWEEN GREEDY PARSING AND SYMBOLWISE TEXT COMPRESSION

Authors

BELL TC WITTEN IH

Citation

Tc. Bell et Ih. Witten, THE RELATIONSHIP BETWEEN GREEDY PARSING AND SYMBOLWISE TEXT COMPRESSION, Journal of the Association for Computing Machinery, 41(4), 1994, pp. 708-724

Citations number

Categorie Soggetti

Computer Sciences","Computer Science Hardware & Architecture

Journal title

Journal of the Association for Computing Machinery

ISSN journal

00045411 → ACNP

Volume

Issue

Year of publication

1994

Pages

708 - 724

Database

ISI

SICI code

Abstract

Text compression method can be divided into two classes: symbolwise an d parsing. Symbolwise method assign codes to individual symbols, while parsing methods assign codes to groups of consecutive symbols (phrase s). The set of phrases available to a parsing method is referred to as a dictionary. The vast majority of parsing methods in the literature use greedy parsing (including nearly all variations of the popular Ziv -Lempel methods). When greedy parsing is used, the coder processes a s tring from left to right, at each step encoding as many symbols as pos sible with a phrase from the dictionary. This parsing strategy is not optimal, but an optimal method cannot guarantee a bounded coding delay . An important problem in compression research has been to establish t he relationship between symbolwise methods and parsing methods. This p aper extends prior work that shows that there are symbolwise methods t hat simulate a subset of greedy parsing methods. We provide a more gen eral algorithm that takes any nonadaptive greedy parsing method and co nstructs a symbolwise method that achieves exactly the same compressio n. Combined with the existence of symbolwise equivalents for two of th e most significant adaptive parsing methods, this result gives added w eight to the idea that research aimed at increasing compression should concentrate on symbolwise methods, while parsing methods should be ch osen for speed or temporary storage considerations.