ITA
ENG

A TEXT COMPRESSION SCHEME THAT ALLOWS FAST SEARCHING DIRECTLY IN THE COMPRESSED FILE

Authors

MANBER U

Citation

U. Manber, A TEXT COMPRESSION SCHEME THAT ALLOWS FAST SEARCHING DIRECTLY IN THE COMPRESSED FILE, ACM transactions on information systems, 15(2), 1997, pp. 124-136

Citations number

Categorie Soggetti

Information Science & Library Science","Computer Science Information Systems

Journal title

ACM transactions on information systems → ACNP

ISSN journal

10468188

Volume

Issue

Year of publication

1997

Pages

124 - 136

Database

ISI

SICI code

1046-8188(1997)15:2<124:ATCSTA>2.0.ZU;2-O

Abstract

A new text compression scheme is presented in this article. The main p urpose of this scheme is to speed up string matching by searching the compressed file directly. The scheme requires no modification of the s tring-matching algorithm, which is used as a black box; any string-mat ching procedure can be used. Instead, the pattern is modified; only th e outcome of the matching of the modified pattern against the compress ed file is decompressed. Since the compressed file is smaller than the original file, the search is faster both in terms of I/O time and pro cessing time than a search in the original file. For typical text file s, we achieve about 30% reduction of space and slightly less of search time. A 30% space saving is not competitive with good text compressio n schemes, and thus should not be used where space is the predominant concern. The intended applications of this scheme are files that are s earched often, such as catalogs, bibliographic files, and address book s. Such files are typically not compressed, but with this scheme they can remain compressed indefinitely, saving space while allowing faster search at the same time. A particular application to an information r etrieval system that we developed is also discussed.