LET SLEEPING FILES LIE - PATTERN-MATCHING IN Z-COMPRESSED FILES

Citation
A. Amir et al., LET SLEEPING FILES LIE - PATTERN-MATCHING IN Z-COMPRESSED FILES, Journal of computer and system sciences, 52(2), 1996, pp. 299-307
Citations number
12
Categorie Soggetti
System Science","Computer Science Hardware & Architecture","Computer Science Theory & Methods
ISSN journal
00220000
Volume
52
Issue
2
Year of publication
1996
Pages
299 - 307
Database
ISI
SICI code
0022-0000(1996)52:2<299:LSFL-P>2.0.ZU;2-L
Abstract
The current explosion of stored information necessitates a new model o f pattern matching, that of compressed matching. In this model one tri es to find all occurrences of a pattern in a compressed text in time p roportional to the compressed text size, i.e., without decompressing t he text. The most effective general purpose compression algorithms are adaptive, in that the text represented by each compression symbol is determined dynamically by the data. As a result, the encoding of a sub string depends on its location. Thus the same substring may ''look dif ferent'' every time it appears in the compressed text. In this paper w e consider pattern matching without decompression in the UNIX Z-compre ssion. This is a variant of the Lempel-Ziv adaptive compression scheme . If n is the length of the compressed text and m is the length of the pattern, our algorithms find the first pattern occurrence in time O(n + m(2)) or O(n log m + m). We also introduce a new criterion to measu re compressed matching algorithms, that of extra space. We show how to modify our algorithms to achieve a trade-off between the amount of ex tra space used and the algorithm's time complexity. (C) 1996 Academic Press, Inc.