ITA
ENG

LET SLEEPING FILES LIE - PATTERN-MATCHING IN Z-COMPRESSED FILES

Authors

AMIR A BENSON G FARACH M

Citation

A. Amir et al., LET SLEEPING FILES LIE - PATTERN-MATCHING IN Z-COMPRESSED FILES, Journal of computer and system sciences, 52(2), 1996, pp. 299-307

Citations number

Categorie Soggetti

System Science","Computer Science Hardware & Architecture","Computer Science Theory & Methods

Journal title

Journal of computer and system sciences → ACNP

ISSN journal

00220000

Volume

Issue

Year of publication

1996

Pages

299 - 307

Database

ISI

SICI code

0022-0000(1996)52:2<299:LSFL-P>2.0.ZU;2-L

Abstract

The current explosion of stored information necessitates a new model o f pattern matching, that of compressed matching. In this model one tri es to find all occurrences of a pattern in a compressed text in time p roportional to the compressed text size, i.e., without decompressing t he text. The most effective general purpose compression algorithms are adaptive, in that the text represented by each compression symbol is determined dynamically by the data. As a result, the encoding of a sub string depends on its location. Thus the same substring may ''look dif ferent'' every time it appears in the compressed text. In this paper w e consider pattern matching without decompression in the UNIX Z-compre ssion. This is a variant of the Lempel-Ziv adaptive compression scheme . If n is the length of the compressed text and m is the length of the pattern, our algorithms find the first pattern occurrence in time O(n + m(2)) or O(n log m + m). We also introduce a new criterion to measu re compressed matching algorithms, that of extra space. We show how to modify our algorithms to achieve a trade-off between the amount of ex tra space used and the algorithm's time complexity. (C) 1996 Academic Press, Inc.