ITA
ENG

SPEEDING-UP 2 STRING-MATCHING ALGORITHMS

Authors

CROCHEMORE M CZUMAJ A GASIENIEC L JAROMINEK S LECROQ T PLANDOWSKI W RYTTER W

Citation

M. Crochemore et al., SPEEDING-UP 2 STRING-MATCHING ALGORITHMS, Algorithmica, 12(4-5), 1994, pp. 247-267

Citations number

Categorie Soggetti

Computer Sciences",Mathematics,Mathematics,"Computer Science Software Graphycs Programming

Journal title

Algorithmica → ACNP

ISSN journal

01784617

Volume

Issue

4-5

Year of publication

1994

Pages

247 - 267

Database

ISI

SICI code

0178-4617(1994)12:4-5<247:S2SA>2.0.ZU;2-F

Abstract

We show how to speed up two string-matching algorithms: the Boyer-Moor e algorithm (BM algorithm), and its version called here the reverse fa ctor algorithm (RF algorithm). The RF algorithm is based on factor gra phs for the reverse of the pattern. The main feature of both algorithm s is that they scan the text fight-to-left from the supposed right pos ition of the pattern. The BM algorithm goes as far as the scanned segm ent (factor) is a suffix of the pattern. The RF algorithm scans while the segment is a factor of the pattern. Both algorithms make a shift o f the pattern, forget the history, and start again. The RF algorithm u sually makes bigger shifts than BM, but is quadratic in the worst case . We show that it is enough to remember the last matched segment (repr esented by two pointers to the text) to speed up the RF algorithm cons iderably (to make a linear number of inspections of text symbols, with small coefficient), and to speed up the BM algorithm (to make at most 2 . n comparisons). Only a constant additional memory is needed for t he search phase. We give alternative versions of an accelerated RF alg orithm: the first one is based on combinatorial properties of primitiv e words, and the other two use the power of suffix trees extensively. The paper demonstrates the techniques to transform algorithms, and als o shows interesting new applications of data structures representing a ll subwords of the pattern in compact form.