SPEEDING-UP 2 STRING-MATCHING ALGORITHMS

Citation
M. Crochemore et al., SPEEDING-UP 2 STRING-MATCHING ALGORITHMS, Algorithmica, 12(4-5), 1994, pp. 247-267
Citations number
17
Categorie Soggetti
Computer Sciences",Mathematics,Mathematics,"Computer Science Software Graphycs Programming
Journal title
ISSN journal
01784617
Volume
12
Issue
4-5
Year of publication
1994
Pages
247 - 267
Database
ISI
SICI code
0178-4617(1994)12:4-5<247:S2SA>2.0.ZU;2-F
Abstract
We show how to speed up two string-matching algorithms: the Boyer-Moor e algorithm (BM algorithm), and its version called here the reverse fa ctor algorithm (RF algorithm). The RF algorithm is based on factor gra phs for the reverse of the pattern. The main feature of both algorithm s is that they scan the text fight-to-left from the supposed right pos ition of the pattern. The BM algorithm goes as far as the scanned segm ent (factor) is a suffix of the pattern. The RF algorithm scans while the segment is a factor of the pattern. Both algorithms make a shift o f the pattern, forget the history, and start again. The RF algorithm u sually makes bigger shifts than BM, but is quadratic in the worst case . We show that it is enough to remember the last matched segment (repr esented by two pointers to the text) to speed up the RF algorithm cons iderably (to make a linear number of inspections of text symbols, with small coefficient), and to speed up the BM algorithm (to make at most 2 . n comparisons). Only a constant additional memory is needed for t he search phase. We give alternative versions of an accelerated RF alg orithm: the first one is based on combinatorial properties of primitiv e words, and the other two use the power of suffix trees extensively. The paper demonstrates the techniques to transform algorithms, and als o shows interesting new applications of data structures representing a ll subwords of the pattern in compact form.