L. Colussi et L. Toniolo, HOW THE CHARACTER COMPARISON ORDER SHAPES THE SHIFT FUNCTION OF ONLINE PATTERN-MATCHING ALGORITHMS, Theoretical computer science, 163(1-2), 1996, pp. 117-144
Citations number
27
Categorie Soggetti
Computer Sciences","Computer Science Theory & Methods
String matching is the problem of finding all occurrences of a string
W[0...m - 1] of length m called a pattern, in a longer string J[0...n
- 1] of length n called a text. Several string matching algorithms hav
e been designed to solve the problem in linear time; most of them work
in two steps, called pattern preprocessing and text search step. The
paper addresses the definition and computation of the shift function i
n the pattern preprocessing step of on-line string matching algorithms
. The shift function depends essentially on the order the pattern char
acters are compared with the corresponding text characters. We conside
r a family F of algorithms that do not change the character comparison
order J during execution and we present a uniform definition of shift
function delta(J) for such algorithms via a function imin(J). The def
inition allows one to compute delta(J) in O(m log log m) time in the w
orst case, given imin(J), but sufficient conditions to compute delta(J
) in O(m) time are provided. Computing imin(J) requires O(m(2)) compar
isons in general. We introduce the class of compact orders (which is t
he generalization of Knuth-Morris-Pratt, Boyer-Moore and Crochemore-Pe
rrin character comparison orders) and we give algorithms to compute bo
th function imin(J) and shift function delta(J) in O(m) time for all c
ompact orders. We show that given the order J and the pattern W there
exists a set C of equivalent orders such that the function imin(K) can
be computed in linear time given imin(J) for all orders K is an eleme
nt of C. Moreover, we characterize two orders in the set C that respec
tively minimize and maximize the values of the shift function and we s
how that for both those orders the shift function can be computed in l
inear time given imin(J).