MATCHING A SET OF STRINGS WITH VARIABLE-LENGTH DONT CARES

Citation
G. Kucherov et M. Rusinowitch, MATCHING A SET OF STRINGS WITH VARIABLE-LENGTH DONT CARES, Theoretical computer science, 178(1-2), 1997, pp. 129-154
Citations number
17
Categorie Soggetti
Computer Sciences","Computer Science Theory & Methods
ISSN journal
03043975
Volume
178
Issue
1-2
Year of publication
1997
Pages
129 - 154
Database
ISI
SICI code
0304-3975(1997)178:1-2<129:MASOSW>2.0.ZU;2-A
Abstract
Given an alphabet A, a pattern p is a word v(1)@...@v(m), where v(i) e psilon A and @ is not an element of A is a distinguished symbol calle d a variable length don't care symbol. Pattern p matches a text t epsi lon A if t = u(O)v(1)u(l)...u(m-1)v(m)u(m) for some u(O),..., u(m) ep silon A. We address the following problem: given a set P of patterns and a text t, test whether one of the patterns of P matches t. We desc ribe an algorithm that serves the problem in time O((\t\+\P\)log\P\). In contrast to most of the existing string matching algorithms (such a s that of Aho-Corasick) our algorithm is not composed of two successiv e stages - preprocessing the pattern (resp. the text) and reading thro ugh the text (reap. the pattern) - but has these two stages essentiall y interleaved. Our approach is based on using the DAWG (Directed Acycl ic Word Graph), a data structure studied by A. Blumer J. Blumer, Hauss ler, Ehrenfeucht, Crochemore, Chen, Seiferas.