Fast practical multi-pattern matching

Citation
M. Crochemore et al., Fast practical multi-pattern matching, INF PROCESS, 71(3-4), 1999, pp. 107-113
Citations number
16
Categorie Soggetti
Information Tecnology & Communication Systems
Journal title
INFORMATION PROCESSING LETTERS
ISSN journal
00200190 → ACNP
Volume
71
Issue
3-4
Year of publication
1999
Pages
107 - 113
Database
ISI
SICI code
0020-0190(19990827)71:3-4<107:FPMM>2.0.ZU;2-8
Abstract
The multi-pattern matching problem consists in finding all occurrences of t he patterns from a finite set X in a given text T of length n, We present a new and simple algorithm combining the ideas of the Aho-Corasick algorithm and the directed acyclic word graphs. The algorithm has time complexity wh ich is linear in the worst case (it makes at most 2n symbol comparisons) an d has good average-case time complexity assuming the shortest pattern is su fficiently long. Denote the length of the shortest pattern by m, and the to tal length of all patterns by M. Assume that M is polynomial with respect t o m, the alphabet contains at least 2 symbols and the text (in which the pa ttern is to be found) is random, for each position each letter occurs indep endently with the same probability. Then the average number of comparisons is O((n/m) . log m), which matches the lower bound of the problem. For suff iciently large values of m the algorithm has a good behavior in practice. ( C) 1999 Elsevier Science B.V. All rights reserved.