Improving an algorithm for approximate pattern matching

Citation
G. Navarro et R. Baeza-yates, Improving an algorithm for approximate pattern matching, ALGORITHMIC, 30(4), 2001, pp. 473-502
Citations number
33
Categorie Soggetti
Engineering Mathematics
Journal title
ALGORITHMICA
ISSN journal
01784617 → ACNP
Volume
30
Issue
4
Year of publication
2001
Pages
473 - 502
Database
ISI
SICI code
0178-4617(200108)30:4<473:IAAFAP>2.0.ZU;2-4
Abstract
We study a recent algorithm for fast on-line approximate string matching. T his is the problem of searching a pattern in a text allowing errors in the pattern or in the text. The algorithm is based on a very East kernel which is able to search short patterns using a nondeterministic finite automaton, which is simulated using bit-parallelism, A number of techniques to extend this kernel for longer patterns are presented in that work. However, the t echniques can be integrated in many ways and the optimal interplay among th em is by no means obvious. The solution to this problem starts at a very low level, by obtaining basic probabilistic information about the problem which was not previously known , and ends integrating analytical results with empirical data to obtain the optimal heuristic. The conclusions obtained via analysis are experimentall y confirmed. We also improve many of the techniques and obtain a combined h euristic which is faster than the original work. This work shows an excellent example of a complex and theoretical analysis of algorithms used for design and for practical algorithm engineering, inst ead of the common practice of first designing an algorithm and then analyzi ng it.