We study a recent algorithm for fast on-line approximate string matching. T
his is the problem of searching a pattern in a text allowing errors in the
pattern or in the text. The algorithm is based on a very East kernel which
is able to search short patterns using a nondeterministic finite automaton,
which is simulated using bit-parallelism, A number of techniques to extend
this kernel for longer patterns are presented in that work. However, the t
echniques can be integrated in many ways and the optimal interplay among th
em is by no means obvious.
The solution to this problem starts at a very low level, by obtaining basic
probabilistic information about the problem which was not previously known
, and ends integrating analytical results with empirical data to obtain the
optimal heuristic. The conclusions obtained via analysis are experimentall
y confirmed. We also improve many of the techniques and obtain a combined h
euristic which is faster than the original work.
This work shows an excellent example of a complex and theoretical analysis
of algorithms used for design and for practical algorithm engineering, inst
ead of the common practice of first designing an algorithm and then analyzi
ng it.