Suffix trees on words

Citation
A. Andersson et al., Suffix trees on words, ALGORITHMIC, 23(3), 1999, pp. 246-260
Citations number
18
Categorie Soggetti
Engineering Mathematics
Journal title
ALGORITHMICA
ISSN journal
01784617 → ACNP
Volume
23
Issue
3
Year of publication
1999
Pages
246 - 260
Database
ISI
SICI code
0178-4617(199903)23:3<246:STOW>2.0.ZU;2-X
Abstract
We discuss an intrinsic generalization of the suffix tree, designed to inde x a string of length n which has a natural partitioning into m multicharact er substrings or words. This word suffix tree represents only the m suffixe s that start at word boundaries. These boundaries are determined by delimit ers, whose definition depends on the application. Since traditional suffix tree construction algorithms rely heavily on the f act that all suffixes are inserted, construction of a word suffix tree is n ontrivial, in particular when only O(nl) construction space is allowed. We solve this problem, presenting an algorithm with O(rt) expected running tim e, in general, construction cost is Omega (n) due to the need of scanning t he entire input. In applications that require strict node ordering, an addi tional cost of sorting O(m') characters arises, where m' is the number of d istinct words. In either case, this is a significant improvement over previ ously known solutions. Furthermore, when the alphabet is small, we may assume that the II characte rs in the input string occupy o(n) machine words. We illustrate that this c an allow a word suffix tree to be built in sublinear time.