As databases have expanded in scope from storing purely business data to in
clude XML documents, product catalogs, e-mail messages, and directory data,
it has become increasingly important to search databases based on wild-car
d string matching: prefix matching, for example, is more common land useful
) than exact matching, for such data. In many cases, matches need to be on
multiple attributes/dimensions, with correlations between the dimensions. T
raditional multi-dimensional index structures, designed with (fixed length)
numeric data in mind, are not suitable for matching unbounded length strin
g data.
In this paper, we describe a general technique for adapting a multi-dimensi
onal index structure for wild-card indexing of unbounded length string data
. The key ideas are (a) a carefully developed mapping function from strings
to rational numbers, (b) representing an unbounded length string in an ind
ex leaf page by a fixed length offset-to an external key, and (c) storing m
ultiple elided tries, one per dimension, in an index page to prune search d
uring traversal of index pages. These basic ideas affect all index algorith
ms. In this paper, we present efficient algorithms for different types of s
tring matching.
While our technique is applicable to a wide range of multidimensional index
structures, we instantiate our generic techniques by adapting the a-dimens
ional R-tree to string data. We demonstrate the space effectiveness and tim
e benefits of using the string R-tree both analytically and experimentally.