On effective multi-dimensional indexing for strings

Citation
Hv. Jagadish et al., On effective multi-dimensional indexing for strings, SIG RECORD, 29(2), 2000, pp. 403-414
Citations number
22
Categorie Soggetti
Computer Science & Engineering
Journal title
SIGMOD RECORD
ISSN journal
01635808 → ACNP
Volume
29
Issue
2
Year of publication
2000
Pages
403 - 414
Database
ISI
SICI code
0163-5808(200006)29:2<403:OEMIFS>2.0.ZU;2-R
Abstract
As databases have expanded in scope from storing purely business data to in clude XML documents, product catalogs, e-mail messages, and directory data, it has become increasingly important to search databases based on wild-car d string matching: prefix matching, for example, is more common land useful ) than exact matching, for such data. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the dimensions. T raditional multi-dimensional index structures, designed with (fixed length) numeric data in mind, are not suitable for matching unbounded length strin g data. In this paper, we describe a general technique for adapting a multi-dimensi onal index structure for wild-card indexing of unbounded length string data . The key ideas are (a) a carefully developed mapping function from strings to rational numbers, (b) representing an unbounded length string in an ind ex leaf page by a fixed length offset-to an external key, and (c) storing m ultiple elided tries, one per dimension, in an index page to prune search d uring traversal of index pages. These basic ideas affect all index algorith ms. In this paper, we present efficient algorithms for different types of s tring matching. While our technique is applicable to a wide range of multidimensional index structures, we instantiate our generic techniques by adapting the a-dimens ional R-tree to string data. We demonstrate the space effectiveness and tim e benefits of using the string R-tree both analytically and experimentally.