Modeling high-dimensional index structures using sampling

Citation
Ca. Lang et Ak. Singh, Modeling high-dimensional index structures using sampling, SIG RECORD, 30(2), 2001, pp. 389-400
Citations number
35
Categorie Soggetti
Computer Science & Engineering
Journal title
SIGMOD RECORD
ISSN journal
01635808 → ACNP
Volume
30
Issue
2
Year of publication
2001
Pages
389 - 400
Database
ISI
SICI code
0163-5808(200106)30:2<389:MHISUS>2.0.ZU;2-R
Abstract
A large number of index structures for high-dimensional data have been prop osed previously. In order to tune and compare such index structures, it is vital to have efficient cost prediction techniques for these structures. Pr evious techniques either assume uniformity of the data or are not applicabl e to high-dimensional data. We propose the use of sampling to predict the n umber of accessed index pages during a query execution. Sampling is indepen dent of the dimensionality and preserves clusters which is important for re presenting skewed data. We present a general model for estimating the index page layout using sampling and show how to compensate for errors. We then give an implemention of our model under restricted memory assumptions and s how that it performs well even under these constraints. Errors are minimal and the overall prediction time is up to two orders of magnitude below the time for building and probing the full index without sampling.