ITA
ENG

Modeling high-dimensional index structures using sampling

Authors

Lang, CA Singh, AK

Citation

Ca. Lang et Ak. Singh, Modeling high-dimensional index structures using sampling, SIG RECORD, 30(2), 2001, pp. 389-400

Citations number

Categorie Soggetti

Computer Science & Engineering

Journal title

SIGMOD RECORD

ISSN journal

01635808 → ACNP

Volume

Issue

Year of publication

2001

Pages

389 - 400

Database

ISI

SICI code

0163-5808(200106)30:2<389:MHISUS>2.0.ZU;2-R

Abstract

A large number of index structures for high-dimensional data have been prop osed previously. In order to tune and compare such index structures, it is vital to have efficient cost prediction techniques for these structures. Pr evious techniques either assume uniformity of the data or are not applicabl e to high-dimensional data. We propose the use of sampling to predict the n umber of accessed index pages during a query execution. Sampling is indepen dent of the dimensionality and preserves clusters which is important for re presenting skewed data. We present a general model for estimating the index page layout using sampling and show how to compensate for errors. We then give an implemention of our model under restricted memory assumptions and s how that it performs well even under these constraints. Errors are minimal and the overall prediction time is up to two orders of magnitude below the time for building and probing the full index without sampling.