A large number of index structures for high-dimensional data have been prop
osed previously. In order to tune and compare such index structures, it is
vital to have efficient cost prediction techniques for these structures. Pr
evious techniques either assume uniformity of the data or are not applicabl
e to high-dimensional data. We propose the use of sampling to predict the n
umber of accessed index pages during a query execution. Sampling is indepen
dent of the dimensionality and preserves clusters which is important for re
presenting skewed data. We present a general model for estimating the index
page layout using sampling and show how to compensate for errors. We then
give an implemention of our model under restricted memory assumptions and s
how that it performs well even under these constraints. Errors are minimal
and the overall prediction time is up to two orders of magnitude below the
time for building and probing the full index without sampling.