Finding approximate answers to multi-dimensional range queries over real va
lued attributes has significant applications in data exploration and databa
se query optimization. In this paper we consider the following problem: giv
en a table of d attributes whose domain is the real numbers, and a query th
at specifies a range in each dimension, find a good approximation of the nu
mber of records in the table that satisfy the query.
We present a new histogram technique that is designed to approximate the de
nsity of multi-dimensional datasets with real attributes. Our technique fin
ds buckets of variable size, and allows the buckets to overlap. Overlapping
buckets allow more efficient approximation of the density. The size of the
cells is based on the local density of the data. This technique leads to a
faster and more compact approximation of the data distribution. We also sh
ow how to generalize kernel density estimators, and how to apply them on th
e multi-dimensional query approximation problem.
Finally, we compare the accuracy of the proposed techniques with existing t
echniques using real and synthetic datasets.