G. Proietti et C. Faloutsos, Analysis of range queries and self-spatial join queries on real region datasets stored using an R-tree, IEEE KNOWL, 12(5), 2000, pp. 751-762
Citations number
20
Categorie Soggetti
AI Robotics and Automatic Control
Journal title
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
In this paper, we study the node distribution of an R-tree storing region d
ata, like, for instance, islands, lakes, or human-inhabited areas. We will
show that real region datasets are packed in an R-tree into minimum boundin
g rectangles (MBRs) whose area distribution follows the same power law, nam
ed REGAL (REGion Area Law), as that for the regions themselves. Moreover, t
hese MBRs are packed in their turn into MBRs following the same law, and so
on iteratively, up to the root of the R-tree. Based on this observation. w
e are able to accurately estimate the search effort for range queries, usin
g a small number of easy-to-retrieve parameters. Furthermore, since our ana
lysis exploits, through a realistic mathematical model, the proximity relat
ions existing among the regions in the dataset, we show how to use our mode
l to predict the selectivity of a self-spatial join query posed on the data
set. Experiments on a variety of real datasets (islands, lakes, human-inhab
ited areas) show that our estimations are accurate, enjoying a geometric av
erage relative error ranging from 22 percent to 32 percent for the search e
ffort of a range query, and from 14 percent to 34 percent for the selectivi
ty of a self-spatial join query. This is significantly better than using a
naive model based on uniformity assumption, which gives rise to a geometric
average relative error up to 270 percent and up to 85 percent for the two
problems, respectively.