Distance browsing in spatial databases

Citation
Gr. Hjaltason et H. Samet, Distance browsing in spatial databases, ACM T DATAB, 24(2), 1999, pp. 265-318
Citations number
55
Categorie Soggetti
Computer Science & Engineering
Journal title
ACM TRANSACTIONS ON DATABASE SYSTEMS
ISSN journal
03625915 → ACNP
Volume
24
Issue
2
Year of publication
1999
Pages
265 - 318
Database
ISI
SICI code
0362-5915(199906)24:2<265:DBISD>2.0.ZU;2-E
Abstract
We compare two different techniques for browsing through a collection of sp atial objects stored in an R-tree spatial data structure on the basis of th eir distances from an arbitrary spatial query object. The conventional appr oach is one that makes use of a k-nearest neighbor algorithm where k is kno wn prior to the invocation of the algorithm. Thus if m. k neighbors are nee ded, the k-nearest neighbor algorithm has to be reinvoked for m neighbors, thereby possibly performing some redundant computations. The second approac h is incremental in the sense that having obtained the k nearest neighbors, the k + 1(st) neighbor can be obtained without having to calculate the k 1 nearest neighbors from scratch. The incremental approach is useful when processing complex queries where one of the conditions involves spatial pro ximity (e.g., the nearest city to Chicago with population greater than a mi llion), in which case a query engine can make use of a pipelined strategy. We present a general incremental nearest neighbor algorithm that is applica ble to a large class of hierarchical spatial data structures. This algorith m is adapted to the R-tree and its performance is compared to an existing k -nearest neighbor algorithm for R-trees [Roussopoulos et al. 1995]. Experim ents show that the incremental nearest neighbor algorithm significantly out performs the k-nearest neighbor algorithm for distance browsing queries in a spatial database that uses the R-tree as a spatial index. Moreover, the i ncremental nearest neighbor algorithm usually outperforms the k-nearest nei ghbor algorithm when applied to the k-nearest neighbor problem for the R-tr ee, although the improvement is not nearly as large as for distance browsin g queries. In fact, we prove informally that at any step in its execution t he incremental nearest neighbor algorithm is optimal with respect to the sp atial data structure that is employed. Furthermore, based on some simplifyi ng assumptions, we prove that in two dimensions the number of distance comp utations and leaf nodes accesses made by the algorithm for finding k neighb ors is O (k + k).