ACCOUNTING FOR BOUNDARY EFFECTS IN NEAREST-NEIGHBOR SEARCHING

Citation
S. Arya et al., ACCOUNTING FOR BOUNDARY EFFECTS IN NEAREST-NEIGHBOR SEARCHING, Discrete & computational geometry, 16(2), 1996, pp. 155-176
Citations number
15
Categorie Soggetti
Computer Sciences, Special Topics","Mathematics, General","Computer Science Theory & Methods",Mathematics
ISSN journal
01795376
Volume
16
Issue
2
Year of publication
1996
Pages
155 - 176
Database
ISI
SICI code
0179-5376(1996)16:2<155:AFBEIN>2.0.ZU;2-Y
Abstract
Given n data points in d-dimensional space, nearest-neighbor searching involves determining the nearest of these data points to a given quer y point. Most average-case analyses of nearest-neighbor searching algo rithms are made under the simplifying assumption that d is fixed and t hat n is so large relative to d that boundary effects can be ignored. This means that for any query point the statistical distribution of th e data points surrounding it is independent of the location of the que ry point. However, in many applications of nearest-neighbor searching (such as data compression by vector quantization) this assumption is n ot met, since the number of data points n grows roughly as 2(d). Large ly for this reason, the actual performances of many nearest-neighbor a lgorithms tend to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an a ccurate analysis of the number of cells visited in nearest-neighbor se arching by the bucketing and k-d tree algorithms. We assume m(d) point s uniformly distributed in dimension d, where m is a fixed integer gre ater than or equal to 2. Further, we assume that distances are measure d in the L(infinity) metric. Our analysis is tight in the limit as d a pproaches infinity. Empirical evidence is presented showing that the a nalysis applies even in low dimensions.