ON THE FINITE-SAMPLE PERFORMANCE OF THE NEAREST-NEIGHBOR CLASSIFIER

Citation
D. Psaltis et al., ON THE FINITE-SAMPLE PERFORMANCE OF THE NEAREST-NEIGHBOR CLASSIFIER, IEEE transactions on information theory, 40(3), 1994, pp. 820-837
Citations number
14
Categorie Soggetti
Information Science & Library Science","Engineering, Eletrical & Electronic
ISSN journal
00189448
Volume
40
Issue
3
Year of publication
1994
Pages
820 - 837
Database
ISI
SICI code
0018-9448(1994)40:3<820:OTFPOT>2.0.ZU;2-6
Abstract
The finite sample performance of a nearest neighbor classifier is anal yzed for a two-class pattern recognition problem. An exact integral ex pression is derived for the m-sample risk R(m) given that a reference m-sample of labeled points is available to the classifier. The statist ical setup assumes that the pattern classes arise in nature with fixed a priori probabilities and that points representing the classes are d rawn from Euclidean n-space according to fixed class-conditional proba bility distributions. The sample is assumed to consist of m independen tly generated class-labeled points. For a family of smooth class-condi tional distributions characterized by asymptotic expansions in general form, it is shown that the m-sample risk R(m) has a complete asymptot ic series expansion [GRAPHICS] where R(infinity) denotes the nearest n eighbor risk in the infinite-sample limit and the coefficients c(k) ar e distribution-dependent constants independent of the sample size m. T his analysis thus provides further analytic validation of Bellman's cu rse of dimensionality. Numerical simulations corroborating the formal results are included, and extensions of the theory discussed. The anal ysis also contains a novel application of Laplace's asymptotic method of integration to a multidimensional integral where the integrand atta ins its maximum on a continuum of points.