METRICS AND MODELS FOR HANDWRITTEN CHARACTER-RECOGNITION

Citation
T. Hastie and P. Y. Simard, METRICS AND MODELS FOR HANDWRITTEN CHARACTER-RECOGNITION, Statistical Science, 13(1), 1998, pp. 54-65
Number of citations
16
Subject categories
Statistics & Probability
Journal title
Statistical Science
ISSN journal
08834237
Volume
13
Issue
1
Year of publication
1998
Pages
54 - 65
Database
ISI
SICI code
0883-4237(1998)13:1<54:MAMFHC>2.0.ZU;2-2
Abstract
A digitized handwritten numeral can be represented as a binary or grey scale image. An important pattern recognition task that has received much attention lately is to automatically determine the digit, given the image. While many different techniques have been pushed very hard to solve this task, the most successful and intuitively appropriate is due to Simard, Le Cun and Denker (1993). Their approach combined nearest-neighbor classification with a subject-specific invariant metric that allows for small rotations, translations and other natural transformations. We report on Simard's classifier and compare it to other approaches. One important negative aspect of near-neighbor classification is that all the work gets done at lookup time, and with around 10,000 training images in high dimensions this can be exorbitant. In this paper we develop rich models for representing large subsets of the prototypes. One example is a low-dimensional hyperplane defined by a point and a set of basis or tangent vectors. The components of these models are learned from the training set, chosen to minimize the average tangent distance from a subset of the training images; as such they are similar in flavor to the singular value decomposition (SVD), which finds closest hyperplanes in Euclidean distance. These models are either used singly per class or used as basic building blocks in conjunction with the K-means clustering algorithm.
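The abstract combines two ideas: nearest-neighbor lookup under a tangent distance, and a hyperplane model described by a point plus a set of basis vectors. Both can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not Simard's classifier or the authors' fitted models. Only horizontal and vertical translation tangents are used, estimated with `np.gradient`, and rotations and the other transformations are left out. The distance is the two-sided, squared variant, solved by least squares. `fit_affine_subspace` is the Euclidean SVD analogue the abstract mentions, not the tangent-distance-trained models. All function names, including the toy image generator `gaussian_blob`, are hypothetical.

```python
import numpy as np


def gaussian_blob(size, cx, sx, sy):
    """Hypothetical test image: an axis-aligned Gaussian centred at column offset cx."""
    coords = np.arange(size) - (size - 1) / 2.0
    cols, rows = np.meshgrid(coords, coords)  # cols varies along axis 1
    return np.exp(-((cols - cx) ** 2 / (2 * sx**2) + rows**2 / (2 * sy**2)))


def translation_tangents(img):
    """Tangent vectors for horizontal and vertical translation of img.

    Assumption: plain finite-difference gradients (np.gradient) stand in for
    the smoothed derivatives used in practice; rotations and the other
    transformations are omitted to keep the sketch short.
    """
    d_rows, d_cols = np.gradient(img.astype(float))
    # Each column of the returned matrix is one flattened tangent vector.
    return np.stack([d_cols.ravel(), d_rows.ravel()], axis=1)


def tangent_distance(x, y):
    """Squared two-sided tangent distance between images x and y.

    Minimises ||(x + Tx a) - (y + Ty b)||^2 over a and b, which equals the
    squared residual of x - y after projecting onto span([-Tx, Ty]).
    """
    diff = (x - y).astype(float).ravel()
    tangents = np.hstack([-translation_tangents(x), translation_tangents(y)])
    coef, *_ = np.linalg.lstsq(tangents, diff, rcond=None)
    residual = diff - tangents @ coef
    return float(residual @ residual)


def nearest_neighbor(query, prototypes, labels, metric=tangent_distance):
    """1-NN classification: every prototype is compared at lookup time."""
    dists = [metric(query, p) for p in prototypes]
    return labels[int(np.argmin(dists))]


def fit_affine_subspace(points, k):
    """Closest k-dim affine subspace in Euclidean distance, via the SVD.

    Returns a point (the mean) and an orthonormal basis of shape (dim, k).
    This is the Euclidean analogue of the paper's models, not those models.
    """
    X = np.asarray(points, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T


def distance_to_subspace(x, mean, basis):
    """Squared Euclidean distance from x to mean + span(basis)."""
    r = np.asarray(x, dtype=float) - mean
    r = r - basis @ (basis.T @ r)
    return float(r @ r)


if __name__ == "__main__":
    horizontal = gaussian_blob(16, 0.0, 3.0, 1.0)
    vertical = gaussian_blob(16, 0.0, 1.0, 3.0)
    shifted = gaussian_blob(16, 0.5, 3.0, 1.0)  # horizontal blob moved half a pixel
    print(nearest_neighbor(shifted, [horizontal, vertical], ["h", "v"]))  # h

    # Three points on the line (1, 2, 3) + t * (1, 0, 0).
    pts = [[1 + t, 2, 3] for t in (-1.0, 0.0, 2.0)]
    mean, basis = fit_affine_subspace(pts, k=1)
    print(round(distance_to_subspace([5, 2, 3], mean, basis), 6))  # 0.0
    print(round(distance_to_subspace([1, 4, 3], mean, basis), 6))  # 4.0
```

The coefficients are free, and setting them all to zero recovers the squared Euclidean distance. The two-sided tangent distance therefore never exceeds the squared Euclidean distance. The cost is a small least-squares solve per prototype. With around 10,000 training images, that per-lookup work is what motivates replacing large subsets of prototypes with a few fitted subspaces.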