METRICS AND MODELS FOR HANDWRITTEN CHARACTER-RECOGNITION

Authors

HASTIE T; SIMARD PY

Citation

T. Hastie and P.Y. Simard, METRICS AND MODELS FOR HANDWRITTEN CHARACTER-RECOGNITION, Statistical Science, 13(1), 1998, pp. 54-65

Citations number

Subject categories

Statistics & Probability

Journal title

Statistical science

ISSN journal

0883-4237

Volume

13

Issue

1

Year of publication

1998

Pages

54 - 65

Database

ISI

SICI code

0883-4237(1998)13:1<54:MAMFHC>2.0.ZU;2-2

Abstract

A digitized handwritten numeral can be represented as a binary or grey scale image. An important pattern recognition task that has received much attention lately is to automatically determine the digit, given the image. While many different techniques have been pushed very hard to solve this task, the most successful and intuitively appropriate is due to Simard, Le Cun and Denker (1993). Their approach combined nearest-neighbor classification with a subject-specific invariant metric that allows for small rotations, translations and other natural transformations. We report on Simard's classifier and compare it to other approaches. One important negative aspect of near-neighbor classification is that all the work gets done at lookup time, and with around 10,000 training images in high dimensions this can be exorbitant. In this paper we develop rich models for representing large subsets of the prototypes. One example is a low-dimensional hyperplane defined by a point and a set of basis or tangent vectors. The components of these models are learned from the training set, chosen to minimize the average tangent distance from a subset of the training images; as such, they are similar in flavor to the singular value decomposition (SVD), which finds closest hyperplanes in Euclidean distance. These models are either used singly per class or used as basic building blocks in conjunction with the K-means clustering algorithm.
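
Illustrative note: the abstract compares the proposed models to the SVD, which finds the closest hyperplane in Euclidean distance. The sketch below shows only that Euclidean analogue, not the tangent-distance models of the paper: each class is summarized by a point (the class mean) plus a few basis vectors, and a new image is assigned to the class whose hyperplane is nearest. The function names, the toy data, and the choice of one basis vector per class are assumptions made for illustration.

```python
import numpy as np

def fit_affine_subspace(X, k):
    """Fit a k-dimensional hyperplane (point + basis) to the rows of X.

    The SVD of the centered data minimizes the summed squared Euclidean
    distance from the rows of X to the hyperplane.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def subspace_distance(x, mean, basis):
    """Euclidean distance from x to the hyperplane through mean spanned by basis."""
    r = x - mean
    # Subtract the part of r that lies inside the hyperplane.
    residual = r - basis.T @ (basis @ r)
    return np.linalg.norm(residual)

def classify(x, models):
    """Return the label of the class whose hyperplane is closest to x."""
    return min(models, key=lambda label: subspace_distance(x, *models[label]))

# Hypothetical toy data: two classes in 5 dimensions, each varying along one axis.
rng = np.random.default_rng(0)
t = rng.uniform(-5, 5, size=(50, 1))
noise = lambda: 0.01 * rng.standard_normal((50, 5))
Xa = t * np.array([1.0, 0, 0, 0, 0]) + noise()
Xb = np.array([0, 0, 10.0, 0, 0]) + t * np.array([0, 1.0, 0, 0, 0]) + noise()

# One model per class, each with a single basis vector (k=1).
models = {"a": fit_affine_subspace(Xa, k=1), "b": fit_affine_subspace(Xb, k=1)}

label_a = classify(np.array([3.0, 0, 0, 0, 0]), models)
label_b = classify(np.array([0, 4.0, 10.0, 0, 0]), models)
```

Classification with such models costs one projection per class rather than one comparison per training image, which is the lookup-time saving the abstract motivates.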