A key concept in pattern recognition is that a pattern recognizer should be designed so as to minimize the errors it makes in classifying patterns. In this article, we review a recent, promising approach for minimizing the error rate of a classifier and describe a particular application to a simple, prototype-based speech recognizer. The key idea is to define a smooth, differentiable loss function that incorporates all adaptable classifier parameters and that approximates the actual performance error rate. Gradient descent can then be used to minimize this loss. This approach allows but does not require the use of explicitly probabilistic models. Furthermore, minimum error training does not involve the estimation of probability distributions that are difficult to obtain reliably. This new method has been applied to a variety of pattern recognition problems, with good results. Here we describe a particular application in which a relatively simple distance-based classifier is trained to minimize errors in speech recognition tasks. The loss function is defined so as to reflect errors at the level of the final, grammar-driven recognition output. Thus, minimization of this loss directly optimizes the overall system performance.
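To make the idea concrete, the smooth loss and its gradient can be sketched for a single training sample and a distance-based (one-prototype-per-class) classifier. The abstract does not give the exact functional form; the instantiation below is a common one from the minimum classification error literature (a sigmoid of a misclassification measure built from squared Euclidean distances, with a log-sum-exp softening of the competing classes), and the function name and parameters (`mce_loss_and_grad`, `alpha`) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def mce_loss_and_grad(x, y, protos, alpha=1.0):
    """Smooth minimum-classification-error loss for one sample.

    Assumed instantiation (not from the article itself):
      discriminant        g_k = -||x - p_k||^2
      misclassification   d   = -g_y + log mean_k!=y exp(g_k)
      smooth loss         L   = sigmoid(alpha * d)
    L approaches the 0/1 error indicator as alpha grows, yet stays
    differentiable, so gradient descent applies to the prototypes.
    """
    g = -np.sum((protos - x) ** 2, axis=1)        # per-class discriminants
    others = np.delete(g, y)                      # competitors' discriminants
    d = -g[y] + np.log(np.mean(np.exp(others)))   # smoothed misclassification measure
    loss = 1.0 / (1.0 + np.exp(-alpha * d))       # sigmoid: smooth error count

    # Chain rule: dL/dd, then dd/dg_k, then dg_k/dp_k = 2(x - p_k).
    s = loss * (1.0 - loss) * alpha               # dL/dd
    w = np.exp(others) / np.sum(np.exp(others))   # softmax weights over competitors
    grad = np.zeros_like(protos)
    grad[y] = -s * 2.0 * (x - protos[y])          # dd/dg_y = -1
    competitors = [k for k in range(len(protos)) if k != y]
    for j, k in enumerate(competitors):
        grad[k] = s * w[j] * 2.0 * (x - protos[k])
    return loss, grad

# One gradient-descent step on the prototypes:
#   protos -= learning_rate * grad
```

Because the loss is a sigmoid of the misclassification measure, samples far on either side of the decision boundary contribute almost no gradient, so training effort concentrates on the error-prone region near the boundary.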