H. Ney, On the Probabilistic Interpretation of Neural Network Classifiers and Discriminative Training Criteria, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(2), 1995, pp. 107-119
A probabilistic interpretation is presented for two important issues in neural network based classification, namely the interpretation of discriminative training criteria and of the neural network outputs, as well as the interpretation of the structure of the neural network. The problem of finding a suitable structure for the neural network can be linked to a number of well-established techniques in statistical pattern recognition, such as the method of potential functions, kernel densities, and continuous mixture densities. Discriminative training of neural network outputs amounts to approximating the class posterior probabilities of the classical statistical approach. This paper extends these links by introducing and analyzing novel criteria such as maximizing the class probability and minimizing the smoothed error rate. These criteria are defined in the framework of class-conditional probability density functions. We show that these criteria can be interpreted in terms of weighted maximum likelihood estimation, where the weights depend in a complicated nonlinear fashion on the model parameters to be trained. In particular, this approach covers widely used techniques such as corrective training, learning vector quantization, and linear discriminant analysis.
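To make the two criteria concrete, the following is a minimal sketch in standard notation, assuming training pairs (x_n, c_n), class-conditional densities p(x|c; theta), and class priors p(c); the exact notation and any smoothing exponents are assumptions for illustration, not taken verbatim from the paper.

% Sketch (assumed notation): maximizing the class posterior probability,
% and a smoothed error count built from the same posterior.
\[
F_{\mathrm{post}}(\theta) \;=\; \sum_{n=1}^{N} \log
\frac{p(c_n)\, p(x_n \mid c_n; \theta)}{\sum_{c} p(c)\, p(x_n \mid c; \theta)},
\qquad
E_{\mathrm{smooth}}(\theta) \;=\; \sum_{n=1}^{N}
\left( 1 \;-\; \frac{p(c_n)\, p(x_n \mid c_n; \theta)}{\sum_{c} p(c)\, p(x_n \mid c; \theta)} \right).
\]

Differentiating either expression with respect to theta yields, per training sample, the ordinary maximum likelihood gradient of log p(x_n | c; theta) scaled by a weight built from the current posteriors; since those posteriors themselves depend on theta, the weights vary nonlinearly during training, which is the weighted maximum likelihood interpretation the abstract refers to.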