A. Farago and G. Lugosi, "Strong Universal Consistency of Neural Network Classifiers," IEEE Transactions on Information Theory, 39(4), 1993, pp. 1146-1151
In statistical pattern recognition, a classifier is called universally consistent if its error probability converges to the Bayes risk as the size of the training data grows, for all possible distributions of the random variable pair of the observation vector and its class. It is proven that if a one-layered neural network with a properly chosen number of nodes is trained to minimize the empirical risk on the training data, the result is a universally consistent classifier. It is shown that the exponent in the rate of convergence does not depend on the dimension if certain smoothness conditions on the distribution are satisfied; that is, this class of universally consistent classifiers does not suffer from the "curse of dimensionality." A training algorithm is also presented that finds the optimal set of parameters in polynomial time when the number of nodes and the space dimension are fixed and the amount of training data grows.
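The following is a minimal illustrative sketch, not the authors' polynomial-time algorithm: it builds a one-hidden-layer threshold-node classifier and selects its parameters by direct minimization of the empirical 0-1 risk, here approximated by a crude random search. All names (`net_predict`, `erm_random_search`, the node count `k`, the trial budget) are hypothetical choices made for the example.

```python
import numpy as np

def net_predict(X, W, b, c):
    """Class decision of a one-hidden-layer threshold network:
    k hidden threshold units, output thresholded weighted sum."""
    H = (X @ W.T + b >= 0).astype(float)   # hidden threshold activations
    return (H @ c >= 0).astype(int)        # predicted labels in {0, 1}

def empirical_risk(X, y, W, b, c):
    """Fraction of training points misclassified (empirical 0-1 risk)."""
    return np.mean(net_predict(X, W, b, c) != y)

def erm_random_search(X, y, k=4, n_trials=5000, seed=None):
    """Crude stand-in for empirical risk minimization: keep the parameter
    set with the smallest empirical risk among random candidates."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best, best_risk = None, np.inf
    for _ in range(n_trials):
        W = rng.normal(size=(k, d))
        b = rng.normal(size=k)
        c = rng.normal(size=k)
        r = empirical_risk(X, y, W, b, c)
        if r < best_risk:
            best_risk, best = r, (W, b, c)
    return best, best_risk

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)   # a simple nonlinear target rule
    (W, b, c), risk = erm_random_search(X, y, k=4, n_trials=5000, seed=0)
    print(f"empirical risk of selected network: {risk:.3f}")
```

The consistency result in the paper concerns the exact empirical risk minimizer over networks of a suitably growing number of nodes; random search is used here only to keep the sketch short and self-contained.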