THE SAMPLE COMPLEXITY OF PATTERN CLASSIFICATION WITH NEURAL NETWORKS - THE SIZE OF THE WEIGHTS IS MORE IMPORTANT THAN THE SIZE OF THE NETWORK

Authors
P.L. Bartlett
Citation
P.L. Bartlett, THE SAMPLE COMPLEXITY OF PATTERN CLASSIFICATION WITH NEURAL NETWORKS - THE SIZE OF THE WEIGHTS IS MORE IMPORTANT THAN THE SIZE OF THE NETWORK, IEEE Transactions on Information Theory, 44(2), 1998, pp. 525-536
Citations number
42
Subject Categories
Computer Science Information Systems","Engineering, Eletrical & Electronic","Computer Science Information Systems
ISSN journal
0018-9448
Volume
44
Issue
2
Year of publication
1998
Pages
525 - 536
Database
ISI
SICI code
0018-9448(1998)44:2<525:TSCOPW>2.0.ZU;2-K
Abstract
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A^3 sqrt((log n)/m) (ignoring log A and log m factors), where m is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.
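In symbols, a schematic form of the bound described in the abstract is sketched below; this follows only the quantities named above, while the exact constants, the precise squared-error-based training estimate, and the suppressed log A and log m factors are spelled out in the paper itself.

    % Schematic rendering of the abstract's two-layer bound (a sketch, not
    % the paper's exact statement); \hat{\epsilon}_m denotes the training-set
    % error estimate related to squared error on the m training patterns.
    \Pr\bigl[\text{misclassification}\bigr]
        \;\le\; \hat{\epsilon}_m
        \;+\; \tilde{O}\!\left( A^{3} \sqrt{\frac{\log n}{m}} \right)
    % Here A bounds the sum of the weight magnitudes at each unit, n is the
    % input dimension, m is the number of training patterns, and \tilde{O}
    % hides the log A and log m factors mentioned in the abstract.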