ITA
ENG

Improved generalization through explicit optimization of margins

Authors

Mason, L Bartlett, PL Baxter, J

Citation

L. Mason et al., Improved generalization through explicit optimization of margins, MACH LEARN, 38(3), 2000, pp. 243-255

Citations number

Categorie Soggetti

AI Robotics and Automatic Control

Journal title

MACHINE LEARNING

ISSN journal

08856125 → ACNP

Volume

Issue

Year of publication

2000

Pages

243 - 255

Database

ISI

SICI code

0885-6125(200003)38:3<243:IGTEOO>2.0.ZU;2-3

Abstract

Recent theoretical results have shown that the generalization performance o f thresholded convex combinations of base classifiers is greatly improved i f the underlying convex combination has large margins on the training data (i.e., correct examples are classified well away from the decision boundary ). Neural network algorithms and AdaBoost have been shown to implicitly max imize margins, thus providing some theoretical justification for their rema rkably good generalization performance. In this paper we are concerned with maximizing the margin explicitly. In particular, we prove a theorem boundi ng the generalization performance of convex combinations in terms of genera l cost functions of the margin, in contrast to previous results, which were stated in terms of the particular cost function sgn(theta - margin). We th en present a new algorithm, DOOM, for directly optimizing a piecewise-linea r family of cost functions satisfying the conditions of the theorem. Experi ments on several of the datasets in the UC Irvine database are presented in which AdaBoost was used to generate a set of base classifiers and then DOO M was used to find the optimal convex combination of those classifiers. In all but one case the convex combination generated by DOOM had lower test er ror than AdaBoost's combination. In many cases DOOM achieves these lower te st errors by sacrificing training error, in the interests of reducing the n ew cost function. In our experiments the margin plots suggest that the size of the minimum margin is not the critical factor in determining generaliza tion performance.