ON THE OPTIMALITY OF THE SIMPLE BAYESIAN CLASSIFIER UNDER ZERO-ONE LOSS

Citation
P. Domingos and M. Pazzani, ON THE OPTIMALITY OF THE SIMPLE BAYESIAN CLASSIFIER UNDER ZERO-ONE LOSS, Machine Learning, 29(2-3), 1997, pp. 103-130
Citations number
37
Journal title
MACHINE LEARNING
ISSN journal
08856125
Volume
29
Issue
2-3
Year of publication
1997
Pages
103 - 130
Database
ISI
SICI code
0885-6125(1997)29:2-3<103:OTOOTS>2.0.ZU;2-X
Abstract
The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article's results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
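
The conjunction result is easy to check directly. The following Python sketch is not from the paper; it is a toy construction assuming a uniform distribution over three boolean attributes, with a conjunctive concept and hypothetical helper names (prior, likelihood, nb_predict). A simple Bayesian classifier built from maximum-likelihood frequency estimates labels every example of the conjunction correctly, even though the attributes are dependent given the class.

    from itertools import product

    # Toy concept (assumption for illustration): class = a1 AND a2 AND a3,
    # all 8 attribute combinations equally likely.
    examples = [(x, int(all(x))) for x in product([0, 1], repeat=3)]

    def prior(c):
        # P(c): fraction of examples with class c
        return sum(1 for _, y in examples if y == c) / len(examples)

    def likelihood(i, v, c):
        # P(a_i = v | c): maximum-likelihood estimate within class c
        in_class = [x for x, y in examples if y == c]
        return sum(1 for x in in_class if x[i] == v) / len(in_class)

    def nb_predict(x):
        # Simple Bayesian classifier: argmax_c P(c) * prod_i P(a_i = x_i | c)
        def score(c):
            s = prior(c)
            for i, v in enumerate(x):
                s *= likelihood(i, v, c)
            return s
        return max((0, 1), key=score)

    # Independence is clearly violated: P(a1=1, a2=1 | c=0) = 1/7, while
    # P(a1=1 | c=0) * P(a2=1 | c=0) = (3/7) * (3/7) = 9/49. Yet zero-one
    # loss over the whole domain is zero.
    assert all(nb_predict(x) == y for x, y in examples)

This illustrates the abstract's distinction between the two loss functions: the products computed in score() are poor estimates of the true class posteriors (so quadratic loss is not minimized), but under zero-one loss only the argmax has to agree with the true posterior's argmax, which here it does on every example.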