The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article's results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
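For reference, the simple Bayesian classifier discussed above is assumed here to take its standard textbook form (the notation below is illustrative and not necessarily the article's own): given attribute values $a_1, \dots, a_n$, it predicts

\[
  c^{*} \;=\; \arg\max_{c}\; P(c)\,\prod_{i=1}^{n} P(a_i \mid c).
\]

Zero-one-loss optimality requires only that this argmax coincide with the argmax of the true class posterior $P(c \mid a_1, \dots, a_n)$, whereas quadratic-loss optimality of the probability estimates requires the estimated posteriors themselves to be correct, which holds only when the attributes are in fact independent given the class.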