We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms.
By this we mean that if the functions that label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy.
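For concreteness, the kind of top-down procedure analyzed here can be sketched as follows. This is a minimal illustration rather than the paper's pseudocode: it restricts the internal-node tests to single binary features (the analysis allows any class of weak hypotheses), and the names grow_tree, frac_positive, and criterion are ours.

    def frac_positive(rows):
        # Fraction of positive labels among the (features, label) pairs at a leaf.
        return sum(y for _, y in rows) / len(rows)

    def grow_tree(rows, criterion, max_leaves):
        # Greedy top-down growth: repeatedly make the single (leaf, feature)
        # split that most reduces the weighted impurity of the leaf set, where
        # a leaf holding m of the n examples with positive-label frequency q
        # contributes (m/n) * criterion(q).
        n = len(rows)
        leaves = [rows]
        while len(leaves) < max_leaves:
            best = None  # (impurity drop, leaf index, feature index)
            for i, leaf in enumerate(leaves):
                before = len(leaf) * criterion(frac_positive(leaf))
                for f in range(len(leaf[0][0])):
                    left = [r for r in leaf if r[0][f] == 0]
                    right = [r for r in leaf if r[0][f] == 1]
                    if not left or not right:
                        continue  # this test does not split the leaf
                    after = (len(left) * criterion(frac_positive(left))
                             + len(right) * criterion(frac_positive(right)))
                    drop = (before - after) / n
                    if best is None or drop > best[0]:
                        best = (drop, i, f)
            if best is None or best[0] <= 0:
                break  # no remaining split makes progress
            _, i, f = best
            leaf = leaves.pop(i)
            leaves.append([r for r in leaf if r[0][f] == 0])
            leaves.append([r for r in leaf if r[0][f] == 1])
        return leaves

With rows a list of (bit-vector, label) pairs, a call such as grow_tree(data, lambda q: 4*q*(1-q), 8) grows a tree of at most eight leaves under CART's Gini criterion; the bounds below are statements about how large max_leaves must be to reach error epsilon.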
The bounds we obtain for this amplification show an interesting dependence on the splitting criterion used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error 1/2 - gamma as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size (1/epsilon)^(O(1/(gamma^2 epsilon^2))) and (1/epsilon)^(O(log(1/epsilon)/gamma^2)), respectively, suffice to drive the error below epsilon.
Thus (for example), a small constant advantage over random guessing is amplified to any larger constant advantage with trees of constant size.
For a new splitting criterion suggested by our analysis, we obtain the much stronger bound of (1/epsilon)^(O(1/gamma^2)), which is polynomial in 1/epsilon and provably optimal for decision tree algorithms.
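In this notation, each splitting criterion is a function G of a leaf's positive-label frequency q, normalized so that G(0) = G(1) = 0 and G(1/2) = 1. A sketch of the three criteria discussed, assuming the standard binary forms (the pairing of the Gini index with CART and the entropy criterion with C4.5 is as stated above):

    import math

    def gini(q):
        # CART's Gini index: trees of size (1/epsilon)^(O(1/(gamma^2 epsilon^2))).
        return 4.0 * q * (1.0 - q)

    def entropy(q):
        # C4.5's binary entropy criterion: trees of size
        # (1/epsilon)^(O(log(1/epsilon)/gamma^2)).
        if q in (0.0, 1.0):
            return 0.0
        return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

    def sqrt_criterion(q):
        # The new criterion suggested by the analysis: trees of size
        # (1/epsilon)^(O(1/gamma^2)).
        return 2.0 * math.sqrt(q * (1.0 - q))

Any of these can be passed as the criterion argument of the growth sketch above.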
The differing bounds have a natural explanation in terms of concavity properties of the splitting criterion.
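This can be seen numerically. Under one illustrative split model (a balanced split that moves a skewed leaf's label frequency q to q - gamma*q and q + gamma*q in the two children; this model is our assumption, not the paper's exact progress measure), the fraction of its potential a leaf loses in one split grows with the criterion's concavity near the endpoints:

    # Continues the criterion definitions above.
    q, gamma = 0.01, 0.1      # a skewed leaf, a small weak-learning advantage
    delta = gamma * q         # assumed shift in the children's label frequencies
    for name, G in [("gini", gini), ("entropy", entropy), ("sqrt", sqrt_criterion)]:
        drop = G(q) - 0.5 * (G(q - delta) + G(q + delta))
        print(f"{name:8s} relative drop per split: {drop / G(q):.1e}")

On this example the relative drop is roughly 1.0e-4 for the Gini index, 9.1e-4 for entropy, and 1.3e-3 for the square-root criterion: the more sharply concave the criterion is near q = 0 and q = 1, the larger the fraction of the remaining potential each split removes, which is what separates the three bounds.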
The primary contribution of this work is in proving that some popular and empirically successful heuristics that are based on first principles meet the criteria of an independently motivated theoretical model. (C) 1999 Academic Press.