We examine linear programming (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column-generation-based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task then becomes constructing a learning function in this label space that minimizes misclassification error and maximizes the soft margin. We prove that, for classification, minimizing the 1-norm soft margin error function directly optimizes a generalization error bound. The equivalent linear program can be solved efficiently using column generation techniques developed for large-scale optimization problems.
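To make the formulation concrete (in our notation; the text above does not fix the symbols): with training examples $(x_i, y_i)$, $i = 1, \dots, \ell$, weak hypotheses $h_j$, hypothesis weights $a_j$, margin $\rho$, slacks $\xi_i$, and misclassification penalty $C$, the 1-norm soft margin LP and its dual can be written as

$$
\begin{aligned}
\max_{\rho,\, a,\, \xi} \quad & \rho - C \sum_{i=1}^{\ell} \xi_i \\
\text{s.t.} \quad & y_i \sum_{j} a_j h_j(x_i) \ge \rho - \xi_i, \quad i = 1, \dots, \ell, \\
& \sum_{j} a_j = 1, \quad a_j \ge 0, \quad \xi_i \ge 0,
\end{aligned}
\qquad
\begin{aligned}
\min_{\beta,\, u} \quad & \beta \\
\text{s.t.} \quad & \sum_{i=1}^{\ell} u_i\, y_i\, h_j(x_i) \le \beta \quad \text{for all } j, \\
& \sum_{i=1}^{\ell} u_i = 1, \quad 0 \le u_i \le C.
\end{aligned}
$$

The dual variables $u_i$ are the misclassification costs referred to below: given the current costs $u$, the weak learner is asked for a hypothesis $h$ maximizing the edge $\sum_i u_i y_i h(x_i)$; whenever this edge exceeds $\beta$, the hypothesis prices out and enters the restricted LP as a new column, and otherwise the current solution is optimal.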
The resulting LPBoost algorithm can be used to solve any LP boosting formulation: it iteratively optimizes the dual misclassification costs in a restricted LP and dynamically generates weak hypotheses to form new LP columns. We provide algorithms for soft margin classification, confidence-rated boosting, and regression boosting problems. Unlike gradient boosting algorithms, which may converge only in the limit, LPBoost converges in a finite number of iterations to a global solution satisfying mathematically well-defined optimality conditions. In contrast with gradient-based methods, the optimal solutions of LPBoost are very sparse. Computationally, LPBoost is competitive with AdaBoost in both quality and computational cost.
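As a sketch of how the column generation loop can be realized (our illustration, not the paper's implementation: the decision-stump weak learner, the parameter defaults, and the use of scipy.optimize.linprog as the LP solver are all assumptions), the restricted dual LP and the pricing step might look as follows in Python:

```python
import numpy as np
from scipy.optimize import linprog

def best_stump(X, y, u):
    """Pricing step: search decision stumps h(x) = s * sign(x[f] - t)
    (an assumed weak learner) for one maximizing the edge
    sum_i u_i * y_i * h(x_i)."""
    best_edge, best_params = -np.inf, None
    for f in range(X.shape[1]):
        xs = np.sort(X[:, f])
        for t in (xs[:-1] + xs[1:]) / 2.0:        # candidate thresholds
            h = np.where(X[:, f] > t, 1.0, -1.0)
            edge = np.dot(u * y, h)
            for s in (1.0, -1.0):                 # both polarities
                if s * edge > best_edge:
                    best_edge, best_params = s * edge, (f, t, s)
    return best_edge, best_params

def lpboost(X, y, C=0.2, max_iters=100, tol=1e-6):
    ell = len(y)
    u = np.full(ell, 1.0 / ell)                   # initial misclassification costs
    beta = 0.0
    params, columns, res = [], [], None
    for _ in range(max_iters):
        edge, stump = best_stump(X, y, u)
        if edge <= beta + tol:                    # no hypothesis prices out: optimal
            break
        f, t, s = stump
        params.append(stump)
        columns.append(y * s * np.where(X[:, f] > t, 1.0, -1.0))
        # Restricted dual LP over (u, beta):
        #   min beta  s.t.  sum_i u_i y_i h_j(x_i) <= beta  for all j so far,
        #                   sum_i u_i = 1,  0 <= u_i <= C.
        n = len(columns)
        c = np.r_[np.zeros(ell), 1.0]             # objective: minimize beta
        A_ub = np.c_[np.array(columns), -np.ones(n)]
        A_eq = np.r_[np.ones(ell), 0.0].reshape(1, -1)
        bounds = [(0.0, C)] * ell + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                      bounds=bounds, method="highs")
        u, beta = res.x[:ell], res.x[ell]
    # The primal hypothesis weights a_j are the (negated) duals of the
    # inequality constraints; with the HiGHS backend these are the marginals.
    a = -res.ineqlin.marginals if res is not None else np.array([])
    return params, a
```

In a typical run most entries of the returned weight vector a come back exactly zero, which illustrates the sparsity property noted above; the final classifier is sign(sum_j a_j h_j(x)) over the retained stumps.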