The sequential minimal optimization algorithm (SMO) has been shown to be an effective method for training support vector machines (SVMs) on classification tasks defined on sparse data sets. SMO differs from most SVM algorithms in that it does not require a quadratic programming solver. In this work, we generalize SMO so that it can handle regression problems. However, one problem with SMO is that its rate of convergence slows down dramatically when data is non-sparse and when there are many support vectors in the solution (as is often the case in regression), because kernel function evaluations tend to dominate the runtime in this case. Moreover, caching kernel function outputs can easily degrade SMO's performance even further because SMO tends to access kernel function outputs in an unstructured manner. We address these problems with several modifications that enable caching to be effectively used with SMO. For regression problems, our modifications improve convergence time by over an order of magnitude.
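To make the caching issue concrete, the following is a minimal sketch (not the paper's implementation) of row-level kernel caching with least-recently-used eviction. The RBF kernel, the `KernelRowCache` class, and all parameter values are illustrative assumptions. Under an unstructured access pattern like SMO's, a small cache thrashes and many row requests trigger fresh kernel evaluations, which is the kind of degradation described above.

```python
import numpy as np
from collections import OrderedDict

def rbf_kernel_row(X, i, gamma):
    """Compute the kernel row K(x_i, .) against all training points."""
    diff = X - X[i]
    return np.exp(-gamma * np.einsum("ij,ij->i", diff, diff))

class KernelRowCache:
    """Illustrative LRU cache of kernel rows: repeated accesses to the
    same row (common when an SMO-style solver revisits the same working
    set) are served from memory instead of recomputing n evaluations."""

    def __init__(self, X, gamma=0.5, max_rows=128):
        self.X = X
        self.gamma = gamma
        self.max_rows = max_rows
        self.rows = OrderedDict()  # row index -> cached kernel row
        self.evals = 0             # counts full-row kernel computations

    def row(self, i):
        if i in self.rows:
            self.rows.move_to_end(i)           # mark as most recently used
        else:
            if len(self.rows) >= self.max_rows:
                self.rows.popitem(last=False)  # evict least recently used
            self.rows[i] = rbf_kernel_row(self.X, i, self.gamma)
            self.evals += 1
        return self.rows[i]

# Unstructured (random) access defeats a small cache: most requests miss
# and force a fresh kernel-row computation.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
cache = KernelRowCache(X, max_rows=64)
for i in rng.integers(0, 1000, size=500):      # unstructured access pattern
    cache.row(int(i))
print("rows computed under random access:", cache.evals)
```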