We consider a class of subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions. This type of minimization arises in a dual context from Lagrangian relaxation of the coupling constraints of large-scale separable problems. The idea is to perform the subgradient iteration incrementally, by sequentially taking steps along the subgradients of the component functions, with intermediate adjustment of the variables after processing each component function. This incremental approach has been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks, and it has resulted in a much better practical rate of convergence than the steepest descent method.
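To make the incremental scheme concrete, a minimal sketch of one cycle is given below in standard notation; the symbols f_i, \psi_i, \alpha_k, and g_{i,k} are illustrative conventions and are not taken from the paper itself. The problem and one full pass over the m component functions can be written as
\[
\min_{x \in X} \; f(x) = \sum_{i=1}^{m} f_i(x),
\qquad
\psi_0 = x_k, \quad
\psi_i = \psi_{i-1} - \alpha_k\, g_{i,k}, \quad g_{i,k} \in \partial f_i(\psi_{i-1}), \quad i = 1,\dots,m,
\qquad
x_{k+1} = \psi_m,
\]
where \alpha_k is the stepsize and the intermediate points \psi_i reflect the adjustment of the variables after each component function is processed.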
In this paper, we establish the convergence properties of a number of variants of incremental subgradient methods, including some that are stochastic. Based on the analysis and computational experiments, the methods appear very promising and effective for important classes of large problems. A particularly interesting discovery is that by randomizing the order of selection of component functions for iteration, the convergence rate is substantially improved.
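For comparison, the randomized variant referred to above can be sketched as follows, again in assumed notation rather than the paper's own: at each step an index \omega_k is drawn uniformly from \{1,\dots,m\} and a single component subgradient step is taken,
\[
x_{k+1} = x_k - \alpha_k\, g(\omega_k, x_k),
\qquad
g(\omega_k, x_k) \in \partial f_{\omega_k}(x_k),
\]
so that a component function is selected at random rather than in a fixed cyclic order.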