We propose and analyse a populational version of stepwise gradient descent suitable for a wide range of learning problems. The algorithm is motivated by genetic algorithms, which update a population of solutions rather than just a single representative, as is typical for gradient descent. This modification of traditional gradient descent (as used, for example, in the backpropagation algorithm) avoids getting trapped in local minima. We use an averaging analysis of the algorithm to relate its behaviour to an associated ordinary differential equation. We derive a result concerning how long one has to wait in order that, with a given high probability, the algorithm is within a certain neighbourhood of the global minimum. We also analyse the effect of different population sizes. An example is presented that corroborates our theory.
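
The abstract gives no pseudocode, so the following Python fragment is only a rough sketch, under assumptions of our own, of the kind of populational update described: a toy one-dimensional multimodal objective f, a noisy (stepwise) gradient step applied independently to each of m population members, and hypothetical parameter names (m, eta, sigma, steps). It is not the authors' exact algorithm; in particular, any genetic-algorithm-style interaction between members (selection, recombination) is omitted.

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy objective with several local minima (our assumption; the
    # paper addresses a wide range of learning problems).
    return x**4 - 3 * x**2 + 0.5 * x

def grad_f(x):
    return 4 * x**3 - 6 * x + 0.5

# Hypothetical parameters: population size m, step size eta,
# perturbation scale sigma, number of iterations steps.
m, eta, sigma, steps = 20, 0.01, 0.3, 2000

# A population of candidate solutions, initialised at random,
# rather than the single iterate of ordinary gradient descent.
pop = rng.uniform(-2.0, 2.0, size=m)

for t in range(steps):
    # Each member takes a stepwise gradient step plus a small random
    # perturbation; the perturbation is what lets members escape
    # shallow local minima.
    pop = pop - eta * grad_f(pop) + sigma * np.sqrt(eta) * rng.standard_normal(m)

best = pop[np.argmin(f(pop))]
print(f"best member after {steps} steps: x = {best:.3f}, f(x) = {f(best):.3f}")

Loosely speaking, as the step size eta shrinks, the averaged motion of each member in such a scheme tracks the ordinary differential equation $\dot{\theta} = -\nabla f(\theta)$; this is the flavour of the averaging connection the abstract refers to, though the paper's precise statement should be consulted for the actual result.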