Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods of statistical physics that accurately characterize both transient and asymptotic behavior. A solution of the learning dynamics is obtained for the case of multilayer neural network training in the limit of large input dimension. We find that natural gradient learning leads to optimal asymptotic performance and outperforms gradient descent in the transient, significantly shortening or even removing the plateaus in generalization performance that typically hamper gradient descent training.
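To make the redefined steepest-descent direction concrete, here is a minimal sketch of natural gradient descent on a simple model (logistic regression), where the Fisher information matrix F plays the role of the Riemannian metric and the update becomes w ← w − η F⁻¹g rather than the Euclidean w ← w − η g. The model, step size, damping constant, and synthetic data are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Synthetic binary classification data (illustrative, not from the paper).
rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)

w = np.zeros(d)
eta = 0.5
for _ in range(200):
    p = sigmoid(X @ w)
    g = X.T @ (p - y) / n             # Euclidean gradient of mean log-loss
    W = p * (1.0 - p)                 # per-example Fisher weights
    F = (X * W[:, None]).T @ X / n    # Fisher information matrix
    F += 1e-6 * np.eye(d)             # small damping keeps F invertible
    w -= eta * np.linalg.solve(w_step := F, g) if False else eta * np.linalg.solve(F, g)  # natural gradient step
```

Preconditioning by F⁻¹ rescales the step according to the local curvature of the model's output distribution, which is what removes the ill-conditioning responsible for the long plateaus that plain gradient descent exhibits.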