This paper presents a numerical algorithm for finding the bang-bang control input associated with the time-optimal solution of a class of nonlinear dynamic systems. The proposed algorithm directly searches for the optimal switching instants based on a projected gradient optimization method. It is shown that this algorithm can be made into a learning algorithm by using online measurements of the state trajectory. The learning algorithm is shown to have the potential for significant robustness to mismatch between the model and the actual system. It learns a nearly optimal input through repeated trials. In each trial it uses the measured terminal state error of the actual system together with gradients that are based on the theoretical state equation of the system but evaluated along the actual state trajectory. The success of the method is demonstrated on an under-actuated double pendulum system called the acrobot.
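As a rough illustration of the learning idea, the Python sketch below applies it to a toy problem. The problem is an assumption of this example and is not the acrobot. It is a rest-to-rest move of a double integrator x1' = x2, x2' = b*u with |u| <= 1 and a single switch. For this classical problem the one-switch bang-bang input that meets the terminal condition is also the time-optimal one. The decision variables are the switching instant t1 and the final time T.

Each trial does three things:
- It takes the terminal error of a stand-in "actual" plant with gain b_true.
- It multiplies that error by a Jacobian computed from the model with gain b_model.
- It takes a projected gradient step onto the feasible set 0 <= t1 <= T.

All function names, the step size, and the trial count are hypothetical. The closed-form plant replaces real measurements. Because the Jacobian here is in closed form, it is evaluated at the current switching times rather than along a measured trajectory.

```python
import numpy as np

def terminal_state(t1, T, b):
    """Closed-form terminal (position, velocity) of x1' = x2, x2' = b*u,
    starting at rest, with u = +1 on [0, t1) and u = -1 on [t1, T]."""
    p = -b * t1**2 + 2.0 * b * t1 * T - 0.5 * b * T**2
    v = b * (2.0 * t1 - T)
    return np.array([p, v])

def model_jacobian(t1, T, b):
    """Partial derivatives of the terminal state w.r.t. (t1, T), from the model."""
    return np.array([
        [2.0 * b * (T - t1), b * (2.0 * t1 - T)],  # d(position)/d(t1, T)
        [2.0 * b, -b],                             # d(velocity)/d(t1, T)
    ])

def project(t1, T):
    """Euclidean projection onto the feasible set 0 <= t1 <= T."""
    if t1 > T:
        # Closest point on the line t1 == T.
        t1 = T = 0.5 * (t1 + T)
    t1 = max(t1, 0.0)
    T = max(T, t1)
    return t1, T

def learn_switching_times(target, b_model, b_true, t1=0.5, T=1.5,
                          step=0.1, trials=2000):
    """Hypothetical trial-to-trial learning of the switching times."""
    for _ in range(trials):
        # "Measured" terminal error: the plant with b_true stands in for the real system.
        error = terminal_state(t1, T, b_true) - target
        # Gradient of 0.5 * ||error||^2, using the (mismatched) model Jacobian.
        grad = model_jacobian(t1, T, b_model).T @ error
        # Gradient step followed by projection onto 0 <= t1 <= T.
        t1, T = project(t1 - step * grad[0], T - step * grad[1])
    return float(t1), float(T)
```

For example, learn_switching_times(np.array([1.0, 0.0]), b_model=1.0, b_true=1.2) iterates toward the switching times that zero the actual plant's terminal error, t1 = 1/sqrt(b_true) and T = 2*t1, even though every gradient comes from the mismatched model. In this toy case that works because the model Jacobian differs from the true one only by the positive factor b_true/b_model. Larger or structural mismatch in a real system need not behave so well.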