N. Cesa-Bianchi et al., Worst-Case Quadratic Loss Bounds for Prediction Using Linear Functions and Gradient Descent, IEEE Transactions on Neural Networks, 7(3), 1996, pp. 604-619
In this paper we study the performance of gradient descent (GD) when applied to the problem of on-line linear prediction in arbitrary inner product spaces. We prove worst-case bounds on the sum of the squared prediction errors under various assumptions concerning the amount of a priori information about the sequence to predict. The algorithms we use are variants and extensions of on-line GD. Whereas our algorithms always predict using linear functions as hypotheses, none of our results requires the data to be linearly related. In fact, the bounds proved on the total prediction loss are typically expressed as a function of the total loss of the best fixed linear predictor with bounded norm. All the upper bounds are tight to within constants. Matching lower bounds are provided in some cases. Finally, we apply our results to the problem of on-line prediction for classes of smooth functions.
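To make the on-line protocol concrete, here is a minimal sketch of on-line GD for linear prediction in the finite-dimensional case: at each trial the learner predicts with its current linear hypothesis, observes the true outcome, incurs the squared error, and takes a gradient step. The learning rate eta and the zero initialization are illustrative assumptions, not the tuned values from the paper, whose bounds depend on setting the rate from norm bounds on the instances and on the comparison predictor.

```python
import numpy as np

def online_gd_total_loss(stream, dim, eta=0.1):
    """On-line linear prediction by gradient descent (sketch).

    stream: iterable of (x, y) pairs, x a length-`dim` array, y a float.
    Returns the cumulative squared prediction loss over the sequence.
    """
    w = np.zeros(dim)           # current linear hypothesis (assumed start)
    total_loss = 0.0
    for x, y in stream:
        y_hat = w @ x           # predict with the current linear function
        err = y_hat - y
        total_loss += err ** 2  # quadratic loss charged on this trial
        w -= eta * err * x      # gradient step on the squared loss
    return total_loss
```

The quantity returned is the learner's total loss, which the paper's worst-case bounds compare against the total loss of the best fixed linear predictor of bounded norm, with no assumption that the data are linearly related.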