In this paper, we consider the finite sample properties of prediction error methods using a quadratic criterion function for system identification. The problem we pose is: how many data points are required to guarantee with high probability that the expected value of the quadratic identification criterion is close to its empirical mean value? The sample sizes are obtained using risk minimization theory, which provides uniform probabilistic bounds on the difference between the expected value of the squared prediction error and its empirical mean evaluated on a finite number of data points. The bounds are very general: no assumption is made about the true system belonging to the model class, and the noise sequence is not assumed to be uniformly bounded. Further analysis shows that, in order to maintain a given bound on the deviation, the number of data points needed grows no faster than quadratically with the number of parameters for FIR and ARX models.
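The two quantities compared above can be made concrete with a small numerical sketch. The snippet below is illustrative only and is not the paper's method: it assumes a hypothetical second-order FIR system with i.i.d. Gaussian input and noise (names `b_true`, `V_emp`, `noise_std` are ours), computes the empirical mean of the squared one-step prediction error for a fixed candidate parameter vector, and compares it with the exact expected value, which in this Gaussian setting is the parameter error norm squared plus the noise variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical second-order FIR system (illustrative parameters, not from the paper):
#   y_t = b1 * u_{t-1} + b2 * u_{t-2} + e_t,  u_t, e_t i.i.d. Gaussian
b_true = np.array([0.8, -0.3])
noise_std = 0.5

def simulate(N):
    """One realization of N regressor/output pairs from the FIR system."""
    u = rng.normal(size=N + 2)
    # Regressor phi_t = [u_{t-1}, u_{t-2}], stacked as rows of Phi
    Phi = np.column_stack([u[1:N + 1], u[0:N]])
    y = Phi @ b_true + noise_std * rng.normal(size=N)
    return Phi, y

def V_emp(theta, Phi, y):
    """Empirical mean of the squared prediction error over the data."""
    return np.mean((y - Phi @ theta) ** 2)

# A fixed candidate model (deliberately not equal to b_true)
theta = np.array([0.5, 0.0])

# For unit-variance Gaussian input and noise independent of it:
#   E[(y_t - phi_t^T theta)^2] = ||b_true - theta||^2 + noise_std^2
V_true = np.sum((b_true - theta) ** 2) + noise_std ** 2

for N in [50, 500, 5000]:
    Phi, y = simulate(N)
    print(N, abs(V_emp(theta, Phi, y) - V_true))
```

As N grows, the deviation between the empirical criterion and its expected value typically shrinks; the paper's bounds quantify, uniformly over the model class, how large N must be for this deviation to be small with high probability.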