The four most important regression methods are evaluated on very large data
sets: Multiple Linear Regression (MLR), Partial Least Squares (PLS),
Artificial Neural Network (ANN) and a new concept called "LOCAL" (PLS with
selection of a calibration sample subset of the closest neighbours for each
sample to predict). The Standard Errors of Prediction (SEPs) are statistically
tested and the results show that the regression methods perform almost equally
and that the data matrices are more important than the fitting methods
themselves. The possible pre-treatments of the spectra (Multiplicative Scatter
Correction, Detrend, Standard Normal Variate, derivative etc.) are too numerous
for all combinations to be tested. For each test, the pre-treatment found to be
best with the PLS method is therefore kept fixed for the other methods. The
second part of the paper emphasises the importance of the number of samples.
If any agricultural commodity, and probably any kind of product measured by
an NIR instrument, is considered as a mixture of several constituents, then
a database built by collecting actual samples that each bring new information
can reach hundreds, if not thousands, of samples.
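
As an illustration of the terms used above, the following minimal Python sketch shows a Standard Normal Variate pre-treatment, a neighbour selection step of the kind the "LOCAL" concept relies on, and a bias-corrected SEP. It is a sketch under stated assumptions, not the implementation evaluated in this paper: the function names are hypothetical, the use of the correlation between spectra as the measure of closeness is an assumption (the text only speaks of the "closest neighbours"), and the PLS fit on the selected subset is left out.

```python
import math
import statistics


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own mean and SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mean) / sd for x in spectrum]


def correlation(a, b):
    """Pearson correlation between two spectra of equal length."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def select_neighbours(target, library, k):
    """Return the indices of the k library spectra most correlated with the target.

    Assumption: closeness is measured by correlation; other distances are possible.
    """
    ranked = sorted(
        range(len(library)),
        key=lambda i: correlation(target, library[i]),
        reverse=True,
    )
    return ranked[:k]


def sep(reference, predicted):
    """Bias-corrected standard error of prediction (SD of the residuals, n - 1)."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    return statistics.stdev(residuals)
```

In a LOCAL-style prediction, the indices returned by `select_neighbours` would define the calibration subset on which a PLS model is fitted for that one sample; the SEP is then computed over all predicted samples.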