Kl. Goodner et al., The dangers of creating false classifications due to noise in electronic nose and similar multivariate analyses, SENS ACTU-B, 80(3), 2001, pp. 261-266
Randomly generated data with error limits of 1-10%, along with experimental
data, were employed to demonstrate the dangers of over-fitting data, which
creates artificial differentiation. Analysis of variance (ANOVA), principal
components analysis (PCA), and discriminant function analysis (DFA) were
employed for the data analysis. In cases where the ratio of samples to
variables (features) falls below six, single-class systems containing only
random noise and random groupings can be misclassified into more than a
single group when the discriminant techniques are employed. The smaller the
group size, the more erroneous classifications are made. Larger sample sizes
minimize the random noise and allow the true differences to show. A minimum
number of variables (features) should be employed when developing
classification models to avoid over-fitting data. The ratio of data points to
variables should be at least six to avoid over-fitting classification errors,
and the model should be validated using data points not used in generating it.
(C) 2001 Elsevier Science B.V. All rights reserved.
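
The effect described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' procedure: a nearest-centroid classifier stands in for DFA to keep the example dependency-free, and the baseline reading of 100, the +/-10% uniform noise, the choice of 10 variables, and all function names are assumptions made for illustration. A single class of pure noise is split into two arbitrary groups; accuracy is then measured on the data used to build the model (resubstitution) and on fresh data not used in generating it (hold-out).

```python
import random


def make_noise(n_samples, n_vars, rng, baseline=100.0, error=0.10):
    """Single-class data: each reading is baseline +/- up to `error` (uniform)."""
    return [[baseline * (1.0 + rng.uniform(-error, error)) for _ in range(n_vars)]
            for _ in range(n_samples)]


def random_labels(n_samples, rng):
    """Two arbitrary, equally sized groups assigned at random."""
    labels = [i % 2 for i in range(n_samples)]
    rng.shuffle(labels)
    return labels


def centroids(rows, labels):
    """Fit the 'model': one mean vector per group."""
    model = {}
    for group in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == group]
        model[group] = [sum(col) / len(members) for col in zip(*members)]
    return model


def accuracy(model, rows, labels):
    """Fraction of rows whose nearest centroid matches their assigned label."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    hits = sum(min(model, key=lambda g: dist2(row, model[g])) == lab
               for row, lab in zip(rows, labels))
    return hits / len(rows)


def simulate(ratio, n_vars=10, trials=200, seed=0):
    """Average resubstitution and hold-out accuracy at a samples-to-variables ratio."""
    rng = random.Random(seed)
    n_samples = ratio * n_vars
    resub = holdout = 0.0
    for _ in range(trials):
        train, y_train = make_noise(n_samples, n_vars, rng), random_labels(n_samples, rng)
        test, y_test = make_noise(n_samples, n_vars, rng), random_labels(n_samples, rng)
        model = centroids(train, y_train)
        resub += accuracy(model, train, y_train)    # same points that built the model
        holdout += accuracy(model, test, y_test)    # points not used in the model
    return resub / trials, holdout / trials


results = {ratio: simulate(ratio) for ratio in (1, 2, 6, 20)}
for ratio, (resub, holdout) in results.items():
    print(f"samples/variables = {ratio:2d}: "
          f"resubstitution = {resub:.2f}, hold-out = {holdout:.2f}")
```

Because the groups are arbitrary, any accuracy above chance (0.5) on the training data is artificial differentiation. In this sketch the resubstitution accuracy should shrink toward chance as the ratio grows, while the hold-out accuracy should stay near chance at every ratio, which is why validation on unused data points exposes the over-fitting.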