Cp. Millan et al., Extraction of representative subsets by potential functions method and genetic algorithms, Chemometrics and Intelligent Laboratory Systems, 40(1), 1998, pp. 33-52
Two procedures are suggested for selecting a representative subset from a large data set. The first is based on estimating the multivariate probability density distribution by means of the potential functions technique. The first object selected for the subset is the one for which the probability density is largest. The distribution is then corrected by subtracting the contribution of the selected object, multiplied by a selection factor, and the next object is chosen from the corrected distribution. The second procedure uses genetic algorithms to identify the subset that reproduces the variance-covariance matrix with the minimum error. Both methods meet the requirement of obtaining a representative subset, but the results obtained with the method based on potential functions are generally more satisfactory when the original set is not a random sample from an infinite population but is the finite population itself. Several examples show how extracting a representative subset from a large data set can offer advantages in the use of representation techniques (e.g., eigenvector projection, non-linear maps, Kohonen maps) and of class-modelling techniques. (C) 1998 Elsevier Science B.V. All rights reserved.
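The greedy potential-functions procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The potential is assumed to be Gaussian with a fixed smoothing parameter `sigma`, the density at each object is taken as the unnormalised sum of all potentials, and `factor` plays the role of the selection factor. The names `potential` and `potential_select` are hypothetical.

```python
import math


def potential(a, b, sigma):
    """Gaussian potential exerted by object b at object a (assumed kernel form)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def potential_select(data, m, sigma=1.0, factor=1.0):
    """Return indices of up to m objects chosen greedily by corrected density.

    Sketch only: the Gaussian kernel, fixed sigma and the exact form of the
    correction step are assumptions, not taken from the paper.
    """
    n = len(data)
    # Density estimate at each object: sum of the potentials of all objects
    # (constant normalisation dropped, since only relative values matter).
    density = [sum(potential(x, y, sigma) for y in data) for x in data]
    selected, taken = [], set()
    for _ in range(min(m, n)):
        # The not-yet-selected object with the largest current density joins.
        best = max((i for i in range(n) if i not in taken),
                   key=density.__getitem__)
        selected.append(best)
        taken.add(best)
        # Correct the distribution: subtract the selected object's potential,
        # multiplied by the selection factor, from the density everywhere.
        for i in range(n):
            density[i] -= factor * potential(data[i], data[best], sigma)
    return selected


# Usage: a dense group of four objects and a sparse group of two.
data = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5)]
subset = potential_select(data, 2, sigma=0.5, factor=4.0)
```

With a large `factor`, the first pick comes from the dense group and the correction suppresses that whole region, so the second pick comes from the sparse group. A factor near zero would instead return the densest objects repeatedly from the same region.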
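The second procedure can be illustrated with a small genetic algorithm whose fitness is the error between the variance-covariance matrix of a candidate subset and that of the full data set. The abstract does not specify the error measure or the genetic operators. This sketch therefore assumes the following: a sum of squared element differences as the error, fixed-size index subsets as individuals, tournament selection, crossover that resamples from the union of the two parents, a single-swap mutation, and elitism. The names `covariance`, `cov_error` and `ga_select` are hypothetical.

```python
import random


def covariance(rows):
    """Sample variance-covariance matrix (n - 1 denominator) of a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def cov_error(data, subset, target):
    """Assumed error measure: sum of squared covariance-matrix differences."""
    c = covariance([data[i] for i in subset])
    p = len(target)
    return sum((c[a][b] - target[a][b]) ** 2 for a in range(p) for b in range(p))


def ga_select(data, k, pop_size=30, generations=100, mutation_rate=0.2, seed=0):
    """Evolve a k-object subset (k >= 2) minimising cov_error (illustrative GA)."""
    rng = random.Random(seed)
    n = len(data)
    target = covariance(data)

    def fitness(subset):
        return cov_error(data, subset, target)

    # Initial population: random subsets of k distinct object indices.
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        next_pop = pop[:2]  # elitism: the two best subsets survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = min(rng.sample(pop, 3), key=fitness)
            p2 = min(rng.sample(pop, 3), key=fitness)
            # Crossover: draw k distinct indices from the union of both parents.
            child = rng.sample(sorted(set(p1) | set(p2)), k)
            # Mutation: replace one member with an object outside the subset.
            if rng.random() < mutation_rate and k < n:
                outside = [i for i in range(n) if i not in child]
                child[rng.randrange(k)] = rng.choice(outside)
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=fitness)
```

Because the two best individuals are always carried over, the best error found never increases from one generation to the next. A run with more generations, started from the same seed, is therefore at least as good as the best subset in the initial random population.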