REDUCTION OF THE SIZE OF THE LEARNING DATA IN A PROBABILISTIC NEURAL NETWORK BY HIERARCHICAL CLUSTERING - APPLICATION TO THE DISCRIMINATION OF SEEDS BY ARTIFICIAL VISION
Y. Chtioui et al., Reduction of the size of the learning data in a probabilistic neural network by hierarchical clustering - Application to the discrimination of seeds by artificial vision, Chemometrics and Intelligent Laboratory Systems, 35(2), 1996, pp. 175-186
The control of seed batches is necessary before their commercialization. In the present work, we attempted to apply computer vision to this goal. A pattern recognition system formed by a color image analysis device combined with a neural network classifier was tested on a practical problem consisting of the discrimination between 4 seed species (2 cultivated and 2 adventitious). A probabilistic neural network (PNN) was used as the classifier. The PNN has many advantages, but it requires the storage of all the learning patterns. The main goal of this work was to reduce the learning data in order to decrease the memory and time requirements of this kind of network. This was achieved by reducing the number of both features and learning patterns. Principal component analysis (PCA) was used for feature extraction, and a small number of relevant components were selected as inputs for the PNN. A further data reduction was performed by a hierarchical clustering technique based on reciprocal neighbors (RN). The effects of reducing the training-set size on the classification performance of the PNN were tested. From color images of seeds, seventy-three features (including size, shape, and textural features) were measured. By considering the sum of their eigenvalues, the first 4 principal components were selected. The training set was then reduced by RN from 1600 patterns to 1176 patterns after one iteration, and to 543 after 5 iterations. Without any reduction of the training set, the PNN correctly classified 93.0% of the training set and 91.9% of the test set. After 5 reductions, the classification results were 91.9% and 89.1%, respectively, so the results decreased only slightly. It was concluded from the simulations that the beneficial effect of the reduction holds only when a few iterations are performed, because classification performance decreased notably when many iterations of RN were applied. The combination of PCA and RN (5 iterations) made it possible to reduce the learning data to 1.85% of the initially available data, with only a slight decrease in classification performance.
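The PNN used here is, in essence, a Parzen-window kernel classifier: every stored training pattern contributes a kernel response, and each class score is the average response over that class. A minimal sketch in NumPy, assuming a Gaussian kernel with a single smoothing parameter `sigma` (the abstract does not specify the kernel or its width):

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.3):
    """Probabilistic neural network (Parzen-window) classification.

    Each training pattern acts as a Gaussian kernel centre; the score of a
    class is the mean kernel response over that class's patterns, and the
    highest-scoring class wins. Note the whole training set must be stored,
    which is the memory cost the reduction techniques aim to cut.
    """
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        # Squared Euclidean distance from x to every stored pattern.
        d2 = np.sum((X_train - x) ** 2, axis=1)
        k = np.exp(-d2 / (2.0 * sigma ** 2))
        # Average kernel response per class.
        scores = [k[y_train == c].mean() for c in classes]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)
```

Because the prediction loop touches every stored pattern, halving the training set roughly halves both memory use and classification time, which is why pruning it matters for this network.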
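The PCA step, keeping the leading components judged by their share of the total eigenvalue sum, can be sketched as follows. The 73-feature setup and the returned variance fraction are illustrative; the paper's exact selection threshold is not given in the abstract:

```python
import numpy as np

def pca_components(X, n_components=4):
    """Project X onto its first n_components principal axes.

    Returns the projected data and the fraction of total variance
    (eigenvalue sum) captured by the retained components.
    """
    Xc = X - X.mean(axis=0)              # centre each feature
    cov = np.cov(Xc, rowvar=False)       # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]    # eigh returns ascending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return Xc @ eigvecs[:, :n_components], explained
```

Reducing 73 measured features to 4 components shrinks the PNN's input dimension by a factor of about 18 before any pattern-level reduction is applied.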
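One iteration of the reciprocal-neighbors reduction can be read as: within each class, find pairs of patterns that are mutual nearest neighbors and replace each pair by its centroid, leaving unpaired patterns untouched. This is a plausible sketch of the RN step, not the paper's exact merging rule:

```python
import numpy as np

def rn_reduce(X, y):
    """One reciprocal-neighbor reduction pass, applied class by class.

    Two patterns i and j are reciprocal neighbors when each is the other's
    nearest neighbor; each such pair is merged into its centroid. Repeated
    passes shrink the training set further (e.g. 1600 -> 1176 -> ... -> 543).
    """
    new_X, new_y = [], []
    for c in np.unique(y):
        P = X[y == c]
        n = len(P)
        # Pairwise squared distances within the class.
        d = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=2)
        np.fill_diagonal(d, np.inf)
        nn = d.argmin(axis=1)            # nearest neighbor of each pattern
        merged = np.zeros(n, dtype=bool)
        for i in range(n):
            j = nn[i]
            if j > i and nn[j] == i:     # reciprocal pair, counted once
                new_X.append((P[i] + P[j]) / 2.0)
                new_y.append(c)
                merged[i] = merged[j] = True
        for i in range(n):
            if not merged[i]:
                new_X.append(P[i])
                new_y.append(c)
    return np.array(new_X), np.array(new_y)
```

Applying the pass repeatedly mirrors the abstract's iterations; the reported results suggest stopping after a few passes, since aggressive merging eventually discards patterns the PNN needs.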