Z. Ramadan et al., Variable selection in classification of environmental soil samples for partial least square and neural network models, ANALYT CHIM, 446(1-2), 2001, pp. 233-244
Two variable selection methods were evaluated by comparing their predictions
with respect to differentiating among environmental soil samples. The focus
of this work is to determine which input variables are most relevant for
prediction of soil sources using discriminant partial least square (D-PLS)
and back-propagation artificial neural network (BP-ANN) models. The methods
investigated were stepwise variable selection and genetic algorithms (GAs).
Microbial community DNA was extracted from 48 environmental soil
samples derived from different field crops and soil sources. After
amplification of bacterial ribosomal RNA genes by polymerase chain reaction (PCR),
the products were separated by gel electrophoresis. Characteristic complex
band patterns were obtained, indicating high bacterial diversity. Two
hundred and twenty-three DNA band patterns produced in the gels of the soil
samples were used in the analysis, after removal of the included DNA standard
markers. Based on the brightness of the bands, densitometric curves of the
selected DNA band patterns were extracted from the gel images. The curves were
smoothed using the Savitzky-Golay method and scaled to the DNA standard
markers. The prediction results based on the two variable selection methods
for PLS and ANN models are presented and compared. Both models gave good
results before any variable selection was applied, with the ANN performing
better than D-PLS. The prediction performance of both models, especially
D-PLS, was improved by applying stepwise variable selection and GA variable
selection. The study also shows that GA variable selection yielded a
significantly greater improvement in predictive ability than stepwise
variable selection. (C) 2001 Elsevier Science B.V. All rights reserved.
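
The smoothing and scaling step mentioned in the abstract might look roughly like the sketch below. It is only an illustration, not the authors' procedure: the abstract gives neither the filter window nor the polynomial order, so a 5-point quadratic Savitzky-Golay filter (standard tabulated coefficients -3, 12, 17, 12, -3 over 35) is assumed, and "scaled to the DNA standard markers" is assumed here to mean dividing by the peak intensity of a marker lane. The function names and the sample data are hypothetical.

```python
# Assumed filter: 5-point quadratic Savitzky-Golay smoothing coefficients.
# The abstract does not state the window or order actually used.
SG5_COEFFS = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def savgol5(curve):
    """Smooth a densitometric curve with a 5-point quadratic filter.

    The two points at each end are copied unchanged, which is a
    simplification of proper edge handling.
    """
    smoothed = [float(y) for y in curve]
    for i in range(2, len(curve) - 2):
        window = curve[i - 2:i + 3]
        smoothed[i] = sum(c * y for c, y in zip(SG5_COEFFS, window)) / SG5_NORM
    return smoothed


def scale_to_marker(curve, marker_curve):
    """Divide a curve by the marker lane's peak intensity (assumed rule)."""
    peak = max(marker_curve)
    if peak <= 0:
        raise ValueError("marker curve must have a positive peak")
    return [y / peak for y in curve]


# Made-up intensities for illustration only; not data from the paper.
lane = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
marker = [5.0, 20.0, 50.0, 20.0, 5.0]

smoothed = savgol5(lane)
scaled = scale_to_marker(smoothed, marker)
```

Because the sample lane is an exact quadratic, a quadratic Savitzky-Golay filter leaves it unchanged, which makes a convenient sanity check before applying the filter to noisy gel profiles.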