Isovanillyl derivatives constitute a large class of sweet compounds that
combine a high degree of structural similarity with a wide range of
biological activity: their relative sweetness (RS) spans from 50 to 10000
times that of sucrose. This paper describes the results obtained by applying
statistical models to develop QSARs for these derivatives. For a set of
14 compounds (set 1), appropriate physicochemical parameters for the
regression equations were selected using the genetic algorithm method. The
best equation indicates a very close correlation (N = 14, ND = 5,
r(2) = 0.982, r(cv)(2) = 0.942, LOF = 0.074, PRESS = 0.271, S-PRESS = 0.184,
S-DEP = 0.139). Good results have also been obtained by Molecular Field
Analysis (MFA) applied to the same set of compounds (N = 14, ND = 4,
r(2) = 0.957, r(cv)(2) = 0.925, LOF = 0.044, PRESS = 0.348, S-PRESS = 0.196,
S-DEP = 0.158). QSARs have also been derived for a larger set of 41 compounds
(set 2, comprising set 1 plus 27 other compounds) with a much wider variety
of structural types.
These compounds have been divided into a training set of 35 compounds and
a test set of 6 compounds. The most significant QSAR obtained using
physicochemical parameters (N = 35, ND = 6, r(2) = 0.673, r(cv)(2) = 0.522,
LOF = 0.337, PRESS = 7.432, S-PRESS = 0.515, S-DEP = 0.461) proved less
successful
than the one using MFA parameters (N = 35, ND = 6, r(2) = 0.746,
r(cv)(2) = 0.607, LOF = 0.261, PRESS = 6.110, S-PRESS = 0.467,
S-DEP = 0.418). PRESS values
for the test set were 4.079 and 1.962, respectively, showing that the
MFA-based QSAR had more predictive power. Equations with different numbers
of descriptors were compared, and it was concluded that the LOF, which
depends on the number of parameters used as well as on the sum of squares,
is a suitable measure of equation quality. These equations were also
validated by scrambling the experimental data, which gave significantly
worse agreement than the real data except when an excessive number of
descriptors was used.
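
The abstract reports S-PRESS and S-DEP without defining them. The sketch below assumes the definitions commonly used in QSAR work: S-PRESS = sqrt(PRESS / (N - ND - 1)), S-DEP = sqrt(PRESS / N), and r(cv)(2) = 1 - PRESS / SS, where SS is the total sum of squares of the observed activities about their mean. These definitions are an assumption, not something the text states, and the function names are illustrative only. Under that assumption, the reported PRESS, N and ND values can be checked against the reported S-PRESS and S-DEP, and the total sum of squares implied by each model can be compared between models fitted to the same compounds.

```python
import math

# Values copied from the abstract:
# (label, N compounds, ND descriptors, PRESS, S-PRESS, S-DEP, r(cv)(2))
REPORTED = [
    ("set 1, physicochemical", 14, 5, 0.271, 0.184, 0.139, 0.942),
    ("set 1, MFA",             14, 4, 0.348, 0.196, 0.158, 0.925),
    ("set 2, physicochemical", 35, 6, 7.432, 0.515, 0.461, 0.522),
    ("set 2, MFA",             35, 6, 6.110, 0.467, 0.418, 0.607),
]


def s_press(press, n, nd):
    """Assumed definition: PRESS scaled by the residual degrees of freedom."""
    return math.sqrt(press / (n - nd - 1))


def s_dep(press, n):
    """Assumed definition: root-mean-square cross-validated prediction error."""
    return math.sqrt(press / n)


def implied_total_ss(press, r2_cv):
    """Total sum of squares implied by r(cv)(2) = 1 - PRESS / SS (assumed)."""
    return press / (1.0 - r2_cv)


# Print recomputed values next to the reported ones for each model.
for label, n, nd, press, sp_rep, sd_rep, r2_cv in REPORTED:
    print(
        f"{label}: "
        f"S-PRESS {s_press(press, n, nd):.3f} (reported {sp_rep}), "
        f"S-DEP {s_dep(press, n):.3f} (reported {sd_rep}), "
        f"implied SS {implied_total_ss(press, r2_cv):.2f}"
    )
```

Models fitted to the same compounds share the same observed activities, so their implied SS values should agree up to the rounding of the published figures. The comparison also shows why S-PRESS penalises additional descriptors while S-DEP does not: only S-PRESS has ND in its denominator.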