B. Lucic et al., THE STRUCTURE-PROPERTY MODELS CAN BE IMPROVED USING THE ORTHOGONALIZED DESCRIPTORS, Journal of chemical information and computer sciences, 35(3), 1995, pp. 532-538
Number of citations
21
Subject Categories
Information Science & Library Science; Computer Application, Chemistry & Engineering; Computer Science Interdisciplinary Applications; Chemistry; Computer Science Information Systems
In this report we describe how orthogonalized descriptors can be used to obtain a better structure-property-activity model. The approach is illustrated using the truncated connectivity basis ^lχ (l = 0, 1, ..., 6). The molecular property used to test the approach was the boiling points of the octanes. We first developed an algorithm that produces the absolute best models with I descriptors (I = 1-7) in the nonorthogonalized basis. These models were always better than the models that most authors obtain with the stepwise/inclusion-exclusion procedure. The next step was the development of a computer program that generates all possible orthogonalization orderings of a given set of I descriptors. In doing so we discovered that certain orderings of the orthogonalized descriptors lead to models with higher values of the correlation coefficient (R) than the corresponding models with nonorthogonalized descriptors. We therefore selected, among all possible orthogonalization orderings (there are I! possibilities for I descriptors), the ordering that leads to the descriptor giving the highest value of R. We call this descriptor the dominant descriptor. After locating the first dominant descriptor, we chose the second dominant descriptor among the remaining (I - 1) descriptors by the same procedure. The third, fourth, etc. dominant descriptors are obtained in the identical way. Selecting the dominant descriptors in this manner necessarily minimizes the contributions of those descriptors that contribute small amounts to the total correlation coefficient, because for any fixed set of I descriptors the total R is constant and independent of the orthogonalization order. These descriptors appear to be insignificant and are removed from consideration. Their removal diminishes the total R only negligibly, but the value of S and the F-test improve significantly, since the resulting model has fewer descriptors.
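
The ordering argument in the abstract can be illustrated with a small numerical sketch. It is not the authors' program: the data are synthetic (random numbers, not connectivity indices or octane boiling points), the function names `orthogonalize` and `contributions` are hypothetical, and sequential orthogonalization is assumed to mean replacing each mean-centred descriptor by its residual after projection onto the descriptors that precede it in the chosen ordering. Under these assumptions the sketch enumerates all I! orderings, checks that the summed contributions (the total R^2) do not depend on the ordering, and reports the single orthogonalized descriptor with the largest squared correlation, which is one possible reading of the "dominant descriptor".

```python
import itertools
import numpy as np


def orthogonalize(X, order):
    """Sequentially orthogonalize the columns of X in the given order.

    Each column is mean-centred and then replaced by its residual after
    projection onto the columns that precede it in `order` (Gram-Schmidt
    without normalisation). This is an assumed reading of the procedure.
    """
    Xc = X - X.mean(axis=0)
    Q = []
    for j in order:
        v = Xc[:, j].copy()
        for q in Q:
            v -= (q @ v) / (q @ q) * q
        Q.append(v)
    return np.column_stack(Q)


def contributions(X, y, order):
    """Squared correlation of y with each orthogonalized descriptor.

    Because the orthogonalized descriptors are mutually orthogonal and
    centred, these values add up to the R^2 of the full regression.
    """
    Q = orthogonalize(X, order)
    yc = y - y.mean()
    return (Q.T @ yc) ** 2 / ((Q * Q).sum(axis=0) * (yc @ yc))


# Synthetic, intercorrelated descriptors (illustrative data only).
rng = np.random.default_rng(0)
n, I = 18, 4
X = rng.normal(size=(n, 1)) + 0.5 * rng.normal(size=(n, I))
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + 0.1 * rng.normal(size=n)

# Contributions for every one of the I! orthogonalization orderings.
results = {
    order: contributions(X, y, order)
    for order in itertools.permutations(range(I))
}
totals = np.array([c.sum() for c in results.values()])

# Orthogonalized descriptor with the largest squared correlation overall.
best_order, best_pos, best_r2 = max(
    ((order, int(np.argmax(c)), float(c.max())) for order, c in results.items()),
    key=lambda t: t[2],
)

print("total R^2, min and max over orderings:", totals.min(), totals.max())
print("dominant descriptor: column", best_order[best_pos],
      "at position", best_pos, "of ordering", best_order,
      "with r^2 =", best_r2)
```

Since the total is fixed, an ordering that concentrates the correlation in a few orthogonalized descriptors leaves the others with small contributions, which is what makes them candidates for removal. Exhaustive enumeration costs I! orthogonalizations, so it is practical only for small I such as the I = 1-7 range mentioned above.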