THE STRUCTURE-PROPERTY MODELS CAN BE IMPROVED USING THE ORTHOGONALIZED DESCRIPTORS

Citation
B. Lucic et al., THE STRUCTURE-PROPERTY MODELS CAN BE IMPROVED USING THE ORTHOGONALIZED DESCRIPTORS, Journal of chemical information and computer sciences, 35(3), 1995, pp. 532-538
Number of citations
21
Subject Categories
Information Science & Library Science; Computer Application, Chemistry & Engineering; Computer Science Interdisciplinary Applications; Chemistry; Computer Science Information Systems
ISSN journal
00952338
Volume
35
Issue
3
Year of publication
1995
Pages
532 - 538
Database
ISI
SICI code
0095-2338(1995)35:3<532:TSMCBI>2.0.ZU;2-R
Abstract
In this report we describe how the use of orthogonalized descriptors can yield a better structure-property-activity model. This is illustrated using the truncated connectivity basis (l)chi (l = 0, 1, ..., 6). The molecular property used to test the approach was the boiling points of octanes. We first developed an algorithm that produces the absolutely best models with I descriptors (I = 1-7) in the nonorthogonalized basis. These models were always better than the models that most authors obtain using the stepwise/inclusion-exclusion procedure. The next step was the development of a computer program with which we could realize all possible orthogonalization orderings of a given set of I descriptors. In doing so we discovered that certain orderings of the orthogonalized descriptors lead to models with higher values of the correlation coefficient (R) than the corresponding models with nonorthogonalized descriptors. Because of that, we selected among all the possible orthogonalization orderings (there are I! possibilities for I descriptors) the ordering that leads to the descriptor with the highest value of R. We call this descriptor the dominant descriptor. After locating the first dominant descriptor, we chose the second dominant descriptor among the remaining (I - 1) descriptors following the same procedure. The third, fourth, etc. dominant descriptors are obtained in the identical way. Selecting the dominant descriptors in this manner necessarily minimizes the contributions of those descriptors that contribute small amounts to the total correlation coefficient, because for any fixed set of I descriptors the total R is constant and independent of the orthogonalization order. These descriptors appear to be insignificant and are removed from consideration. By doing so we diminished the total R only negligibly, but the value of S as well as the F-test were significantly improved, since we obtained a model with fewer descriptors.
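The search over orthogonalization orderings described in the abstract can be illustrated with a short sketch. It is not the authors' program: the abstract does not give the orthogonalization formula, so sequential Gram-Schmidt residualization of mean-centered descriptors is assumed here, and each orthogonalized descriptor's contribution is taken as its squared correlation with the property. The data are synthetic random numbers, not connectivity indices or octane boiling points, and the names `orthogonalize`, `contributions` and `best_ordering` are hypothetical.

```python
import itertools
import numpy as np


def orthogonalize(X, order):
    """Orthogonalize the mean-centered columns of X sequentially, in the given order."""
    Xc = X - X.mean(axis=0)
    basis = []
    for j in order:
        v = Xc[:, j].copy()
        for q in basis:
            v -= (q @ v) / (q @ q) * q  # remove the part already explained by q
        basis.append(v)
    return np.column_stack(basis)


def contributions(X, y, order):
    """Squared correlation of y with each orthogonalized descriptor, in `order`."""
    Q = orthogonalize(X, order)
    yc = y - y.mean()
    return (Q.T @ yc) ** 2 / ((Q * Q).sum(axis=0) * (yc @ yc))


def best_ordering(X, y):
    """Try all I! orderings; return (largest single r^2, ordering, descriptor index)."""
    best = None
    for order in itertools.permutations(range(X.shape[1])):
        r2 = contributions(X, y, order)
        k = int(np.argmax(r2))
        if best is None or r2[k] > best[0]:
            best = (float(r2[k]), order, order[k])
    return best


# Synthetic stand-in data (assumption): 18 samples, 4 partly correlated descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(18, 4))
X[:, 1] += 0.8 * X[:, 0]
X[:, 3] += 0.5 * X[:, 2]
y = X @ np.array([1.0, -0.5, 0.3, 0.05]) + 0.1 * rng.normal(size=18)

r2_dominant, ordering, dominant = best_ordering(X, y)
total_r2 = contributions(X, y, ordering).sum()  # the same for every ordering
print("dominant descriptor:", dominant, "ordering:", ordering)
print("dominant r^2:", r2_dominant, "total R^2:", total_r2)
```

Because the orthogonalized descriptors are mutually uncorrelated, their squared correlations add up to the total R^2 of the full model, which does not depend on the ordering. This is why concentrating the contribution in a few dominant descriptors leaves the rest with small contributions that can be dropped at little cost to R. Repeating the search on the remaining descriptors would give the second, third, etc. dominant descriptors.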