I.V. Tetko et al., Prediction of n-octanol/water partition coefficients from PHYSPROP database using artificial neural networks and E-state indices, J CHEM INF, 41(5), 2001, pp. 1407-1421
Number of citations
53
Subject Categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
A new method, ALOGPS v 2.0 (http://www.lnh.unil.ch/~itetko/logp/), for the
assessment of the n-octanol/water partition coefficient, log P, was developed
on the basis of neural network ensemble analysis of 12 908 organic compounds
available from the PHYSPROP database of Syracuse Research Corporation. The
atom-type and bond-type E-state indices as well as the numbers of hydrogen and
non-hydrogen atoms were used to represent the molecular structures. A
preliminary selection of indices was performed by multiple linear regression
analysis, and 75 input parameters were chosen. Some of the parameters
combined several atom-type or bond-type indices with similar physicochemical
properties. The neural network ensemble was trained with the efficient
partition algorithm developed by the authors. The ensemble contained 50
neural networks, and each neural network had 10 neurons in one hidden layer.
The prediction ability of the developed approach was estimated using both the
leave-one-out (LOO) technique and a training/test protocol. In the case of
interseries predictions, i.e., when molecules in the test and training
subsets were selected by chance from the same set of compounds, both approaches
provided similar results. ALOGPS performance was significantly better than
the results obtained by other tested methods. For a subset of 12 777
molecules, the LOO results were a squared correlation coefficient r^2 = 0.95,
a root mean squared error RMSE = 0.39, and a mean absolute error MAE = 0.29.
For two cross-series predictions, i.e., when molecules in the training and
test sets belong to different series of compounds, all analyzed methods
performed less efficiently. The decrease in the performance
could be explained by a different diversity of molecules in the training and
in the test sets. However, even for such difficult cases the ALOGPS method
provided better prediction ability than the other tested methods. We have
shown that the diversity of the training sets rather than the design of the
methods is the main factor determining their prediction ability for new
data. A comparison of the performance of the methods, as well as its
dependence on the number of non-hydrogen atoms in a molecule, is also presented.
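
The three statistics quoted in the abstract (r^2, RMSE, MAE) can be illustrated
with a minimal Python sketch. It is not the authors' code: the function name
regression_metrics and the four log P values are hypothetical, invented only
to show the arithmetic, and r^2 is assumed here to be the squared Pearson
correlation between experimental and predicted values.

```python
import math


def regression_metrics(observed, predicted):
    """Return (r2, rmse, mae) for paired observed and predicted values.

    r2 is the squared Pearson correlation coefficient (an assumption about
    how the abstract's r^2 is defined), rmse the root mean squared error,
    and mae the mean absolute error.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Sums of products and squares of deviations from the means.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for constant input")

    r2 = cov ** 2 / (var_obs * var_pred)

    # Prediction errors, one per compound.
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    return r2, rmse, mae


# Hypothetical log P values, invented for illustration (not from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

r2, rmse, mae = regression_metrics(observed, predicted)
print(f"r^2 = {r2:.3f}, RMSE = {rmse:.3f}, MAE = {mae:.3f}")
```

In a leave-one-out setting, each entry of predicted would come from a model
trained without the corresponding compound. RMSE penalizes large individual
errors more heavily than MAE, which is consistent with the reported RMSE
(0.39) being larger than the reported MAE (0.29).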