R. Mannhold et al., Multivariate analysis of experimental and computational descriptors of molecular lipophilicity, J COMPUT A, 12(6), 1998, pp. 573-581
Two experimental (log P, R-Mw) and 17 calculated descriptors of molecular
lipophilicity (fragmental, atom-based or based on molecular properties)
were investigated by multivariate analysis for a database of 159 compounds
including both simple structures and more complex drug molecules.
Principal component analysis (PCA) of the entire database exhibits a clustering
of chemical groups; the precision of the clustering corresponds to chemical
similarity. Thus, diversity searching in databases might effectively be
performed by PCA on the basis of calculated log P. The comparative validity check
of experimental and computational procedures by regression analysis and PCA
was performed with a chemically balanced, reduced data set (n = 55)
representing 11 chemical groups with 5 members each. Regression of
experimental descriptors (log P-oct versus R-Mw) proves that chromatographic
data, obtained under well-defined experimental conditions, can be used as
valid substitutes for log P. Regression of calculated versus experimental lipophilicity
data shows that fragmental methods are superior to atom-based methods and
approaches based on molecular properties, as indicated by correlation coefficients,
slopes and intercepts. In addition, PCA revealed that fragmental methods
(Rekker-type, KOWWIN, KLOGP) sense the compound ranking in log P data to
almost the same extent as experimental approaches. For atom-based procedures
and CLOGP, both the comparability of absolute values and the sensing of the
compound ranking in the database are slightly weaker. This trend is more
pronounced for the methods based on molecular properties, with the exception of
BLOGP.
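
The two analyses named in the abstract, regression of calculated against
experimental log P (judged by correlation coefficient, slope and intercept)
and PCA of a compounds-by-descriptors matrix, can be sketched as follows.
This is a minimal illustration, not the authors' procedure: the function
names, the autoscaling step and all numbers are assumptions, and the data
are synthetic rather than taken from the paper.

```python
import numpy as np


def regression_summary(x, y):
    """Least-squares fit y = slope * x + intercept, plus Pearson r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r


def pca_scores(X, n_components=2):
    """PCA via SVD of the autoscaled matrix (assumed preprocessing).

    Rows are compounds, columns are lipophilicity descriptors.
    Returns the component scores and the explained-variance fractions.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Synthetic stand-in data (NOT from the paper): 55 compounds, one
# "experimental" log P column and three hypothetical calculated columns
# with increasing noise.
rng = np.random.default_rng(0)
log_p_exp = rng.uniform(-1.0, 5.0, size=55)
calc_a = 1.00 * log_p_exp + 0.0 + rng.normal(0.0, 0.2, size=55)
calc_b = 0.90 * log_p_exp + 0.3 + rng.normal(0.0, 0.4, size=55)
calc_c = 0.75 * log_p_exp + 0.6 + rng.normal(0.0, 0.8, size=55)

# Calculated (y) versus experimental (x): a method that reproduces the
# experimental values exactly would give slope 1, intercept 0 and r = 1.
for name, calc in [("calc_a", calc_a), ("calc_b", calc_b), ("calc_c", calc_c)]:
    slope, intercept, r = regression_summary(log_p_exp, calc)
    print(f"{name}: slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")

# PCA on all four descriptor columns at once.
X = np.column_stack([log_p_exp, calc_a, calc_b, calc_c])
scores, explained = pca_scores(X, n_components=2)
print("explained variance fractions:", np.round(explained, 3))
```

Comparing slope and intercept alongside r separates agreement in absolute
values from agreement in compound ranking: a method can rank compounds well
(high r) while still being offset or compressed relative to experiment.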