Search for predictive generic model of aqueous solubility using Bayesian neural nets

Authors

Bruneau, P

Citation

P. Bruneau, Search for predictive generic model of aqueous solubility using Bayesian neural nets, J CHEM INF, 41(6), 2001, pp. 1605-1616

Subject categories

Chemistry

Journal title

JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES

ISSN journal

0095-2338

Volume

41

Issue

6

Year of publication

2001

Pages

1605 - 1616

Database

ISI

SICI code

0095-2338(200111/12)41:6<1605:SFPGMO>2.0.ZU;2-H

Abstract

Several predictive models of aqueous solubility have been published. They perform well on the data sets used to train them, but these data sets usually contain few structures similar to those of interest in drug research, so their applicability in drug hunting is questionable. A very diverse data set has been gathered from compounds drawn from literature reports and proprietary compounds. These compounds have been grouped into a so-called literature data set I, a proprietary data set II, and a mixed data set III formed by I and II. About 100 descriptors emphasizing surface properties were calculated for every compound. Bayesian learning of neural nets, which combines the advantages of neural nets without having their weaknesses, was used to select the most parsimonious models and train them from I, II, and III. The models were established either by selecting the most efficient descriptors one by one using a modified Gram-Schmidt procedure (GS) or by simplifying the most complete model using an automatic relevance procedure (ARD). The predictive ability of the models was assessed using validation data sets as unrelated to the training sets as possible, using two new parameters: NDDx,ref, the normalized smallest descriptor distance of a compound x to a reference data set, and CDx,mod, the combination of NDDx,ref with the dispersion of the Bayesian neural net calculations. The results show that it is possible to obtain a generic predictive model from database I, that the diversity of database II is too restricted to give a model with good generalization ability, and that the ARD method applied to the mixed database III gives the best predictive model.
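The idea behind NDDx,ref, the smallest descriptor distance from a compound x to a reference data set, can be illustrated with a minimal Python sketch. The abstract does not state how descriptors are scaled or how the distance is normalized, so everything specific here is an assumption: z-score scaling by the reference set, Euclidean distance, and division by the square root of the number of descriptors. The function names `column_stats` and `ndd` are hypothetical and are not taken from the paper.

```python
import math


def column_stats(reference):
    """Mean and standard deviation of each descriptor over the reference set."""
    n = len(reference)
    dims = len(reference[0])
    means = [sum(row[j] for row in reference) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((row[j] - means[j]) ** 2 for row in reference) / n
        # A constant descriptor has zero spread; fall back to 1.0 to avoid
        # dividing by zero.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def ndd(x, reference):
    """Assumed form of a normalized smallest descriptor distance.

    Scaling and normalization are illustrative choices, not the
    definition used in the paper.
    """
    means, stds = column_stats(reference)

    def scale(v):
        return [(v[j] - means[j]) / stds[j] for j in range(len(v))]

    xs = scale(x)
    # Distance to the nearest reference compound in scaled descriptor space.
    smallest = min(math.dist(xs, scale(r)) for r in reference)
    # Divide by sqrt(number of descriptors) so values are comparable
    # across descriptor sets of different size (an assumption).
    return smallest / math.sqrt(len(x))


# Two reference compounds described by two descriptors each.
reference_set = [[0.0, 0.0], [2.0, 2.0]]
inside = ndd([0.0, 0.0], reference_set)   # identical to a reference compound
outside = ndd([4.0, 4.0], reference_set)  # far from both reference compounds
```

Under these assumptions, a compound that coincides with a reference compound gets a distance of zero, and the value grows as the compound moves away from the descriptor space covered by the reference set, which is the kind of signal the abstract uses to judge how far a prediction is an extrapolation.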