Robust QSAR models from novel descriptors and Bayesian Regularised Neural Networks

Citation
D.A. Winkler and F.R. Burden, Robust QSAR models from novel descriptors and Bayesian Regularised Neural Networks, MOL SIMULAT, 24(4-6), 2000, pp. 243
Number of citations
53
Subject Categories
Physical Chemistry/Chemical Physics
Journal title
MOLECULAR SIMULATION
Journal ISSN
0892-7022
Volume
24
Issue
4-6
Year of publication
2000
Database
ISI
SICI code
0892-7022(2000)24:4-6<243:RQMFND>2.0.ZU;2-N
Abstract
The QSAR method, using multivariate statistics, was developed by Hansch and Fujita, and it has been successfully applied to many drug and agrochemical design problems. As well as speed and simplicity, QSAR has the advantage of being capable of accounting for some transport and metabolic processes which occur once the compound is administered. Until recently, QSAR analyses have used relatively simple molecular descriptors based on substituent constants (e.g., Hammett constants, pi, or molar refractivities), physicochemical properties (e.g., partition coefficients), or topological indices (e.g., Randic and Wiener indices). Recently several new representations have been devised: atomistic; molecular eigenvalues and BCUT indices derived therefrom; E-state fields; topological autocorrelation vectors; and various molecular fragment-based hash codes. Compared with traditional representations, these have advantages in speed of computation, in more accurately representing the molecular properties most relevant to activity, or in being more generally applicable to diverse chemical classes acting at a common receptor.
Historically, linear regression methods such as MLR (multiple linear regression) and PLS (partial least squares) have been used to develop QSAR models. Regression is an "ill-posed" problem in statistics, which sometimes results in QSAR models exhibiting instability when trained with noisy data. In addition, traditional regression techniques often require subjective decisions to be made on the part of the investigator as to the likely non-linear relationship between structure and activity, and whether there are cross-terms. Regression methods based on neural networks offer some advantages over MLR methods as they can account for non-linear SARs, and can deal with linear dependencies which sometimes appear in real SAR problems. However, some problems still exist in the development of SAR models using conventional backpropagation neural networks.
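As an illustration of the topological indices the abstract mentions, the sketch below computes the Wiener index, taken here as the sum of shortest-path bond counts over all pairs of non-hydrogen atoms. The adjacency-dictionary input, the function name, and the atom labels are assumptions made for this example; nothing here is taken from the paper itself.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path bond counts over all unordered atom pairs."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives bond-count distances from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in distance:
                    distance[neighbour] = distance[atom] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
    # Every pair was counted twice, once from each end.
    return total // 2


# Hydrogen-suppressed carbon skeletons with hypothetical atom labels 0..3.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two isomers have the same formula but different index values, which is the kind of structural information such a descriptor is meant to carry into a QSAR model.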
We have used a specific type of neural network, the Bayesian Regularized Artificial Neural Network (BRANN), in the development of SAR models. The advantage of BRANN is that the models are robust and the validation process, which scales as O(N^2) in normal regression methods, is unnecessary. These networks have the potential to solve a number of problems which arise in QSAR modelling such as: choice of model; robustness of model; choice of validation set; size of validation effort; and optimization of network architecture. The application of the methods to QSAR of compounds active at the benzodiazepine and muscarinic receptors will be illustrated.
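The validation effort that the abstract says becomes unnecessary can be pictured with a small sketch. It assumes the validation scheme is leave-one-out cross-validation of a one-descriptor linear model; the abstract does not name the scheme, so this is one plausible reading of the quadratic scaling, and the data values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out(xs, ys):
    """Return (PRESS, number of training points processed over all refits)."""
    press = 0.0
    visits = 0
    for i in range(len(xs)):
        # Refit the model with compound i held out.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        visits += len(train_x)
        # Accumulate the squared prediction error for the held-out compound.
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return press, visits


# Hypothetical descriptor/activity values lying exactly on y = 1 + 2x.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
activity = [1.0, 3.0, 5.0, 7.0, 9.0]
press, visits = leave_one_out(descriptor, activity)
print(visits)  # 20, i.e. N * (N - 1) for N = 5
```

With N compounds the loop refits the model N times on N - 1 points each, so the work grows roughly as N^2. A Bayesian regularised model is trained once on all the data, which is the saving the abstract points to.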