CLUSTER-BASED DESIGN IN ENVIRONMENTAL QSAR

Citation
L. Eriksson et al., CLUSTER-BASED DESIGN IN ENVIRONMENTAL QSAR, Quantitative structure-activity relationships, 16(5), 1997, pp. 383-390
Citations number
28
ISSN journal
09318771
Volume
16
Issue
5
Year of publication
1997
Pages
383 - 390
Database
ISI
SICI code
0931-8771(1997)16:5<383:CDIEQ>2.0.ZU;2-Z
Abstract
In QSAR analysis in environmental sciences, adverse effects of chemicals released to the environment are modelled and predicted as a function of the chemical properties of the pollutants. Usually, the set of compounds under study contains several classes of substances, i.e., a more or less strongly clustered set. It is then necessary to ensure that the selected training set comprises compounds representing all those chemical classes. Multivariate design in the principal properties of the compound classes is usually appropriate for selecting a meaningful training set. However, with clustered data, often seen in environmental chemistry and toxicology, a single multivariate design may be suboptimal. This is because of the risk of ignoring small classes with few members and only selecting training set compounds from the largest classes. In this paper, a procedure for training set selection recognizing clustering is proposed. Here, when non-selective biological or environmental responses are modelled, local multivariate designs are constructed within each cluster (class). The chosen compounds arising from the local designs are finally united in the overall training set, which thus will contain members from all clusters. Our illustration deals with a set of 66 compounds, categorized into five classes, for which the soil sorption coefficient is available. The training set selection is discussed, followed by multivariate QSAR modelling, model validation and interpretation, and predictions for the test set.
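Illustrative sketch (not from the paper)
The idea of selecting within each cluster and then uniting the local selections can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not specify the design type, so a greedy max-min distance selection stands in for the local multivariate design, the two-dimensional scores stand in for precomputed principal properties, and all names (`maximin_select`, `cluster_based_training_set`) and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def maximin_select(points, k):
    """Greedily pick up to k well-spread indices from a list of coordinate tuples.

    Assumption: this max-min rule is only a stand-in for a proper
    multivariate design in the principal properties.
    """
    if not points:
        return []
    n = len(points)
    centroid = tuple(sum(coord) / n for coord in zip(*points))
    # Start with the point farthest from the cluster centroid.
    chosen = [max(range(n), key=lambda i: dist(points[i], centroid))]
    while len(chosen) < min(k, n):
        remaining = [i for i in range(n) if i not in chosen]
        # Add the point whose nearest already-chosen neighbour is farthest away.
        nxt = max(
            remaining,
            key=lambda i: min(dist(points[i], points[j]) for j in chosen),
        )
        chosen.append(nxt)
    return chosen


def cluster_based_training_set(compounds, per_cluster):
    """Select compounds locally within each cluster, then unite the selections.

    compounds: list of (name, cluster_label, scores) tuples.
    """
    by_cluster = defaultdict(list)
    for name, label, scores in compounds:
        by_cluster[label].append((name, scores))
    training = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        picked = maximin_select([scores for _, scores in members], per_cluster)
        training.extend(members[i][0] for i in picked)
    return training


# Hypothetical toy data: one large class "A" and one small class "B".
compounds = [
    ("A1", "A", (0.0, 0.0)), ("A2", "A", (1.0, 0.0)), ("A3", "A", (0.0, 1.0)),
    ("A4", "A", (1.0, 1.0)), ("A5", "A", (0.5, 0.5)), ("A6", "A", (0.2, 0.8)),
    ("B1", "B", (5.0, 5.0)), ("B2", "B", (5.5, 5.2)),
]
training_set = cluster_based_training_set(compounds, per_cluster=2)
test_set = [name for name, _, _ in compounds if name not in training_set]
```

Because the selection runs separately inside each class, the small class "B" contributes training compounds even though it is much smaller than class "A", which is the property the abstract attributes to the cluster-based approach. The compounds left over form the test set.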