FAST SCREENING OF LARGE DATABASES USING CLUSTERING AND PCA BASED ON STRUCTURE FRAGMENTS

Citation
J. Nouwen et al., FAST SCREENING OF LARGE DATABASES USING CLUSTERING AND PCA BASED ON STRUCTURE FRAGMENTS, Journal of Chemometrics, 10(5-6), 1996, pp. 385-398
Number of citations
19
Subject categories
Chemistry, Analytical; Statistics & Probability
Journal title
Journal of Chemometrics
ISSN
0886-9383
Volume
10
Issue
5-6
Year of publication
1996
Pages
385 - 398
Database
ISI
SICI code
0886-9383(1996)10:5-6<385:FSOLDU>2.0.ZU;2-F
Abstract
Jarvis-Patrick clustering based on structural fragments, with the Tanimoto coefficient as the similarity measure, provides a fast tool for classifying large numbers of chemicals. This clustering technique was applied to chemicals in relation to their acute fish toxicity (LC50). Correlation analysis with log LC50 as the response variable and log Kow (the octanol-water partition coefficient) as the predictor variable resulted in good models for several clusters. Benzylic chemicals were not recognized as separate clusters, and including them in the training set resulted in models without any predictive capability. On the basis of statistical and chemical criteria they were rejected, which improved the final model substantially. The toxicological responses of phenols and some organophosphates were found to fit well into a single model. The clustering produced smaller groupings than the classes listed by Verhaar et al., but the two classifications disagreed for only a minority of chemicals. PCA allowed a quick visual inspection of the application limits of the models for the HPVCs (high production volume chemicals) and the EINECS (European Inventory of Existing Commercial Chemical Substances). The models performed well for the HPVCs but could be used to estimate the toxicity of only a fraction of the EINECS chemicals. PCA also showed that subclusters were present in some cases. (C) 1996 by John Wiley & Sons, Ltd.
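A minimal sketch of Jarvis-Patrick clustering with the Tanimoto coefficient, assuming each chemical is encoded as a set of fragment codes. The fragment codes, the neighbor-list size `k`, the threshold `k_min`, the tie-breaking rule and the convention that a chemical is not counted in its own neighbor list are illustrative assumptions, not the settings used in the paper. Two chemicals join the same cluster when each is among the other's `k` nearest neighbors and they share at least `k_min` of those neighbors; clusters are the connected components of that relation, collected here with a union-find structure.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fragment sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty sets -> 0.0 by choice


def nearest_neighbors(fps, k):
    """For each chemical, the indices of its k most similar other chemicals."""
    n = len(fps)
    neighbors = []
    for i in range(n):
        # Sort the other chemicals by decreasing similarity; ties go to the lower index.
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: (-tanimoto(fps[i], fps[j]), j))
        neighbors.append(set(ranked[:k]))
    return neighbors


def jarvis_patrick(fps, k, k_min):
    """Cluster label for each chemical (labels are arbitrary representative indices)."""
    nbrs = nearest_neighbors(fps, k)
    parent = list(range(len(fps)))  # union-find forest over chemicals

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(fps)), 2):
        # Jarvis-Patrick rule: mutual neighbors that share at least k_min neighbors.
        if j in nbrs[i] and i in nbrs[j] and len(nbrs[i] & nbrs[j]) >= k_min:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(fps))]


# Hypothetical fragment codes for six chemicals (not the paper's fragment set).
chemicals = [
    frozenset({"OH", "ring"}),                  # phenol-like
    frozenset({"OH", "ring", "Cl"}),            # chlorophenol-like
    frozenset({"OH", "ring", "CH3"}),           # cresol-like
    frozenset({"P=S", "OEt", "ring"}),          # organophosphate-like
    frozenset({"P=S", "OEt", "NO2", "ring"}),
    frozenset({"P=S", "OMe", "NO2", "ring"}),
]
labels = jarvis_patrick(chemicals, k=2, k_min=1)
```

With these toy inputs the three phenol-like sets and the three organophosphate-like sets fall into two separate clusters. This naive version compares all pairs, so its cost grows with the square of the number of chemicals; it illustrates the clustering rule rather than an efficient screening implementation.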
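The per-cluster regression and the PCA-based check of application limits described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: ordinary least squares for log LC50 on log Kow, PCA computed by SVD on a fragment-count matrix, and, in place of the visual inspection of score plots, a numerical rule (the hypothetical function `within_domain`) that accepts a chemical only if its scores fall inside the range spanned by the training set and its residual is no larger than the largest training residual. The number of components, the tolerance and all data are illustrative.

```python
import numpy as np


def fit_cluster_model(log_kow, log_lc50):
    """Least-squares fit of log LC50 = a + b * log Kow for one cluster; returns (a, b, r2)."""
    x = np.asarray(log_kow, dtype=float)
    y = np.asarray(log_lc50, dtype=float)
    b, a = np.polyfit(x, y, 1)  # polyfit returns the slope before the intercept
    resid = y - (a + b * x)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return a, b, r2


def _project(X, mean, loadings):
    """PCA scores and squared residual distance from the PCA model for each row of X."""
    centered = X - mean
    scores = centered @ loadings
    resid = centered - scores @ loadings.T
    return scores, np.sum(resid ** 2, axis=1)


def within_domain(train, query, n_comp=2, tol=1e-9):
    """True for query rows whose scores lie inside the training score range and
    whose residual does not exceed the largest training residual."""
    train = np.asarray(train, dtype=float)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    mean = train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    loadings = vt[:n_comp].T
    t_train, q_train = _project(train, mean, loadings)
    t_query, q_query = _project(query, mean, loadings)
    in_box = np.all((t_query >= t_train.min(axis=0) - tol)
                    & (t_query <= t_train.max(axis=0) + tol), axis=1)
    return in_box & (q_query <= q_train.max() + tol)


# Toy numbers only; none of these values come from the paper.
a, b, r2 = fit_cluster_model([1.0, 2.0, 3.0, 4.0], [0.2, -0.6, -1.4, -2.2])

train = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]  # fragment counts
query = [[1, 1, 0, 0],   # identical to a training chemical
         [3, 3, 0, 0],   # known fragments, but far outside the training score range
         [1, 0, 0, 1]]   # carries a fragment absent from the training set
flags = within_domain(train, query)
```

Here the first query chemical is accepted, the second is rejected because its scores lie outside the training range, and the third is rejected because its new fragment produces a residual that no principal component captures. The third case is invisible in the score plane itself, which is why the sketch also checks the residual.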