J. Nouwen et al., FAST SCREENING OF LARGE DATABASES USING CLUSTERING AND PCA BASED ON STRUCTURE FRAGMENTS, Journal of Chemometrics, 10(5-6), 1996, pp. 385-398
Jarvis-Patrick clustering based on structural fragments, with the
Tanimoto coefficient as the similarity measure, provides a fast tool
for classifying large numbers of chemicals. This clustering technique
was applied to chemicals in relation to their acute fish toxicity
(LC50). Correlation analysis with log LC50 as the response variable
and log Kow as the predictor variable resulted in good models for
several clusters. Benzylic chemicals were not recognized as separate
clusters. Including them in the training set resulted in models
without any predictive capability. Based on statistical and chemical
criteria, they were rejected, improving the final model substantially.
The toxicological response of phenols and some organophosphates was
found to fit well into one model. The clustering resulted in smaller
groupings than those listed by Verhaar et al., but these were in
dispute only for a minority of chemicals. PCA allowed a quick visual
inspection of the application limits of the models for the HPVCs and
the EINECS. The models performed well for the HPVCs but could only be
used to estimate a fraction of the EINECS. PCA showed that in some
cases subclusters were present. (C) 1996 by John Wiley & Sons, Ltd.
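
The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fragment sets, the chemical names, and the parameters `k` (neighbour-list length) and `k_min` (required shared neighbours) are hypothetical, and the variant shown (mutual neighbours plus a shared-neighbour threshold, with an item excluded from its own neighbour list) is only one common formulation of Jarvis-Patrick clustering.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fragment sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jarvis_patrick(fragments, k=2, k_min=1):
    """Cluster items given as {name: set_of_fragments}.

    Two items are joined when each is in the other's k-nearest-neighbour
    list and the two lists share at least k_min members. Clusters are the
    connected components of those joins.
    """
    names = list(fragments)

    # k nearest neighbours of every item, ranked by Tanimoto similarity.
    neighbours = {}
    for n in names:
        others = [m for m in names if m != n]
        others.sort(key=lambda m: tanimoto(fragments[n], fragments[m]),
                    reverse=True)
        neighbours[n] = set(others[:k])

    # Union-find to merge items that satisfy the joining rule.
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        mutual = a in neighbours[b] and b in neighbours[a]
        shared = len(neighbours[a] & neighbours[b])
        if mutual and shared >= k_min:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())


# Hypothetical fragment sets, for illustration only.
FRAGMENTS = {
    "phenol": {"ring", "OH"},
    "chlorophenol": {"ring", "OH", "Cl"},
    "nitrophenol": {"ring", "OH", "NO2"},
    "hexane": {"CH3", "CH2"},
    "hexanol": {"CH3", "CH2", "OH"},
    "hexylamine": {"CH3", "CH2", "NH2"},
}

if __name__ == "__main__":
    for cluster in jarvis_patrick(FRAGMENTS, k=2, k_min=1):
        print(sorted(cluster))
```

With these toy inputs the three phenols and the three hexyl compounds end up in separate clusters, because no cross-group pair appears in both neighbour lists. The pairwise ranking here is quadratic in the number of chemicals; a screening run over a large database would need a faster neighbour search.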