C. Helma et al., Data quality in predictive toxicology: Identification of chemical structures and calculation of chemical properties, ENVIR H PER, 108(11), 2000, pp. 1029-1033
Every technique for toxicity prediction and for the detection of
structure-activity relationships relies on the accurate estimation and
representation of chemical and toxicologic properties. In this paper we
discuss the potential sources of errors associated with the identification
of compounds, the representation of their structures, and the calculation
of chemical descriptors. It is based on a case study where machine learning
techniques were applied to data from noncongeneric compounds and a complex
toxicologic end point (carcinogenicity). We propose methods applicable to
the routine quality control of large chemical datasets, but our main
intention is to raise awareness about this topic and to open a discussion
about quality assurance in predictive toxicology. The accuracy and
reproducibility of toxicity data will be reported in another paper.