R. D. King and A. Srinivasan, Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming, Environmental Health Perspectives, 104, 1996, pp. 1031-1040
The machine learning program Progol was applied to the problem of
forming the structure-activity relationship (SAR) for a set of
compounds tested for carcinogenicity in rodent bioassays by the U.S.
National Toxicology Program (NTP). Progol is the first inductive logic
programming (ILP) algorithm to use a fully relational method for
describing chemical structure in SARs, based on using atoms and their
bond connectivities. Progol is well suited to forming SARs for
carcinogenicity as it is designed to produce easily understandable
rules (structural alerts) for sets of noncongeneric compounds. The
Progol SAR method was tested by prediction of a set of compounds that
have been widely predicted by other SAR methods (the compounds used in
the NTP's first round of carcinogenesis predictions). For these
compounds no method (human or machine) was significantly more accurate
than Progol. Progol was the most accurate method that did not use data
from biological tests on rodents (however, the difference in accuracy
is not significant). The Progol predictions were based solely on
chemical structure and the results of tests for Salmonella
mutagenicity. Using the full NTP database, the prediction accuracy of
Progol was estimated to be 63% (+/-3%) using 5-fold cross-validation.
A set of structural alerts for carcinogenesis was automatically
generated and the chemical rationale for them investigated; these
structural alerts are statistically independent of the Salmonella
mutagenicity. Carcinogenicity is predicted for the compounds used in
the NTP's second round of carcinogenesis predictions. The results for
prediction of carcinogenesis, taken together with the previous
successful applications of predicting mutagenicity in nitroaromatic
compounds and inhibition of angiogenesis by suramin analogues, show
that Progol has a role to play in understanding the SARs of
cancer-related compounds.
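
The relational description mentioned in the abstract (atoms plus their bond connectivities, with structural alerts expressed as rules over them) can be illustrated with a small sketch. This is not Progol and not the authors' encoding: Progol learns logic-program clauses, whereas the Python below only mimics the idea. The names `Atom`, `Bond`, `neighbours`, and `matches_alert`, the toy molecules, and the alert itself (a nitrogen bonded to at least two oxygens) are all hypothetical, chosen for illustration only and not taken from the paper.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Atom(NamedTuple):
    atom_id: str   # hypothetical identifier, e.g. "a1"
    element: str   # lowercase element symbol, e.g. "c", "n", "o"


class Bond(NamedTuple):
    a: str         # atom_id of one end
    b: str         # atom_id of the other end
    order: int     # 1 = single, 2 = double


# Hypothetical toy molecule: a carbon attached to an N(=O)-O fragment
# (hydrogens omitted). It is illustrative data, not from the NTP set.
NITRO_ATOMS = [Atom("a1", "c"), Atom("a2", "n"), Atom("a3", "o"), Atom("a4", "o")]
NITRO_BONDS = [Bond("a1", "a2", 1), Bond("a2", "a3", 2), Bond("a2", "a4", 1)]

# Second hypothetical molecule with no nitrogen: C-C-O.
ALCOHOL_ATOMS = [Atom("b1", "c"), Atom("b2", "c"), Atom("b3", "o")]
ALCOHOL_BONDS = [Bond("b1", "b2", 1), Bond("b2", "b3", 1)]


def neighbours(atom_id: str, bonds: List[Bond]) -> Iterator[Tuple[str, int]]:
    """Yield (neighbour_id, bond_order) pairs, treating bonds as undirected."""
    for bond in bonds:
        if bond.a == atom_id:
            yield bond.b, bond.order
        elif bond.b == atom_id:
            yield bond.a, bond.order


def matches_alert(atoms: List[Atom], bonds: List[Bond]) -> bool:
    """Hypothetical alert: some nitrogen is bonded to at least two oxygens."""
    element = {atom.atom_id: atom.element for atom in atoms}
    for atom in atoms:
        if atom.element != "n":
            continue
        oxygens = [n for n, _ in neighbours(atom.atom_id, bonds) if element[n] == "o"]
        if len(oxygens) >= 2:
            return True
    return False
```

Calling `matches_alert(NITRO_ATOMS, NITRO_BONDS)` checks the first toy molecule against the invented alert, and `matches_alert(ALCOHOL_ATOMS, ALCOHOL_BONDS)` checks the second. The point of the sketch is only that a rule stated over atoms and bonds needs no precomputed molecular descriptors, which is what makes it applicable to noncongeneric compounds.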