STRUCTURE-ACTIVITY-RELATIONSHIPS DERIVED BY MACHINE LEARNING - THE USE OF ATOMS AND THEIR BOND CONNECTIVITIES TO PREDICT MUTAGENICITY BY INDUCTIVE LOGIC PROGRAMMING

Citation
R.D. King et al., STRUCTURE-ACTIVITY-RELATIONSHIPS DERIVED BY MACHINE LEARNING - THE USE OF ATOMS AND THEIR BOND CONNECTIVITIES TO PREDICT MUTAGENICITY BY INDUCTIVE LOGIC PROGRAMMING, Proceedings of the National Academy of Sciences of the United States of America, 93(1), 1996, pp. 438-442
Number of citations
26
Subject Categories
Multidisciplinary Sciences
Journal ISSN
00278424
Volume
93
Issue
1
Year of publication
1996
Pages
438 - 442
Database
ISI
SICI code
0027-8424(1996)93:1<438:SDBML->2.0.ZU;2-V
Abstract
We present a general approach to forming structure-activity relationships (SARs). This approach is based on representing chemical structure by atoms and their bond connectivities in combination with the inductive logic programming (ILP) algorithm PROGOL. Existing SAR methods describe chemical structure by using attributes, which are general properties of an object. It is not possible to map chemical structure directly to attribute-based descriptions, as such descriptions have no internal organization. A more natural and general way to describe chemical structure is to use a relational description, where the internal construction of the description maps that of the object described. Our atom and bond connectivities representation is a relational description. ILP algorithms can form SARs with relational descriptions. We have tested the relational approach by investigating the SARs of 230 aromatic and heteroaromatic nitro compounds. These compounds had been split previously into two subsets: 188 compounds that were amenable to regression and 42 that were not. For the 188 compounds, a SAR was found that was as accurate as the best statistical or neural network-generated SARs. The PROGOL SAR has the advantages that it did not need the use of any indicator variables handcrafted by an expert, and the generated rules were easily comprehensible. For the 42 compounds, PROGOL formed a SAR that was significantly (P < 0.025) more accurate than linear regression, quadratic regression, and back-propagation. This SAR is based on an automatically generated structural alert for mutagenicity.
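
The contrast the abstract draws between attribute-based and relational descriptions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Atom` and `Bond` types, the toy fragment, and the `has_bonded_pair` check are hypothetical, and they are neither PROGOL's input format nor a rule reported in the paper.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    atom_id: str   # hypothetical identifier, e.g. "a1"
    element: str   # element symbol in lower case, e.g. "c", "n", "o"


class Bond(NamedTuple):
    first: str     # atom_id of one end of the bond
    second: str    # atom_id of the other end
    order: int     # 1 = single, 2 = double


# Toy fragment invented for this sketch (not taken from the paper's data).
atoms = [Atom("a1", "c"), Atom("a2", "n"), Atom("a3", "o"), Atom("a4", "o")]
bonds = [Bond("a1", "a2", 1), Bond("a2", "a3", 2), Bond("a2", "a4", 1)]


def attribute_view(atoms):
    """Attribute-based description: element counts only, connectivity is lost."""
    counts = {}
    for atom in atoms:
        counts[atom.element] = counts.get(atom.element, 0) + 1
    return counts


def has_bonded_pair(atoms, bonds, element_x, element_y, order):
    """Relational query: is some element_x atom bonded to some element_y atom
    with the given bond order? This needs the bond facts, not just counts."""
    element_of = {atom.atom_id: atom.element for atom in atoms}
    for bond in bonds:
        pair = (element_of[bond.first], element_of[bond.second])
        if bond.order == order and pair in ((element_x, element_y), (element_y, element_x)):
            return True
    return False


print(attribute_view(atoms))
print(has_bonded_pair(atoms, bonds, "n", "o", 2))
print(has_bonded_pair(atoms, bonds, "c", "o", 2))
```

The element counts are identical for any molecule with the same formula, whereas the relational query depends on which atoms are actually bonded. An ILP system searches for rules over facts of this relational kind; the hand-written query above only stands in for one such rule.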