STRUCTURE-ACTIVITY-RELATIONSHIPS DERIVED BY MACHINE LEARNING - THE USE OF ATOMS AND THEIR BOND CONNECTIVITIES TO PREDICT MUTAGENICITY BY INDUCTIVE LOGIC PROGRAMMING
R.D. King et al., STRUCTURE-ACTIVITY-RELATIONSHIPS DERIVED BY MACHINE LEARNING - THE USE OF ATOMS AND THEIR BOND CONNECTIVITIES TO PREDICT MUTAGENICITY BY INDUCTIVE LOGIC PROGRAMMING, Proceedings of the National Academy of Sciences of the United States of America, 93(1), 1996, pp. 438-442
We present a general approach to forming structure-activity
relationships (SARs). This approach is based on representing chemical
structure by atoms and their bond connectivities in combination with
the inductive logic programming (ILP) algorithm PROGOL. Existing SAR
methods describe chemical structure by using attributes, which are
general properties of an object. It is not possible to map chemical
structure directly to attribute-based descriptions, as such
descriptions have no internal organization. A more natural and general
way to describe chemical structure is to use a relational description,
where the internal construction of the description maps that of the
object described. Our atom and bond connectivities representation is a
relational description. ILP algorithms can form SARs with relational
descriptions. We have tested the relational approach by investigating
the SARs of 230 aromatic and heteroaromatic nitro compounds. These
compounds had been split previously into two subsets: 188 compounds
that were amenable to regression and 42 that were not. For the 188
compounds, a SAR was found that was as accurate as the best statistical
or neural network-generated SARs. The PROGOL SAR has the advantages
that it did not need the use of any indicator variables handcrafted by
an expert, and the generated rules were easily comprehensible. For the
42 compounds, PROGOL formed a SAR that was significantly (P < 0.025)
more accurate than linear regression, quadratic regression, and
back-propagation. This SAR is based on an automatically generated
structural alert for mutagenicity.
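
The contrast between attribute-based and relational descriptions can be
illustrated with a minimal Python sketch. Everything in it is an
assumption made for illustration: the two toy molecules, the fact
layout (atom id to element, plus bond triples), and the helper names
neighbours, has_nitro_like_group and element_counts are hypothetical,
and the hand-written substructure test is not a rule learned by PROGOL
nor the structural alert reported in the paper.

```python
from collections import Counter

# Toy molecules as relational facts (hypothetical data, not from the paper).
# atoms: atom id -> element symbol; bonds: (atom id, atom id, bond type).
MOL_A_ATOMS = {"a1": "c", "a2": "n", "a3": "o", "a4": "o"}
MOL_A_BONDS = [
    ("a1", "a2", "single"),
    ("a2", "a3", "double"),
    ("a2", "a4", "single"),
]

MOL_B_ATOMS = {"b1": "c", "b2": "n", "b3": "o", "b4": "o"}
MOL_B_BONDS = [
    ("b1", "b2", "single"),
    ("b2", "b3", "single"),
    ("b3", "b4", "single"),
]


def neighbours(atom_id, bonds):
    """Return the ids of all atoms bonded to atom_id."""
    result = []
    for left, right, _bond_type in bonds:
        if left == atom_id:
            result.append(right)
        elif right == atom_id:
            result.append(left)
    return result


def has_nitro_like_group(atoms, bonds):
    """Relational query: is some nitrogen bonded to at least two oxygens?"""
    for atom_id, element in atoms.items():
        if element != "n":
            continue
        oxygens = [n for n in neighbours(atom_id, bonds) if atoms[n] == "o"]
        if len(oxygens) >= 2:
            return True
    return False


def element_counts(atoms):
    """Attribute-based description: element counts only, no connectivity."""
    return Counter(atoms.values())


if __name__ == "__main__":
    print(element_counts(MOL_A_ATOMS) == element_counts(MOL_B_ATOMS))
    print(has_nitro_like_group(MOL_A_ATOMS, MOL_A_BONDS))
    print(has_nitro_like_group(MOL_B_ATOMS, MOL_B_BONDS))
```

In this sketch the two molecules have identical element counts, so the
flat attribute description cannot tell them apart, while the query over
atoms and bonds separates them because it can refer to connectivity.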