A. Srinivasan and R.D. King, Feature construction with Inductive Logic Programming: A study of quantitative predictions of biological activity aided by structural attributes, Data Mining and Knowledge Discovery, 3(1), 1999, pp. 37-57
Recently, computer programs developed within the field of Inductive Logic
Programming (ILP) have received some attention for their ability to
construct restricted first-order logic solutions using problem-specific
background knowledge. Prominent applications of such programs have been
concerned with determining "structure-activity" relationships in the areas
of molecular biology and chemistry. Typically the task here is to predict
the "activity" of a compound (for example, toxicity) from its chemical
structure. A summary of the research in the area is: (a) ILP programs have
largely been restricted to qualitative predictions of activity ("high",
"low", etc.); (b) when appropriate attributes are available, ILP programs
have equivalent predictivity to standard quantitative analysis techniques
like linear regression. However, ILP programs usually perform better when
such attributes are unavailable; and (c) by using structural information as
background knowledge, an ILP program can provide comprehensible
explanations for biological activity. This paper examines the use of ILP
programs as a method of "discovering" new attributes. These attributes
could then be used by methods like linear regression, thus allowing for
quantitative predictions while retaining the ability to use structural
information as background knowledge. Using structure-activity tasks as a
test-bed, the utility of ILP programs in constructing new features was
evaluated by examining the prediction of biological activity using linear
regression, with and without the aid of ILP-learnt logical attributes. In
three out of the five data sets examined, the addition of ILP attributes
produced statistically better results. In addition, six important
structural features that had escaped the attention of the expert chemists
were discovered. The method used here to construct new attributes is not
specific to the problem of predicting biological activity, and the results
obtained suggest a wider role for ILP programs in aiding the process of
scientific discovery.
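
The evaluation described above, linear regression with and without logical attributes, can be illustrated with a minimal sketch. It is not the authors' code and uses none of their data. The values are synthetic and noise-free, the names `logp` and `has_feature` are hypothetical, and the ILP step itself is not implemented: a hand-set 0/1 indicator stands in for the truth value of a learnt structural rule on each compound. The fit uses ordinary least squares through the normal equations, written in plain Python so that it runs without third-party libraries.

```python
# Sketch only: synthetic data; attribute and feature names are hypothetical.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def rss(rows, y, w):
    """Residual sum of squares of the linear model w on (rows, y)."""
    return sum(
        (t - sum(wi * xi for wi, xi in zip(w, r))) ** 2
        for r, t in zip(rows, y)
    )

# Hypothetical numeric attribute for six made-up compounds.
logp = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
# Stand-in for an ILP-constructed feature: 1 if a (hypothetical) learnt
# structural rule holds for the compound, 0 otherwise.
has_feature = [0, 1, 0, 1, 1, 0]
# Synthetic activity that depends on both, so the indicator carries signal.
activity = [1.0 + 2.0 * x + 3.0 * f for x, f in zip(logp, has_feature)]

# Design matrices: intercept + numeric attribute, then the same plus the
# logical attribute as an extra 0/1 column.
baseline = [[1.0, x] for x in logp]
augmented = [[1.0, x, float(f)] for x, f in zip(logp, has_feature)]

w_base = fit_least_squares(baseline, activity)
w_aug = fit_least_squares(augmented, activity)
rss_base = rss(baseline, activity, w_base)
rss_aug = rss(augmented, activity, w_aug)

print("RSS without logical attribute:", round(rss_base, 3))
print("RSS with logical attribute:   ", round(rss_aug, 3))
```

Because the synthetic activity was generated from both columns, the augmented model fits it almost exactly while the baseline cannot. On real data the comparison would need held-out predictions and a significance test, as the abstract's reference to statistically better results implies.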