Feature construction with Inductive Logic Programming: A study of quantitative predictions of biological activity aided by structural attributes

Citation
A. Srinivasan and R.D. King, Feature construction with Inductive Logic Programming: A study of quantitative predictions of biological activity aided by structural attributes, DATA M K D, 3(1), 1999, pp. 37-57
Number of citations
50
Subject categories
AI Robotics and Automatic Control
Journal title
DATA MINING AND KNOWLEDGE DISCOVERY
Journal ISSN
1384-5810
Volume
3
Issue
1
Year of publication
1999
Pages
37 - 57
Database
ISI
SICI code
1384-5810(199903)3:1<37:FCWILP>2.0.ZU;2-4
Abstract
Recently, computer programs developed within the field of Inductive Logic Programming (ILP) have received some attention for their ability to construct restricted first-order logic solutions using problem-specific background knowledge. Prominent applications of such programs have been concerned with determining "structure-activity" relationships in the areas of molecular biology and chemistry. Typically the task here is to predict the "activity" of a compound (for example, toxicity), from its chemical structure. A summary of the research in the area is: (a) ILP programs have largely been restricted to qualitative predictions of activity ("high", "low" etc.); (b) When appropriate attributes are available, ILP programs have equivalent predictivity to standard quantitative analysis techniques like linear regression. However ILP programs usually perform better when such attributes are unavailable; and (c) By using structural information as background knowledge, an ILP program can provide comprehensible explanations for biological activity. This paper examines the use of ILP programs as a method of "discovering" new attributes. These attributes could then be used by methods like linear regression, thus allowing for quantitative predictions while retaining the ability to use structural information as background knowledge. Using structure-activity tasks as a test-bed, the utility of ILP programs in constructing new features was evaluated by examining the prediction of biological activity using linear regression, with and without the aid of ILP learnt logical attributes. In three out of the five data sets examined the addition of ILP attributes produced statistically better results. In addition six important structural features that have escaped the attention of the expert chemists were discovered.
The method used here to construct new attributes is not specific to the problem of predicting biological activity, and the results obtained suggest a wider role for ILP programs in aiding the process of scientific discovery.
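The idea of feeding ILP-constructed attributes into linear regression can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the six "compounds", the single numeric attribute, the Boolean flag standing in for the truth value of a hypothetical ILP-learnt clause (for example, "the molecule contains a given substructure"), and the activity values, which are generated synthetically. The sketch fits ordinary least squares with and without the Boolean column and compares the residual sum of squares.

```python
# Toy illustration only: the data, the numeric attribute, and the "ILP flag"
# are invented for this sketch and do not come from the paper.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    design = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def residual_sum_of_squares(weights, rows, targets):
    """Sum of squared differences between targets and fitted values."""
    total = 0.0
    for r, t in zip(rows, targets):
        pred = weights[0] + sum(w * v for w, v in zip(weights[1:], r))
        total += (t - pred) ** 2
    return total

# Hypothetical numeric attribute for six compounds.
numeric = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
# Hypothetical Boolean attribute: 1 if an ILP-learnt clause holds for the compound.
ilp_flag = [0, 1, 0, 1, 1, 0]
# Synthetic activity that depends on both attributes (an assumption of the sketch).
activity = [1.0 + 2.0 * x + 3.0 * f for x, f in zip(numeric, ilp_flag)]

baseline_rows = [[x] for x in numeric]
augmented_rows = [[x, float(f)] for x, f in zip(numeric, ilp_flag)]

w_base = fit_least_squares(baseline_rows, activity)
w_aug = fit_least_squares(augmented_rows, activity)
rss_baseline = residual_sum_of_squares(w_base, baseline_rows, activity)
rss_augmented = residual_sum_of_squares(w_aug, augmented_rows, activity)

print("RSS without ILP attribute:", rss_baseline)
print("RSS with ILP attribute:   ", rss_augmented)
```

Because the synthetic activity was built to depend on the flag, the augmented model fits it almost exactly while the baseline cannot. Real structure-activity data would not behave this cleanly, and any gain would have to be assessed statistically on held-out data, as the abstract describes.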