Discovering knowledge from noisy databases using genetic programming

Citation
M.L. Wong et al., Discovering knowledge from noisy databases using genetic programming, J AM S INFO, 51(9), 2000, pp. 870-881
Number of citations
34
Subject Categories
Library & Information Science
Journal title
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
Journal ISSN
0002-8231
Volume
51
Issue
9
Year of publication
2000
Pages
870 - 881
Database
ISI
SICI code
0002-8231(200007)51:9<870:DKFNDU>2.0.ZU;2-J
Abstract
In data mining, we emphasize the need for learning from huge, incomplete, and imperfect data sets. To handle noise in the problem domain, existing learning systems avoid overfitting the imperfect training examples by excluding insignificant patterns. The problem is that these systems use a limiting attribute-value language for representing the training examples and the induced knowledge. Moreover, some important patterns are ignored because they are statistically insignificant. In this article, we present a framework that combines Genetic Programming and Inductive Logic Programming to induce knowledge represented in various knowledge representation formalisms from noisy databases. The framework is based on a formalism of logic grammars, and it can specify the search space declaratively. An implementation of the framework, LOGENPRO (The Logic grammar based GENetic PROgramming system), has been developed. The performance of LOGENPRO is evaluated on the chess endgame domain. We compare LOGENPRO with FOIL and other learning systems in detail, and find its performance is significantly better than that of the others. This result indicates that the Darwinian principle of natural selection is a plausible noise handling method that can avoid overfitting and identify important patterns at the same time. Moreover, the system is applied to one real-life medical database. The knowledge discovered provides insights to and allows better understanding of the medical domains.