RULE INDUCTION WITH EXTENSION MATRICES

Authors
Citation
Xd. Wu, RULE INDUCTION WITH EXTENSION MATRICES, Journal of the American Society for Information Science, 49(5), 1998, pp. 435-454
Number of citations
48
Subject Categories
Information Science & Library Science; Computer Science, Information Systems
Journal ISSN
00028231
Volume
49
Issue
5
Year of publication
1998
Pages
435 - 454
Database
ISI
SICI code
0002-8231(1998)49:5<435:RIWEM>2.0.ZU;2-B
Abstract
This article presents a heuristic, attribute-based, noise-tolerant data mining program, HCV (Version 2.0), based on the newly developed extension matrix approach. By dividing the positive examples (PE) of a specific class in a given example set into intersecting groups and adopting a set of strategies to find a heuristic conjunctive formula in each group which covers all the group's positive examples and none of the negative examples (NE), the HCV induction algorithm adopted in the HCV (Version 2.0) software finds a description formula in the form of variable-valued logic for PE against NE in low-order polynomial time at induction time. In addition to the HCV induction algorithm, this article also outlines some of the techniques for noise handling and discretization of numerical domains developed and implemented in the HCV (Version 2.0) software, and provides a performance comparison of HCV (Version 2.0) with other data mining algorithms (ID3, C4.5, C4.5rules, and NewID) in noisy and continuous domains. The empirical comparison shows that the rules generated by HCV (Version 2.0) are more compact than the decision trees or rules produced by ID3-like algorithms, and HCV's predictive accuracy is competitive with ID3-like algorithms.
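The core idea in the abstract (a conjunctive formula per group that covers all of the group's positive examples and none of the negative examples) can be illustrated with a small sketch. This is not the HCV algorithm or its heuristics, which the abstract does not spell out. It assumes nominal attributes, formulas built only from selectors of the form [attribute != value], a plain greedy choice of selectors, and a simplified greedy grouping of positives; the names `usable_selectors`, `conjunctive_cover`, `covers`, and `induce_rules` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Example = Tuple[str, ...]
Selector = Tuple[int, str]  # (attribute index, value), read as "attribute != value"


def usable_selectors(group: List[Example], negative: Example) -> Set[Selector]:
    """Selectors that exclude `negative` and still hold for every positive in `group`.

    Loosely mirrors one extension-matrix row: an entry is "dead" when some
    positive example in the group shares the negative example's value.
    """
    return {(j, v) for j, v in enumerate(negative) if all(p[j] != v for p in group)}


def conjunctive_cover(group: List[Example],
                      negatives: List[Example]) -> Optional[List[Selector]]:
    """Greedily pick selectors until every negative example is excluded.

    Returns None when some negative cannot be separated from the group.
    """
    rows = [usable_selectors(group, n) for n in negatives]
    if any(not row for row in rows):
        return None
    chosen: List[Selector] = []
    uncovered = set(range(len(rows)))
    while uncovered:
        counts = {}
        for i in uncovered:
            for s in rows[i]:
                counts[s] = counts.get(s, 0) + 1
        # Assumed heuristic: take the selector excluding the most remaining negatives.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        uncovered = {i for i in uncovered if best not in rows[i]}
    return chosen


def covers(formula: List[Selector], example: Example) -> bool:
    """True when the example satisfies every selector of the conjunction."""
    return all(example[j] != v for j, v in formula)


def induce_rules(positives: List[Example],
                 negatives: List[Example]) -> List[List[Selector]]:
    """Build one conjunctive formula per greedily grown group of positives."""
    rules: List[List[Selector]] = []
    remaining = list(positives)
    while remaining:
        group = [remaining[0]]
        if conjunctive_cover(group, negatives) is None:
            raise ValueError("a positive example is identical to a negative example")
        for p in remaining[1:]:
            if conjunctive_cover(group + [p], negatives) is not None:
                group.append(p)
        formula = conjunctive_cover(group, negatives)
        rules.append(formula)
        remaining = [p for p in remaining if not covers(formula, p)]
    return rules


positives = [("a", "x"), ("b", "y")]
negatives = [("a", "y"), ("b", "x")]
rules = induce_rules(positives, negatives)
```

In this toy data no single conjunction separates both positives from both negatives, so the sketch returns two formulas whose disjunction describes PE against NE. Noise handling and discretization of numerical domains, which the article also covers, are outside this sketch.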