Discovering maximal generalized decision rules through horizontal and vertical data reduction

Authors
Xh. Hu, N. Cercone
Citation
Xh. Hu and N. Cercone, Discovering maximal generalized decision rules through horizontal and vertical data reduction, COMPUT INTE, 17(4), 2001, pp. 685-702
Number of citations
32
Subject categories
AI Robotics and Automatic Control
Journal title
COMPUTATIONAL INTELLIGENCE
Journal ISSN
0824-7935
Volume
17
Issue
4
Year of publication
2001
Pages
685 - 702
Database
ISI
SICI code
0824-7935(200111)17:4<685:DMGDRT>2.0.ZU;2-A
Abstract
We present a method to learn maximal generalized decision rules from databases by integrating discretization, generalization, and rough set feature selection. Our method reduces the data horizontally and vertically. In the first phase, discretization and generalization are integrated and the numeric attributes are discretized into a few intervals. The primitive values of symbolic attributes are replaced by high-level concepts, and some obviously superfluous or irrelevant symbolic attributes are also eliminated. Horizontal reduction is accomplished by merging identical tuples after the substitution of an attribute value by its higher-level value in a predefined concept hierarchy for symbolic attributes, or the discretization of continuous (or numeric) attributes. This phase greatly decreases the number of tuples in the database. In the second phase, a novel context-sensitive feature merit measure is used to rank the features, and a subset of relevant attributes is chosen based on rough set theory and the merit values of the features. A reduced table is obtained by removing those attributes which are not in the relevant attribute subset, and the data set is further reduced vertically without destroying the interdependence relationships between classes and the attributes. Then rough set-based value reduction is performed on the reduced table and all redundant condition values are dropped. Finally, tuples in the reduced table are transformed into a set of maximal generalized decision rules. The experimental results on UCI data sets and a real market database demonstrate that our method can dramatically reduce the feature space and improve learning accuracy.
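The two reduction phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The concept hierarchy (CITY_TO_REGION), the cut points (AGE_CUTS), the sample tuples, and all function names are hypothetical, and the standard rough set dependency degree (the share of tuples in the positive region) stands in for the paper's context-sensitive merit measure, which the abstract does not define.

```python
from collections import Counter, defaultdict

# Hypothetical concept hierarchy: primitive value -> higher-level concept.
CITY_TO_REGION = {"Regina": "West", "Calgary": "West",
                  "Toronto": "East", "Ottawa": "East"}

# Hypothetical cut points and interval labels for the numeric attribute.
AGE_CUTS = [30, 50]
AGE_LABELS = ["young", "middle", "senior"]


def discretize(value, cuts, labels):
    """Return the label of the first interval whose upper cut exceeds value."""
    for cut, label in zip(cuts, labels):
        if value < cut:
            return label
    return labels[-1]


def horizontal_reduction(rows):
    """Generalize each tuple, then merge identical tuples and count them."""
    merged = Counter()
    for city, age, decision in rows:
        generalized = (CITY_TO_REGION[city],
                       discretize(age, AGE_CUTS, AGE_LABELS),
                       decision)
        merged[generalized] += 1
    return merged


def dependency_degree(table, condition_idx, decision_idx):
    """Fraction of tuples whose condition values determine a single decision."""
    decisions = defaultdict(set)
    weights = Counter()
    for row, count in table.items():
        key = tuple(row[i] for i in condition_idx)
        decisions[key].add(row[decision_idx])
        weights[key] += count
    consistent = sum(w for k, w in weights.items() if len(decisions[k]) == 1)
    return consistent / sum(weights.values())


# Hypothetical sample data: (city, age, decision).
rows = [
    ("Regina", 25, "buy"),
    ("Calgary", 28, "buy"),
    ("Toronto", 45, "skip"),
    ("Ottawa", 41, "skip"),
    ("Toronto", 62, "buy"),
    ("Regina", 55, "buy"),
]

table = horizontal_reduction(rows)             # 6 tuples merge into 4
full = dependency_degree(table, [0, 1], 2)     # region and age together
age_only = dependency_degree(table, [1], 2)    # age alone
region_only = dependency_degree(table, [0], 2) # region alone
```

In this toy table the age attribute alone preserves the full dependency between conditions and decision, so a vertical reduction step could drop the region column without losing any classification information. Dropping age instead would leave half of the tuples ambiguous.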