Xh. Hu and N. Cercone, Discovering maximal generalized decision rules through horizontal and vertical data reduction, COMPUT INTE, 17(4), 2001, pp. 685-702
We present a method to learn maximal generalized decision rules from
databases by integrating discretization, generalization and rough set
feature selection. Our method reduces the data horizontally and vertically.
In the first phase, discretization and generalization are integrated and
the numeric
attributes are discretized into a few intervals. The primitive values of
symbolic attributes are replaced by high-level concepts, and some obviously
superfluous or irrelevant symbolic attributes are also eliminated.
Horizontal reduction is accomplished by merging identical tuples after the
substitution
of an attribute value by its higher-level value in a pre-defined concept
hierarchy for symbolic attributes, or the discretization of continuous (or
numeric) attributes. This phase greatly decreases the number of tuples in
the database. In the second phase, a novel context-sensitive feature merit
measure is used to rank the features, and a subset of relevant attributes is
chosen based on rough set theory and the merit values of the features. A
reduced table is obtained by removing those attributes which are not in the
relevant attribute subset, so the data set is further reduced vertically
without destroying the interdependence relationships between classes and
the attributes. Rough set-based value reduction is then performed on the
reduced table and all redundant condition values are dropped. Finally, tuples
in the reduced table are transformed into a set of maximal generalized
decision rules. The experimental results on UCI data sets and a real market
database demonstrate that our method can dramatically reduce the feature
space and improve learning accuracy.
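
The two reduction phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy table, the cut point of 35, the city-to-country hierarchy and all function names are hypothetical. The abstract does not define the context-sensitive merit measure, so the vertical step below swaps in a plain consistency check: an attribute is dropped only if the remaining condition attributes still determine the class.

```python
from collections import Counter

# Hypothetical toy table: (age, city, buys); `buys` is the class attribute.
ROWS = [
    (23, "Paris", "no"),
    (27, "Lyon", "no"),
    (41, "Berlin", "yes"),
    (45, "Munich", "yes"),
    (29, "Berlin", "no"),
]

# Hypothetical one-level concept hierarchy for the symbolic attribute.
CITY_TO_COUNTRY = {"Paris": "France", "Lyon": "France",
                   "Berlin": "Germany", "Munich": "Germany"}


def discretize_age(age, cut=35):
    """Map a numeric age to one of two intervals (the cut point is assumed)."""
    return "<35" if age < cut else ">=35"


def horizontal_reduce(rows):
    """Generalize every tuple, then merge identical tuples, keeping a count."""
    generalized = [(discretize_age(a), CITY_TO_COUNTRY[c], y) for a, c, y in rows]
    return Counter(generalized)


def is_consistent(table, keep):
    """True if the condition attributes at indexes `keep` determine the class."""
    seen = {}
    for row in table:
        key = tuple(row[i] for i in keep)
        if seen.setdefault(key, row[-1]) != row[-1]:
            return False
    return True


def vertical_reduce(table, n_cond):
    """Greedily drop condition attributes while the table stays consistent.

    Stand-in for the merit-based ranking, which the abstract does not specify.
    """
    keep = list(range(n_cond))
    for i in range(n_cond):
        trial = [j for j in keep if j != i]
        if trial and is_consistent(table, trial):
            keep = trial
    return keep


merged = horizontal_reduce(ROWS)
kept = vertical_reduce(list(merged), n_cond=2)
print(len(ROWS), "->", len(merged), "tuples; kept attribute indexes:", kept)
```

On this toy table the five original tuples collapse to three generalized tuples, and only the discretized age attribute (index 0) is needed to separate the classes, which yields two rules of the form "age < 35 implies no" and "age >= 35 implies yes".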