Data mining in large databases using domain generalization graphs

Citation
R.J. Hilderman et al., Data mining in large databases using domain generalization graphs, Journal of Intelligent Information Systems, 13(3), 1999, pp. 195-234
Number of citations
48
Subject categories
Information Technology & Communication Systems
Journal title
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
Journal ISSN
0925-9902
Volume
13
Issue
3
Year of publication
1999
Pages
195 - 234
Database
ISI
SICI code
0925-9902(199911)13:3<195:DMILDU>2.0.ZU;2-H
Abstract
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
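The generate-and-test idea in the abstract can be illustrated with a small sketch. This is not the authors' Multi-Attribute Generalization algorithm: it assumes each attribute's domain generalization graph is a simple chain of levels, the data and the names `ROWS`, `LEVELS`, `generalize`, `all_summaries`, and `variance` are hypothetical, and the variance formula (squared deviation of tuple probabilities from a uniform distribution) is one plausible reading of a "variance-based" measure, which may differ from the paper's definition.

```python
from collections import Counter
from itertools import product

# Hypothetical concept hierarchy for the "city" attribute.
CITY_TO_PROVINCE = {
    "Regina": "Saskatchewan",
    "Saskatoon": "Saskatchewan",
    "Calgary": "Alberta",
}

# Assumption: each DGG is a chain of levels, from no generalization to "ANY".
# LEVELS[a][k] is a function mapping a value of attribute a to generalization level k.
LEVELS = [
    [lambda v: v, CITY_TO_PROVINCE.__getitem__, lambda v: "ANY"],  # city
    [lambda v: v, lambda v: "ANY"],                                # product
]

# Hypothetical base relation: (city, product) tuples.
ROWS = [("Regina", "apple"), ("Saskatoon", "apple"), ("Calgary", "pear")]


def generalize(rows, levels, choice):
    """Build one summary: map each attribute to its chosen level and count
    how many base tuples collapse into each generalized tuple."""
    return Counter(
        tuple(levels[a][lvl](row[a]) for a, lvl in enumerate(choice))
        for row in rows
    )


def all_summaries(rows, levels):
    """Generate-and-test: enumerate every combination of levels, one per attribute."""
    for choice in product(*(range(len(attr_levels)) for attr_levels in levels)):
        yield choice, generalize(rows, levels, choice)


def variance(summary):
    """Assumed interestingness measure: variance of tuple probabilities
    around the uniform probability 1/m, for a summary with m tuples."""
    m = len(summary)
    if m < 2:
        return 0.0
    total = sum(summary.values())
    uniform = 1.0 / m
    return sum((c / total - uniform) ** 2 for c in summary.values()) / (m - 1)


# Rank all summaries from most to least interesting under the assumed measure.
ranked = sorted(all_summaries(ROWS, LEVELS), key=lambda cs: variance(cs[1]), reverse=True)
```

Because the level combinations are independent of one another, the iteration in `all_summaries` is the kind of work that could be partitioned across processors, which is the source of the speedups the abstract reports.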