Data mining in large databases using domain generalization graphs

Authors

Hilderman, R.J.; Hamilton, H.J.; Cercone, N.

Citation

R.J. Hilderman et al., Data mining in large databases using domain generalization graphs, J INTELL IN, 13(3), 1999, pp. 195-234

Subject categories

Information Technology & Communication Systems

Journal title

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS

ISSN journal

0925-9902

Volume

13

Issue

3

Year of publication

1999

Pages

195 - 234

Database

ISI

SICI code

0925-9902(199911)13:3<195:DMILDU>2.0.ZU;2-H

Abstract

Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs (DGGs) for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
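The abstract describes two core ideas: generalizing attribute values through a concept hierarchy, and ranking the resulting summaries by variance and relative entropy. The following Python code is a minimal sketch of those two ideas on a toy example. It is not the authors' Multi-Attribute Generalization algorithm. Several assumptions apply. The city/province/country hierarchy and the counts are hypothetical. The DGG is a single path for a single attribute, whereas the paper joins DGGs for multiple attributes and enumerates path combinations. Each summary tuple's probability is taken as its share of the total count. Variance is computed around the uniform value 1/m with an m - 1 denominator. Relative entropy is measured against the uniform distribution. A one-tuple summary is assigned a variance of 0.

```python
from collections import Counter
from math import log2

# Hypothetical concept hierarchy for one attribute ("city"), stored as one
# mapping per generalization step: city -> province -> country.
# This is a single-path DGG; the paper's DGGs may contain several paths.
HIERARCHY = [
    {"Regina": "Saskatchewan", "Saskatoon": "Saskatchewan",
     "Vancouver": "British Columbia", "Victoria": "British Columbia"},
    {"Saskatchewan": "Canada", "British Columbia": "Canada"},
]


def generalize(summary, mapping):
    """Replace each value with its parent concept and merge duplicate
    tuples by summing their counts (values without a parent are kept)."""
    merged = Counter()
    for value, count in summary.items():
        merged[mapping.get(value, value)] += count
    return merged


def all_summaries(base, hierarchy):
    """Generate the summary at every node along the path, starting from the base."""
    summaries = [base]
    for mapping in hierarchy:
        summaries.append(generalize(summaries[-1], mapping))
    return summaries


def variance_interest(summary):
    """Variance of tuple probabilities around the uniform value 1/m,
    using an (m - 1) denominator. Assumption: a one-tuple summary scores 0."""
    m = len(summary)
    if m < 2:
        return 0.0
    total = sum(summary.values())
    q = 1.0 / m
    return sum((c / total - q) ** 2 for c in summary.values()) / (m - 1)


def relative_entropy_interest(summary):
    """KL divergence, in bits, of tuple probabilities from the uniform distribution."""
    m = len(summary)
    total = sum(summary.values())
    q = 1.0 / m
    return sum((c / total) * log2((c / total) / q)
               for c in summary.values() if c > 0)


# Hypothetical base relation: tuple counts per city.
base = Counter({"Regina": 50, "Saskatoon": 20, "Vancouver": 20, "Victoria": 10})
summaries = all_summaries(base, HIERARCHY)
# Rank every generated summary, most "interesting" (highest variance) first.
ranked = sorted(summaries, key=variance_interest, reverse=True)
for s in ranked:
    print(dict(s), round(variance_interest(s), 4),
          round(relative_entropy_interest(s), 4))
```

On this toy data the two measures disagree. Variance ranks the two-tuple province summary first (0.08 versus 0.03 for the four-tuple base summary). Relative entropy scores the base summary higher (about 0.239 versus 0.119 bits). This only shows how the two rankings can differ; it is not a reproduction of the paper's experimental results.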