Theory of dependence values

Authors
R. Meo
Citation
R. Meo, Theory of dependence values, ACM T DATAB, 25(3), 2000, pp. 380-406
Number of citations
20
Subject categories
Computer Science & Engineering
Journal title
ACM TRANSACTIONS ON DATABASE SYSTEMS
Journal ISSN
0362-5915
Volume
25
Issue
3
Year of publication
2000
Pages
380 - 406
Database
ISI
SICI code
0362-5915(200009)25:3<380:TODV>2.0.ZU;2-V
Abstract
A new model to evaluate dependencies in data mining problems is presented and discussed. The well-known concept of the association rule is replaced by the new definition of dependence value, which is a single real number uniquely associated with a given itemset. Knowledge of dependence values is sufficient to describe all the dependencies characterizing a given data mining problem. The dependence value of an itemset is the difference between the occurrence probability of the itemset and a corresponding "maximum independence estimate." This can be determined as a function of joint probabilities of the subsets of the itemset being considered by maximizing a suitable entropy function. So it is possible to separate in an itemset of cardinality k the dependence inherited from its subsets of cardinality (k - 1) and the specific inherent dependence of that itemset. The absolute value of the difference between the probability p(i) of the event i that indicates the presence of the itemset {a, b, ...} and its maximum independence estimate is constant for any combination of values of (a, b, ...). In addition, the Boolean function specifying the combinations of values for which the dependence is positive is a parity function. So the determination of such combinations is immediate. The model appears to be simple and powerful.
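As an illustration only, the two-item case of the abstract's definition can be sketched in a few lines of Python. The sketch assumes that, for a pair {a, b}, the maximum independence estimate reduces to the product of the single-item probabilities (the maximum-entropy joint distribution when only the marginals are fixed). Larger itemsets would need a proper entropy maximization, which is not attempted here. The function name `pair_dependence` and the toy transactions are hypothetical and do not come from the paper.

```python
from itertools import product


def pair_dependence(transactions, a, b):
    """Difference between the observed probability and the product-of-marginals
    estimate for each presence/absence combination of items a and b.

    Assumption: for two items, the maximum independence estimate is taken to be
    the product of the marginal probabilities.
    """
    n = len(transactions)
    # Count the four presence/absence combinations of (a, b).
    counts = {combo: 0 for combo in product((True, False), repeat=2)}
    for t in transactions:
        counts[(a in t, b in t)] += 1
    joint = {combo: c / n for combo, c in counts.items()}

    # Marginal probabilities of each item being present.
    p_a = joint[(True, True)] + joint[(True, False)]
    p_b = joint[(True, True)] + joint[(False, True)]

    deltas = {}
    for (va, vb), p in joint.items():
        estimate = (p_a if va else 1 - p_a) * (p_b if vb else 1 - p_b)
        deltas[(va, vb)] = p - estimate
    return deltas


# Toy data (hypothetical): five transactions over items a, b, c.
toy = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}, {"c"}]
for combo, delta in pair_dependence(toy, "a", "b").items():
    print(combo, round(delta, 6))
```

On this toy data the four differences share the same magnitude. They are positive when a and b are both present or both absent and negative otherwise, which matches the constant-absolute-value and parity properties stated in the abstract for this simple case.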